
Next-Gen VoIP Services and Applications Using SIP and Java

The Technology Guide Series

Visit techguide.com

This Guide has been sponsored by

Don’t let our sexy curves and cool colors fool you.
The internet-age Pingtel xpressa phone, and its virtually
limitless Java repertoire of revenue-enhancing possibilities, such
as hosted IP voice services, is a very serious money maker indeed.

To learn about the opportunities the world’s most intelligent phone
can bring you, go to www.pingtel.com/mintmoney.
Or send an e-mail to us at hostedvoiceservices@pingtel.com
and we’ll get back to you.

For Service Providers, it’s a mini branch of the U.S. Mint.


Table of Contents

Abstract
Introduction
Architecture Models
Technology Enablers for Next Generation Voice Services and Applications
Next Generation IP Voice Services and Applications
Summary
Glossary
Appendix A: Session Initiation Protocol (SIP) Concepts and Operation

Editorial Writing Team

ATG’s Technology Guides and White Papers are produced according to a
structured methodology and proven process. Our editorial writing team
has years of experience in IT and communications technologies, and is
highly conversant in today’s emerging technologies.

The Guide format and main text of this Guide are the property of The
Applied Technologies Group, Inc. and are made available upon these terms
and conditions. The Applied Technologies Group reserves all rights herein.
Reproduction in whole or in part of the main text is only permitted with
the written consent of The Applied Technologies Group. The main text
shall be treated at all times as a proprietary document for internal use
only. The main text may not be duplicated in any way, except in the form
of brief excerpts or quotations for the purpose of review. In addition, the
information contained herein may not be duplicated in other books,
databases or any other medium. Making copies of this Guide, or any
portion for any purpose other than your own, is a violation of United
States Copyright Laws. The information contained in this Guide is
believed to be reliable but cannot be guaranteed to be complete or
correct. Any case studies or glossaries contained in this Guide or any
Guide are excluded from this copyright.

Copyright © 2001 by The Applied Technologies Group, Inc. 
209 West Central Street, Suite 301, Natick, MA 01760
Tel: (508) 651-1155, Fax: (508) 651-1171  
E-mail: info@techguide.com, Web site: http://www.techguide.com

Visit our Web site, techguide.com, to read, download,
and print all the Technology Guides in this series.

Over 100 Technology Guides in the following categories:

Software Applications
Network Management
Enterprise Solutions
Network Technology
Telecommunications
Convergence/CTI
Internet
Security

caller ID, etc.), cannot provide the types of features
that are needed by a contemporary business in the
age of e-commerce. The traditional business
telephony solutions are complicated, for both the
service administrators and the users. Because of the
daunting complexity of PBX and CLASS/Centrex
user-interfaces, users typically know and use only a
fraction of the total feature set. 

Now imagine telephony services in the context
of the current business need. The users would still
like to use a phone for making and receiving calls
and playing voice-mail messages. However, they
would also like to have the phone appliance
integrated with a browser-based PC for managing
phone books and seamlessly interfacing with other
applications, such as customer relationship
management (CRM), sales force automation (SFA),
supply chain management (SCM), time accounting,
etc. In other words, perform tasks most suitable
for the PC on the PC and those most suitable for
the telephone using a phone appliance and have
the two devices seamlessly integrated.

Today’s telephone just cannot deal with this new
business imperative. 

In contrast, the Internet and Web-based
communications have revolutionized the business
environment and users’ personal lifestyles through their
inexpensive, standards-based innovations. We already
have data, multimedia, video, and music applications
on the Internet. The Internet is already serving as the
underpinning of critical business and IT solutions.
In the last few years alone, the Internet and the
Web have generated more innovations than
traditional telephony has produced in its entire
history. The next frontier for the Web is to apply the
same degree of innovation to telephony.

Most market surveys have verified that IP
telephony is already supplementing traditional
telephony and it is expected that the IP telephony
architecture will ultimately replace the traditional
telephony model.

Abstract

This Technology Guide explains the unique
benefits of using the Web architectural model with
SIP and Java as the enabling technologies for next
generation IP voice services and applications. 

Using the Web as a reference model for rapid
innovation, the Guide contrasts the limitations of
circuit-switched telephony and first generation
VoIP architectures with the Web model. It
summarizes limitations of centralized-processing
models such as traditional telephony, MGCP, and
Megaco as compared to peer-to-peer models such as
SIP and H.323.

This Technology Guide explains in more detail
the unique benefits of using SIP for call control
and Java for making phones intelligent. SIP is
compared with H.323 in terms of innovation,
scalability, simplicity, ease of deployment, and
standardization. The guide also includes an
explanation of SIP concepts and operation. A
description of Java features supporting new voice-
services and applications is also included. 

The Guide concludes with examples of new
voice-services and applications made possible
exclusively by SIP and Java. 

Introduction

Traditional telephony has hit a wall in terms of
innovation, ease of use, and cost reduction. The
core components of traditional telephony — 
the terminal (telephone), PBX, the central office
switch, and the switching network — are struggling
and failing to keep up with the rate of innovations
on the Internet. The archaic telephony framework
with PBXs and Custom Local Area Signaling Services
(CLASS) switches providing Centrex and enhanced
residential services (call waiting, call forwarding,


Both models have all of their intelligence in a
centralized switch or server, which performs all of
the telephony functions such as call setup, call
forwarding, conference calling, etc. All requests,
responses, and state changes must be processed
by the central switch/server with the end-station
being a dumb terminal. 

The following are the salient characteristics of
the traditional telephony environment:

• Archaic, Host-to-Dumb Terminal Architecture:
Voice service architecture has not changed for
generations. Today, PBX and Centrex services
are delivered using switches that contain all
application intelligence — just as mainframes
and minicomputers did for IBM 3270 or VT100
terminals in old computer systems.

• Dumb Terminal (the Telephone): Voice
service delivery assumes a dumb terminal in
telephony parlance — the telephone. The end-

Figure 1B: First-generation IP telephony architectures (diagram labels: "call manager", IP Centrex, Softswitch, "gatekeeper", LAN PBX)


This Technology Guide explains the architecture
of the new IP telephony model using Session
Initiation Protocol (SIP) and Java. The Guide also
demonstrates the power of SIP and Java in terms
of scalability, ease of use, and innovative services
and applications. 

Architecture Models

Circuit-Switched and First-Generation IP
Telephony Architectures

The traditional telephony architecture is based
on a centralized processing model. First-generation
IP telephony architecture uses the Media Gateway
Control Protocol (MGCP), Megaco, or vendor-proprietary
protocols such as Cisco’s Skinny Client
Control Protocol (SCCP), which are also centralized
architectures similar to traditional telephony.

Figure 1A: Traditional circuit-switched telephony architectures (diagram labels: Centrex, CLASS 5 switch, PBX)


Web Architecture

The Web represents the most successful
application architecture in history. The Web
features many intelligent servers located
everywhere on the network and an intelligent,
browser-based client device (a PC or a low-cost
Internet appliance). It is the client device, not the
server, that both initiates and controls all
communications with the server. When a user
simply clicks on an icon to access an application,
the browser pulls content in the form of HTML
and applications (Java, JavaScript, Flash, ActiveX,
etc.) from the server and runs them on the PC.

There is a complete disaggregation of services in
the Web model. Not only do the services come
from different servers, they may be provided by
different and multiple service providers. Some of
the examples (shown in figure 2) include Yahoo
for news; Amazon for shopping; MSN for instant
messaging; ASP services (such as Corio) for
customer relationship management (CRM), sales
force automation (SFA), enterprise resource
planning (ERP); and MP3.com for music. An
enterprise can outsource as few or as many
services as suits its business model.

Key characteristics of the Web architecture
include:

• Intelligent end devices (clients)

• Distributed, intelligent servers (no central
switch or server for services)

• An open architecture leading to innovation,
rapid application development, and lower costs


user interface for these services on the dumb
telephone requires non-intuitive flash
sequences and star codes. No options exist for
making telephony features easier to use and
increasing user productivity. 

• Hardware-Specific Software: The voice features
reside in software that is usually hardware-specific
and/or proprietary. This environment
requires highly specialized software engineers
who are expensive and hard to find. Even
simple software modifications require
extensive regression testing of feature
interactions.

• Limited Next-Generation Platforms: Next-generation
voice service platforms still fall short
of business needs. Most first-generation IP
telephony systems, for both service providers
and enterprises, do exploit IP for transport, and
some feature a Java or XML software
environment. However, this “open”
environment is not easily extended by
anyone other than the vendor or possibly a
service provider; certainly not by the enterprise or
an independent software vendor with a great
idea. Consequently, these systems still
perpetuate the same 1960s host-terminal
architecture with a dumb telephone as the
endpoint:

• The IP PBX is a host computer with all the
smarts driving dumb IP phones. 

• VoIP gateways, softswitches, and their
feature servers are merely physically
distributed mainframes talking to dumb
terminals.


voice-world are solely defined and developed by
PBX and CLASS switch manufacturers, just as
mainframe applications were defined by the
vendors. 

The PBX and CLASS switch vendors, their ideas,
their bureaucratic practices, and their business
motivations have held innovation in the voice-
world hostage. Voice features reside in software on
the switch that is hardware-specific and vendor-
specific. It is a proprietary environment that is not
openly extensible. Even modest new functions
require the onerous regression testing of feature
interaction.

The centralized, closed-software environment
offers no way for enterprises to add their own
innovations or enhancements to telephony
features, let alone individual users or software
developers with really good ideas. Some features
are impossible to implement because of the dumb-
telephone as the endpoint. Consequently,
innovation is and will remain dead, especially
when compared to the revolutions on the Web.

Web

Innovation on the Web occurs at the edges of
the network, where anyone, businesses and
individuals alike, can create Web sites that are
immediately open for other users to interact with.
On the Web, in contrast to traditional telephony, a
new page or “feature” can be created in a few
minutes. More importantly, the Web page can be
conceived, created, delivered, and personalized by
anyone: Yahoo, eBay, GE, a company, an
individual, their kids, or their grandparents. Several
million Web sites are in existence today, up from a
few thousand in 1993. These sites satisfy
everyone’s personal and business needs for news,
buying, entertainment, chat, sports, sex, etc.,
regardless of gender, race, religion, ethnic
background, industry, and occupation.

Amazon.com would not have happened if the
world needed to rely on the data communications


Comparing the Architectures 

The Web has revolutionized the world of
business. It has enabled a whole new business
paradigm in the form of e-business, portals, 
e-tailers, and collaborative applications. The Web
has enabled businesses to reach business partners
and customers worldwide with a click of the
mouse. Telephony services must change
dramatically to become a functional member of
this business revolution. However, given their
limitations, it is virtually impossible for the current
telephony architectures to satisfy emerging
requirements.

Innovation

Traditional and First-Generation IP Telephony

The telephone was invented more than 125
years ago. Since then it has enabled people to talk
and do only a handful of other things, like use
voice mail. All of the features and services in the

Figure 2: Web application architecture (diagram labels: Intelligent servers, Intelligent clients, CRM/SFA, MP3.com, doubleclick.com, Virtualcart.com, MSN Instant Messenger, amazon.com, yahoo.com, MP3, Java, Flash, ActiveX, HTML, Cookies)


browser’s graphical user interface means that users
do not have to memorize features as in the world
of telephony. The use of any Web site is an
intuitive discovery process, performed simply by
pointing and clicking at images and words.

Scalability and Capacity

Traditional and First Generation IP Telephony

In the telephony world, big centralized boxes
have all the smarts. Whenever the telephone, the
“terminal” in the parlance of telephone equipment
vendors, sends a flash sequence or * code, it’s the
PBX or CLASS switch that figures out what it
means. The PBX or the switch must also actively
manage each and every call. Consequently, the
architecture does not scale well. Support for just one
more user may end up requiring a hugely expensive
replacement or addition.

Web

A Web site, however, can support millions of
users. Scalability is achieved not only through the
connectionless nature of IP and by adding more
and bigger servers to the Web site, but
also by exploiting an intelligent endpoint:
the browser-based PC. In fact, it’s the browser
software that interprets Web objects and puts a
Web page together.

For example, in accessing a typical e-commerce
site, it’s the browser, not a server, that:

• Retrieves and displays the source HTML page
and embedded product images individually

• Retrieves and runs a Java applet, JavaScript,
Flash, ActiveX, or other application
components

• Retrieves and displays a dynamic advertisement
from DoubleClick.com

• Retrieves shopping cart services from
ShoppingCart.com


vendors such as Alcatel, Cisco, Lucent, or Nortel to
invent the “service” and add the features to a
router or a switch. 

Ease of Use

Traditional and First-Generation IP Telephony

For most telephone users, cryptic, impossible-to-remember
flash sequences and * codes are the
interface to thousands of PBX and CLASS features.
For the fortunate few with block character
displays, even IBM 3270 and VT100 terminals
appear attractive.

Users don’t know what voice features exist and,
if they do, they do not know how to use them.
While most voice service platforms such as PBX
and CLASS switches offer hundreds or thousands
of features (300-400 features in a typical PBX,
3000-4000 in a CLASS 5 switch), most users
know no more than a few:
transfer, hold, last number redial. In research
conducted by WorldCom, 9 out of 10 executives
could not even transfer a call without resorting to
the “help scream”: “Do I dial ‘flash’ first and
then the number, or the other way around?” Trying
to set up just a 3-party conference call over a PBX
is an even bigger nightmare. It’s no wonder that the
assisted conference calling businesses of AT&T,
Sprint, and WorldCom are so big and profitable.
For many, the most difficult part of changing jobs
is learning a new phone system: “What do I dial
to get an outside line?” Consequently, for the vast
majority, ignorance is bliss, yet very expensive in
user productivity.

Web

On the Web, millions of sites with billions,
perhaps trillions, of pages can be easily navigated
by pointing and clicking at pictures or words
displayed on an intelligent, browser-based PC.

In contrast to telephony feature usage, anyone
from kids to their great grandparents can easily
discover and use any site on the Web. The


An enterprise has the option of providing PBX
services locally through a premises-based
system or outsourcing them to a network-based
service. The outsourced service not only
eliminates capital costs but may actually provide
richer services than those available from a PBX.

The figure also shows some illustrative services
such as unified messaging, presence messaging,
instant messaging, and CRM integration, all of which
can be provided by separate service providers
offering best-of-breed solutions for an enterprise’s or
even an individual user’s specific requirements.

PCs and other phones are simply resources on
the network that provide services to users. In this
model, the PC may provide services for the phone,
such as integration with desktop applications,
or the phone may provide services for the PC, such
as causing the phone to ring and automating
conference calls in Microsoft Outlook.

Figure 3: Web architecture for next-generation voice services and applications (diagram labels: Intelligent servers, Intelligent clients, Audio, Auctions, IP PBX, PSTN gateways, CRM/SFA, Presence & IM, Unified Messaging, Phone-to-phone data & app exchange, Java, HTML, MP3, Hosted PBX service, PC app integration)


• Stores cookies to identify users and maintain
state

• Encrypts credit card numbers

Manageability

Traditional and First-Generation IP Telephony

An expert, the equivalent of the proverbial
rocket scientist, must perform all maintenance
and management tasks for the PBX or the switch.
Tools for managing moves/adds/changes tend to
be horrendous and, consequently, administrators
learn only the basic coping skills. This makes it
extremely costly to administer the switch.
According to some estimates, it can cost as much
as $300-$500 per PBX move/add/change. For a
Centrex line, it can take weeks for a change to be
implemented by the telephone company.

Web

Self-service by users is the normal operating
model here: for registration, buying things,
personalizing information, etc.

Every office device, including printers, copiers,
and now intelligent IP phones, has a built-in Web
server that enables remote configuration over the
net via a browser interface.

Every office device and home appliance is
becoming more intelligent and capable of running
automated diagnostics, reporting the findings, and
ordering replacements before service is disrupted.

Exploiting the Web Architecture for Next
Generation Voice Services and Applications

Figure 3 shows what telephony would look like
if migrated to a Web-like architecture. In this
model, services and applications are resources on
the network and are accessed and controlled by
the phone and not by a central switch or a
gatekeeper. Nor does a central switch or
gatekeeper control what the phone can do.


Phone Intelligence Technology

An ability to support small footprint applications
is the key for incorporating intelligence in phones.
A powerful yet easy to use programming language
used widely for Web-enabling Internet appliances
is required. In addition to rich functionality for
traditional Web applications, features developed
specifically for telephony and security are
mandatory. Lastly, the language must already be
used by hundreds of thousands of programmers
worldwide in order for innovation to happen
rapidly. 

Extensible, Scalable Call Control Protocol

A call control protocol is used for call-related
functions such as setting up, monitoring, and
terminating calls. However, in the new IP
telephony model, the call control protocol must
differ from traditional telephony and the
first-generation IP telephony protocols. For maximum
scalability, the new call control protocol must
support peer-to-peer communications whereby
two or more phones can set up calls and communicate
directly without requiring anything more than
location services from a call control server. In
addition, the protocol must allow the peer-to-peer
exchange of applications and data in addition to
voice communications.
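The peer-to-peer requirement above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LocationService` class, the `place_call` function, and the example addresses are hypothetical names invented here, not part of any protocol or product described in this Guide. The point is that the server answers a single question (where the callee can currently be reached) and takes no further part in the call.

```python
from typing import Dict, Optional


class LocationService:
    """Toy location service (hypothetical): maps a user's address
    to the network contact where that user's phone can be reached."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def register(self, address: str, contact: str) -> None:
        # A phone announces where it can currently be reached.
        self._bindings[address] = contact

    def lookup(self, address: str) -> Optional[str]:
        # The only service a caller needs from the server.
        return self._bindings.get(address)


def place_call(service: LocationService, caller: str, callee: str) -> str:
    """Resolve the callee, then describe the direct phone-to-phone setup."""
    contact = service.lookup(callee)
    if contact is None:
        return f"{caller}: {callee} is not registered"
    # From here on the two phones talk directly; the server is not involved.
    return f"{caller} -> {contact}: direct session setup"


service = LocationService()
service.register("alice@example.com", "192.0.2.10:5060")
print(place_call(service, "bob@example.com", "alice@example.com"))
```

Because the server holds only address bindings and no per-call state, adding users in this sketch means adding dictionary entries rather than switching capacity.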

The call control protocol must support a wide
range of environments, from the home office to the
largest enterprise and from the smallest to the
largest service provider. Thus, the protocol must
be highly scalable as well as cost effective in a
diverse range of configurations. Since it is not
possible to predict all future applications of IP
telephony, the protocol must also be extensible in
order to accommodate unforeseen requirements.


Technology Enablers for
Next Generation Voice
Services and Applications

Clearly, while the model in figure 3 is quite
pedestrian in the Web world, it is quite
revolutionary in the context of traditional
telephony. The components needed to implement
this model for telephony are as follows:

Intelligent Servers

These are distributed resources that interact with
intelligent clients (PCs and phones). In terms of
hardware and software, these servers are standard
Unix, Linux, and Microsoft Windows platforms.
Compared to traditional PBXs, these servers offer
choices of multiple vendors and competitive
pricing with an open applications development
environment.

Intelligent Phones

These phones should provide much more than
incoming call ringing. In order to maintain their
independence from a central switch, they must
also provide local capabilities such as call hold,
transfer, forwarding, redial, caller ID, multi-party
conferencing, and many other traditional
telephony features. 

The intelligent phones should be thin-client
computing devices that can interoperate with PCs
and servers on the network. These devices must
support dynamic loading and management of
applications such as Java applets. For ease of use,
they should incorporate functions such as
graphical and audio helpers to ease the use of
traditional and next generation applications.


H.323, the older of the protocols, was originally
designed for video conferencing over the LAN.
Since then it has been adapted to
support voice and video over the WAN as well.
SIP, however, was designed from the beginning for
multimedia sessions and conferences over the
WAN. Because of these differences in their design
objectives, SIP offers numerous compelling
advantages over H.323 in the areas of extensibility,
scalability, and ease of deployment.

Today there are more products available
supporting H.323 than SIP. However, SIP is
rapidly becoming the preferred
protocol. A January 2001 survey of Voice over IP
vendors in Network World found that while 75%
of the vendors offered products based on one of
the four H.323 versions, an approximately equal
number of them were already planning to offer
SIP-based products by June 2001. However, the
more telling statistic was that fewer than 25% of the
vendors were planning to upgrade their products
from H.323 Version 2 to Version 3 and even fewer
to Version 4, the latest version of H.323. According
to the same survey, most vendors expected H.323
to become a legacy protocol. In contrast, the list
of vendors supporting or planning to support SIP
is growing rapidly. Service providers embracing
SIP include WorldCom, Level 3, Net2Phone, Telia,
Webley, iBasis, LipStream, and TalkingNets as of
March 2001, with many more anticipated.

The reasons for the rapid ascendancy of SIP
become obvious when we compare it with H.323
in the areas of innovation, scalability, ease of
deployment, manageability, and the standardization
process. Appendix A provides additional details on
SIP concepts, definitions, and operation.


SIP (Session Initiation Protocol) — The
Call Control Protocol 

SIP introduces the benefits of the Web
architecture to IP telephony. It provides a
powerful, extensible, scalable, and easy-to-deploy
protocol for call control and media exchange. 

Several standards are available for building IP
telephony solutions. These include the Session
Initiation Protocol (SIP) from the IETF;
H.323, an ITU-T umbrella standard; the Media
Gateway Control Protocol (MGCP) from the IETF;
Media Gateway Control (Megaco), a joint protocol
by the IETF and ITU-T; and proprietary protocols such
as Cisco’s Skinny Client Control Protocol (SCCP).
A high-level comparison of these protocols is
included in table 1.

Table 1: IP telephony standards

                     SIP             H.323           MGCP           MEGACO           PROPRIETARY
Architectural model  Peer-to-peer    Peer-to-peer    Master/slave   Master/slave     Master/slave
Media types          Voice, video,   Voice, video,   Voice          Voice, video     Voice
                     data            limited data
Network scope        Intranet,       Intranet,       Intranet only  Intranet only    Intranet only
                     extranet, and   extranet, and
                     Internet        Internet
Extensibility        High            Low             Medium         Medium           Low
Scalability          High            Medium          Low            Low              Low
Ease of deployment   High            Low             Medium         Medium           Medium
Standardization      IETF            ITU-T           IETF           IETF and ITU-T   None

Why SIP

Of the protocols listed in table 1, only SIP and
H.323 are peer-to-peer protocols. MGCP, Megaco
and Cisco’s proprietary SCCP represent the old
centralized model and suffer from this model’s
limitations discussed earlier. Thus, the real choice
for a protocol with Web-like benefits comes down
to one of the peer-to-peer protocols — H.323 or
SIP. 


protocols within H.323. These include Registration,
Admission and Status (RAS), Q.931 for call control,
and H.245 for transmission of non-telephony
signals on the line. As shown in tables 2 and 3, SIP has
a total of 6 methods (commands) and 6 classes of
response codes, whereas H.323 has 21 commands/messages
across the three protocols. SIP can be implemented as a
stateless protocol that does not need to maintain
any call state, which further increases its
scalability. SIP is also substantially more
efficient than H.323 during call set-up, using
approximately 50% fewer messages. Figures 4 and
5 show call set-up messages for H.323 and SIP,
respectively. While H.323 requires a total of 13
message exchanges, SIP requires only 7
exchanges.
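As a quick check of the "approximately 50% fewer" figure, the reduction can be computed directly from the two message counts quoted above. The function name below is a hypothetical helper introduced only for this illustration.

```python
def reduction_percent(baseline: int, improved: int) -> float:
    """Percentage by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline * 100


h323_messages = 13  # call set-up exchanges for H.323, from the text
sip_messages = 7    # call set-up exchanges for SIP, from the text

print(f"{reduction_percent(h323_messages, sip_messages):.0f}% fewer messages")
```

With 13 and 7 exchanges the reduction works out to roughly 46%, which is consistent with the rounded figure in the text.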

SIP Methods and Response Codes

Table 2: SIP methods

SIP METHODS
INVITE     User or service is being invited to participate in a session.
ACK        Client has received a final response to an INVITE request.
OPTIONS    Server is being queried about capabilities.
BYE        User agent client indicates to server to release the call.
CANCEL     Cancels a pending request.
REGISTER   Client registers address with a SIP server.

Table 3: SIP response codes

SIP RESPONSE CODES
1xx   Informational: Request received, continuing to process request.
2xx   Success: Action successfully received, understood and accepted.
3xx   Redirection: Further action required to complete request.
4xx   Client Error: Request contains bad syntax or cannot be executed at server.
5xx   Server Error: Server failed to execute an apparently valid request.
6xx   Global Failure: Request cannot be executed at any server.
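The response classes in table 3 are determined by the first digit of the three-digit status code, which makes them easy to handle in software. The following is a minimal sketch; the dictionary and function names are hypothetical and chosen only for this example.

```python
# Class names taken from Table 3; keyed by the first digit of the code.
RESPONSE_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(status_code: int) -> str:
    """Return the Table 3 class name for a three-digit SIP status code."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


for code in (180, 200, 302, 404, 503, 603):
    print(code, response_class(code))
```

A phone that receives a code it does not recognize can still act sensibly by falling back on the class, which is one reason the scheme extends well.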


Innovation

SIP enables new services and applications not
possible with H.323 (or other IP telephony
protocols) and easily empowers service providers,
application developers, and enterprises to create
unique, differentiated services and applications.
For example, SIP uses a simple text-based
encapsulation (based on the Internet standard
MIME) which enables it to transmit data and
application programs with the voice call, making it
easy to send business cards, photos, and/or MP3
encoded information during a call. 
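To make the text-based encapsulation concrete, the sketch below assembles a simplified SIP request as plain text with a MIME-typed body. It is illustrative only: the `build_invite` function, the example addresses, and the choice of a vCard body with the `text/vcard` content type are assumptions made for this example, and several headers that a real SIP user agent would send (for example Via) are omitted.

```python
def build_invite(caller: str, callee: str, call_id: str,
                 body: str, content_type: str) -> str:
    """Assemble a simplified, illustrative SIP INVITE as plain text."""
    payload = body.encode("utf-8")
    lines = [
        f"INVITE {callee} SIP/2.0",           # request line: method, target, version
        f"From: <{caller}>",
        f"To: <{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Content-Type: {content_type}",      # MIME type describing the body
        f"Content-Length: {len(payload)}",    # body length in bytes
        "",                                   # blank line separates headers from body
        body,
    ]
    return "\r\n".join(lines)


# Hypothetical business card carried as the message body.
card = "BEGIN:VCARD\r\nVERSION:3.0\r\nFN:Alice Example\r\nEND:VCARD"
message = build_invite("sip:alice@example.com", "sip:bob@example.com",
                       "abc123@example.com", card, "text/vcard")
print(message)
```

Because the whole message is readable text, it can be inspected with ordinary tools, and a different payload only requires a different Content-Type and body.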

SIP also supports third-party call control through
simple applications that modify SIP messages and
enable functions such as sending office calls to a
home phone after 5:00 PM or forwarding video
calls to a PC. Lastly, SIP anticipates the need to
accommodate extensions (new protocol headers,
methods, bodies, and parameters) to implement
new and innovative applications. By design, not all
products are required to support these extensions,
only the endpoints (servers or phones) that want to
use them.
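The after-hours forwarding example can be expressed as a very small routing rule. This is a sketch under stated assumptions: the function name, the addresses, and the fixed 5:00 PM cutoff are hypothetical, and a real application would also need to handle mornings, weekends, and time zones.

```python
from datetime import time


def choose_target(local_time: time, office: str, home: str,
                  closing: time = time(17, 0)) -> str:
    """Pick the destination for an incoming call based on the time of day.

    Calls arriving at or after `closing` go to the home phone;
    earlier calls go to the office phone.
    """
    return home if local_time >= closing else office


office_phone = "sip:alice@office.example.com"  # hypothetical addresses
home_phone = "sip:alice@home.example.com"

print(choose_target(time(9, 30), office_phone, home_phone))
print(choose_target(time(18, 15), office_phone, home_phone))
```

An application of this kind would rewrite the destination of the incoming request according to the rule, leaving the rest of the message unchanged.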

Scalability

Being peer-to-peer protocols, both SIP and
H.323 eliminate the need for central servers to
control everything. Peer-to-peer protocols reduce
costs of network and server infrastructure
equipment necessary to support a user population
of a given size. 

Among peer-to-peer protocols, SIP is a much
more efficient and less complex protocol and is
therefore more scalable than H.323. H.323 is
actually an umbrella specification that includes
several protocols from other ITU-T standards.
Tables 4 – 6 cover three categories of such


Table 6: H.323/H.245 commands and responses

H.245
Command/Message               Function
Master-Slave Determination    Determines which terminal is the master and
                              which is the slave. Possible replies:
                              Acknowledge, Reject, Release (in case of a
                              time out).
Terminal Capability Set       Contains information about a terminal’s
                              capability to transmit and receive multimedia
                              streams. Possible replies: Acknowledge,
                              Reject, Release.
Open Logical Channel          Opens a logical channel for transport of
                              audiovisual and data information. Possible
                              replies: Acknowledge, Reject, Confirm.
Close Logical Channel         Closes a logical channel between two
                              endpoints. Possible replies: Acknowledge.
Request Mode                  Used by a receive terminal to request
                              particular modes of transmission from a
                              transmit terminal. General mode types include
                              VideoMode, AudioMode, DataMode, and
                              EncryptionMode. Possible replies:
                              Acknowledge, Reject, Release.
Send Terminal Capability Set  Commands the far-end terminal to indicate its
                              transmit and receive capabilities by sending
                              one or more Terminal Capability Sets.
End Session Command           Indicates the end of the H.245 session. After
                              transmission, the terminal will not send any
                              more H.245 messages.

Ease of Deployment

Deploying and supporting SIP is similar to
deploying and supporting HTTP. SIP uses standard
protocols and functions that already exist in
current IP networks and
are well understood by system administrators and
technical support personnel. SIP shares the following
characteristics with HTTP:

• Standard Internet addressing: SIP uses
standard IP addressing format for both names
and addresses, e.g., sip:username@abcorp.com
or sip:1.781.938.5306@abcorp.com

• Clear text protocol: SIP uses clear text for its
protocol encapsulation, unlike H.323, which
uses binary encoding. This makes SIP easier to
diagnose and troubleshoot.


H.323 Commands/Messages 

Table 4: H.323 RAS commands and responses

RAS

Command/Message

Function

RegistrationRequest (RRQ)

Request from a terminal or gateway to register
with a gatekeeper. Gatekeeper either confirms
or rejects (RCF or RRJ)

AdmissionRequest (ARQ)

Request for access to packet network from
terminal to gatekeeper. Gatekeeper either
confirms or rejects (ACF or ARJ)

BandwidthRequest (BRQ)

Request for changed bandwidth allocation,
from terminal to gatekeeper. Gatekeeper either
confirms or rejects (BCF or BRJ)

DisengageRequest (DRQ)

If sent from endpoint to gatekeeper, DRQ
informs gatekeeper that endpoint is being
dropped; if sent from gatekeeper to endpoint,
DRQ forces call to be dropped. Gatekeeper either
confirms or rejects (DCF or DRJ). If DRQ sent
by gatekeeper, endpoint must reply with DCF.

InfoRequest (IRQ)

Request for status information from
gatekeeper to terminal.

InfoRequestResponse (IRR)

Response to IRQ. May be sent unsolicited by
terminal to gatekeeper at predetermined intervals.

RAS Timers and Request in Progress (RIP)

Recommended default timeout values for
response to RAS messages and subsequent
retry counts if response is not received.

Table 5: H.323/Q.931 commands and responses

Q.931

Command/Message

Function

Alerting

Called user has been alerted (“phone is ringing”).
Sent by called user.

Call Proceeding

Requested call establishment has been initiated and
no more call establishment information will be
accepted. Sent by called user.

Connect

Acceptance of call by called entity. Sent from called
entity to calling entity.

Setup

Indicates a calling H.323 entity’s desire to set up a
connection to the called entity.

Release Complete

Indicates release of call if H.225.0 (Q.931) call
signaling channel is open. Afterwards, call reference
value can be reused. Sent by a terminal.

Status

Responds to an unknown call signaling message or
to a Status Inquiry message. Provides call state
information.

Status Inquiry

Requests call status. Can be sent by endpoint or
gatekeeper to another endpoint.


Standardization

The ITU-T, organized under the auspices of the
United Nations, defines traditional telephony and
H.323 standards. It is a slow moving body with a
highly political process. Participation in ITU-T
activities is limited to paid members. Most of 

Figure 5: H.323 Call set-up sequence

[Sequence diagram: Endpoint 1, Gatekeeper, and Endpoint 2 exchange messages in 13 numbered steps. RAS phase: Admission Request and Admission Confirm for each endpoint. Q.931 phase: Setup, Call Proceeding, Alerting, Connect. H.245 phase: Terminal Capability Set, Master/Slave Determination, and Open Logical Channel, each acknowledged. Media then flows over RTP. Teardown: Close Logical Channel, End Session Command, Release Complete, and Disengage Request/Disengage Confirm with the gatekeeper.]


• Simple error messages: SIP uses familiar error
messages with prefixes such as 1xx, 2xx, etc.

• Leverages other Internet protocols: SIP uses
other familiar Internet protocols such as MIME
and Session Description Protocol (SDP), again
eliminating the need for new technical training
or expertise.
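
Because SIP responses are clear text with numeric status codes, they can be read with very little code. The following is a minimal Python sketch, not taken from any SIP library: the function name classify_status_line and the table name STATUS_CLASSES are illustrative assumptions. It splits a status line such as "SIP/2.0 200 OK" and maps the first digit of the code to its response class.

```python
from typing import Tuple

# Response classes keyed by the first digit of the status code.
# SIP reuses the HTTP first-digit scheme and adds a 6xx class.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_status_line(line: str) -> Tuple[int, str, str]:
    """Split a status line into (code, reason phrase, class name)."""
    version, code_text, reason = line.strip().split(" ", 2)
    if not version.startswith("SIP/"):
        raise ValueError(f"not a SIP status line: {line!r}")
    code = int(code_text)
    return code, reason, STATUS_CLASSES.get(code // 100, "Unknown")


if __name__ == "__main__":
    for raw in ("SIP/2.0 100 Trying", "SIP/2.0 200 OK",
                "SIP/2.0 302 Moved Temporarily"):
        print(classify_status_line(raw))
```

A binary-encoded protocol would need a decoder before a support engineer could do the same inspection, which is the troubleshooting advantage the list above describes.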

Figure 4: SIP Operation in Proxy Mode

[Diagram: Endpoint1@Site1 sends INVITE Endpoint2@Site2 to the proxy at Site 2; the proxy consults the location server, which returns Client2@Site2; the proxy sends the INVITE on to Client2@Site2; 100 Trying, 200 OK, and ACK follow. The arrows are numbered to match the call setup steps.]


can run on minimalist appliances. Simple Java
applets can be developed in anywhere from a few
minutes to a few hours. Key features of Java
include: 

Network Orientation

Java applications, called applets, run on
thin-clients. Java applets are network-aware and can
open and access objects across the Internet via
URLs. The Remote Method Invocation (RMI)
feature of Java allows the building of distributed
applications. RMI-based applications can connect
to other Java applications as well as legacy
applications.

Java Naming and Directory Interface (JNDI)
provides a unified interface to multiple
heterogeneous naming and directory services
including LDAP directories. JNDI enables seamless
connectivity to these services. Developers can
build powerful and portable directory-enabled Java
applications using this industry-standard interface.

Java Database Connectivity (JDBC) is an application
programming interface (API) that provides cross-DBMS
connectivity to a wide range of SQL
databases. Using JDBC, an application can establish
connectivity with nearly any enterprise or service
provider database from a Java-enabled phone.

Java also offers specifications and supporting
products that can automate the process of
distributing new versions of applications over the
network. These include the Java Management
Extensions (JMX) specification and the Java
Dynamic Management Kit (JDMK), Sun’s product
that implements this specification.

Powerful APIs for Telephony and Speech Applications

Java has two APIs specially designed for
telephony and speech applications:

• Java Telephony API (JTAPI) defines an interface to
access the following functional areas: call
control, telephone physical device control,

ITU-T documents are written using very dense
language, which makes it virtually impossible for
the uninitiated to fathom their intent. Most ITU-T
standards tend to be very complex. For example,
the H.323 specification with its co-requisite protocols
runs some 700 pages compared to about 150
pages for SIP. The ITU-T specifications are not
freely available and have to be purchased. As of
February 2001, you could not even buy the H.323
specifications from the ITU-T bookstore because
ITU-T still had not made them available for
purchase.

In contrast, the Internet standardization process is
geared toward rapid innovation. It has an open and
democratic process which draws architects from
industry, academia, and government, as well as individuals
who are experts in specific technology areas. All
Internet specifications are available for free to
anyone and can be simply downloaded from the
Internet. Lastly, Internet standardization is
rooted in “proof of concept”, i.e., there must
exist a prototype implementation for a standard to
achieve approved status. The standard documents
often include sample code to document the
standard. Additionally, almost always, the actual
code to implement a prototype is available on the
Internet for free download and use.

Java — the Applications Engine

A key element of the proposed architecture for
the next-generation IP voice services and
applications is an intelligent phone. Java is the
ideal application engine technology for intelligent
phones. Java has already proven itself as one of
the most innovative technologies fueling
Internet innovation, and Java applications are
at the core of contemporary Web pages.

Java applications do not reside permanently on
thin-clients and thus do not consume any resources
on the phone when not needed. They are typically
designed with very small footprints so that they


processor that is running a Java runtime environment.
Consequently, a Java applet written for an IP phone
appliance can run without modification on a PC-
based softphone supporting Java.

Ease of Development 

Sun makes developing applications quick and
easy with great tools in its Java Development
Kit. In addition, Java is supported by numerous
tools, components, and applications that are
available from many vendors. In fact, many are
available for free on the Internet. These tools
include application and user interface (UI)
components, authoring and workflow tools, and
integrated development environments. A wide
variety of Java training options ranging from
classrooms to web-based are also available. Lastly,
due to Java’s tremendous popularity, Java software
engineers are readily available on permanent or
contract basis to assist in development.

Next Generation IP Voice
Services and Applications

SIP and Java also enable a whole new
generation of applications which are impossible
with other telephony architectures. These
applications can generally be divided into three
categories:

• Personal productivity applications

• Occupation specific and industry specific
applications

• Web-telephony integration (WTI) applications

Listed below are a few examples of each.


media services, and telephony administrative
services. JTAPI functions can be used with both
wired and wireless phones and its core
functions can be extended to build applications
such as call logging and tracking, auto-dialing,
screen-based telephone applications, call
routing applications, automated attendants,
Interactive Voice Response (IVR) systems, call
center management, voicemail, etc.

• Java Speech API (JSAPI) allows developers to
incorporate speech technology into the user
interfaces of their Java applets and applications.
This API specifies a cross-platform interface to
support command and control recognizers,
dictation systems and speech synthesizers.

Security

Java has a built-in security framework or
“sandbox” that can protect basic phone operation
like making and receiving calls from rogue or
misbehaving applets. Java enables the construction
of virus-free, tamper-free appliances like phones. It
also incorporates authentication techniques based
on public-key encryption. Java’s security features
also allow enterprises to control access to
resources via policy-based permissions. 

Support for a Wide Variety of Devices and User Interfaces

Java applets can run on virtually any platform
due to their platform independence. A Java applet
can be written once and run on virtually any
operating system including cell phone OS, HP-UX,
IBM AIX, Palm OS, Sun Solaris, VxWorks, Microsoft
Windows, and various other varieties of Unix and
Linux systems. To enable a Java application to
execute anywhere on the network, the Java
compiler generates an architecture-neutral object file
and the compiled code is executable on any


Automated conference calling — create conference
call appointments in Microsoft Outlook. The
application would automatically set-up the
conference call at the specified time. 

Distinctive rings — play unique rings from any
sound file based on caller ID or personal directory
information. Separate rings could be set up for a
boss, spouse, kids, or anyone else. 

Industry and Occupation-Specific
Applications

Telecommuters — get all office telephony
functionality at home — extension dialing, call
transfer, intranet intercom, call billing, etc.

Consultants — start the “clock” automatically for
time accounting or billing when picking up the
phone or dialing the number of a client using
caller ID or contact database information.

Sales reps — integrate voice and data information
collected during a call with sales force automation
applications such as ACT or Goldmine, or an ASP
like sales.com.

Public relations — click-to-dial personalized and
up-to-date press, analyst and vendor contact lists,
and track and report time on the phone by client
using a public relations ASP like mediamap.com.

Web-Telephony Integration (WTI)
Applications

Auction site for purchasing agents of electronic
components 
— create a live audio auction for
excess DRAM inventory and use the “heat” of a
real-time event to pump-up prices and the
auctioneer’s commission. Use Java applets on the


Personal Productivity Applications

Electronic business cards — send an enriched
electronic virtual business card (vCard) including
photo and audio file automatically with every call
as caller ID information (or selectively during the
middle of call). This information can be added
into any personal contact database such as
Microsoft Outlook, or a corporate CRM, or a
Supply Chain Management (SCM) database with
the push of a button.

Presence and instant messaging — use an instant
messenger service to determine when
geographically distributed colleagues are available
for a quick conference call with a customer.
Simply click or automatically “camp on” your
“buddy list” to create the conference call. 

Call filters — have every call from that very
important customer ring at every phone —
business phone, cell phone, home phone, vacation
phone, etc. The call will get completed to the first
device from where the user picks up the call.

Phone book — use multiple phone books —
corporate, personal, Internet, etc., on the phone
and simply point to an entry to make the call. The
phone books can be synchronized with the data
on a PC or any server.

Personalized music on-hold — play personalized
announcements or music from a favorite MP3
recording or Internet radio station while callers are
on hold. 

Voice tag elimination — deliver customized
messages to people trying to contact busy contacts
and eliminate phone tag. 


Summary

The Web has revolutionized the world of
business. Traditional telephony, however, cannot
fulfill the needs of the emergent e-business model.
The traditional telephony model is constrained by
an inflexible and inefficient architecture based on
centralized processing and the dumb terminal.
This environment inhibits innovation, is nearly
impossible to use, and simply perpetuates the old,
cumbersome, and limited functionality services. 

IP telephony needs to embrace the Web
architectural model in order to achieve rapid and
cost effective innovation. Old definitions of
“enhanced” services and features do not come
anywhere near even the simplest applications made
possible by technologies such as SIP and Java. 

SIP, coupled with Java, can bring the same
revolutionary innovations and mindset to the
world of IP telephony that the Web has brought to
IT and the data world. 


phone to manage the bidding process and to track
who “raised a hand” to bid first, etc. 

Virtual call center ASP — support the integrated
voice and data requirements of call center agents
working from their homes. 

Airlines reservations — use a Java applet to
visually display interactive voice response (IVR)
options rather than forcing users to wait through
very long recorded instructions and go through
multi-level menus requiring the use of a telephone
keypad. 


IVR:

Interactive Voice Response, a system used for
generating voice prompts and menus and for
accepting and processing user responses.

JTAPI:

Java Telephony API, an extension to Java that
provides telephony functions such as call control. 

JSAPI:

Java Speech API, an extension to Java that
provides functions for controlling dictation
systems and speech synthesizers.

JNDI:

Java Naming and Directory Interface, an
extension to Java that provides a unified
interface to multiple naming and directory
services.

Megaco:

Media Gateway Control, a VoIP protocol jointly
developed by ITU-T and IETF. It uses softswitches
and gatekeepers for central control of calls and
conferences.

MGCP:

Media Gateway Control Protocol, a VoIP protocol
developed by the IETF. It uses softswitches and
gatekeepers for central control of calls and
conferences.

MIME:

Multipurpose Internet Mail Extensions, an
Internet standard used for encapsulating e-mail
messages in clear text. 

PBX:

Private Branch Exchange, a customer premise
based telephone switch for intra-campus and
outside telephone calls.

PSTN:

Public Switched Telephone Network, a general
reference to telephone networks using circuit
switching and time division multiplexing.

Q.931:

An ITU-T call control protocol for ISDN, also used
in H.323. It defines procedures for setting up and
clearing calls.


API:

Application Programming Interface, a set of
programming functions and calls supported by a
language or a software product. APIs are used by
software developers to develop programs in a
specific language or to enhance or extend the
capabilities of a product. 

ASN.1:

Abstract Syntax Notation 1, an object-oriented
language used by various architectures such as
OSI, ITU-T, and SNMP to define objects including
data structures.

ASP:

Application Services Provider, a service provider
that provides applications over a network with a
usage-based fee.

CLASS:

Custom Local Area Signaling Services, services
such as caller ID and ring back provided by a
telephone company. Devices in the telephone
central office that provide such services are
called CLASS switches.

CPU:

Central Processing Unit, the arithmetic and logic
unit in a computer. Examples include the Intel
Pentium family, the AMD Athlon, and the IBM
RISC processors.

CRM:

Customer Relationship Management software,
used with applications such as ACT or Goldmine to
keep track of customer contacts and sales
information. 

H.323:

An ITU-T specification for multimedia
conferences over IP for LAN attached stations. It
is a peer-to-peer protocol as opposed to MGCP
and Megaco, which require central control.

HTTP:

Hyper Text Transfer Protocol, used for encoding
and transferring Web objects from Web servers
to Web browsers.

GLOSSARY


SIP:

Session Initiation Protocol, IETF standard for
peer-to-peer multimedia sessions and IP
telephony. An alternative to the ITU-T H.323
protocol.

VoIP: 

Voice over IP, a general reference to several
technologies and protocols that allow voice
telephony implementation over IP networks.
Examples of components and technologies that
enable VoIP include codecs, IP PBXs,
softswitches, gateways, H.323, SIP, MGCP, and
Megaco.


RAS:

Registration, Admission, and Status, a component
of H.323, defines procedures whereby users can
register themselves with a gatekeeper as a
preliminary step to setting up a call.

RMI:

Remote Method Invocation, a component part of
Java, allows building of distributed applications
that can connect to other Java applications as
well as legacy applications.

RTCP:

RTP Control Protocol, control protocol for RTP
that allows multimedia session partners to
monitor the quality of their sessions. 

RTP:

Real-time Transport Protocol, an IP standard for
encapsulating multimedia streams for
transmission over IP networks. It includes
information such as packet timestamps to help
implement quality of service for a session. 

SCCP:

Skinny Client Control Protocol, a Cisco proprietary
protocol for voice over IP that uses central
control with gatekeeper-like functions.

SCM:

Supply Chain Management, used in reference to
application programs used for managing
purchases and suppliers.

SDP:

Session Description Protocol, an IETF standard to
advertise multimedia conferences. SDP is
intended for describing multimedia sessions for
the purposes of session announcement, session
invitation, and other forms of multimedia session
initiation. 

SFA:

Sales Force Automation, used in references to
application programs used for managing sales
activities such as capturing customer contact
information, generating contracts, and generating
order forms.


cases of a multicast conference, a full-mesh
conference and a two-party “phone call”, as well
as combinations of these. Any number of calls can
be used to create a conference.

Call

A call consists of all participants in a conference
invited by a common source. A SIP call is
identified by a globally unique call-ID.

SIP Components

User Agent Clients and Servers

A user agent is a program that runs on a SIP
device (e.g., the phone). It contains a client
function and a server function. 

The user agent client (UAC) is a program that
initiates SIP requests, for example to start a call. A
UAC is also known as the calling user agent.

A user agent server (UAS) is a program that
receives SIP requests such as an incoming call and
sends back responses to those requests. A UAS is
also known as the called user agent.

Figure 7: SIP clients and servers

[Diagram: two endpoints, each containing a user agent client and a user agent server, communicate through the SIP servers: proxy, redirect, location, and registrar.]


Session Initiation Protocol
(SIP) Concepts and Operation

SIP is an Internet protocol defined under
Request for Comment 2543 (RFC 2543). SIP is not
just for voice communications — it supports data
and multimedia in its core specification. 

In TCP/IP terminology, as shown in Figure 6, SIP
is an application-level protocol that normally runs over UDP
but may also use TCP. SIP is based on existing and
well-understood Internet protocols and extends
them to support IP telephony. 

SIP Concepts 

Session

A SIP session is a multimedia session consisting
of a set of multimedia senders and receivers and
the data streams flowing from senders to receivers.
Session is the basic building block in SIP. All calls
and conferences are established by setting up
sessions among users.

Conference

A conference is a multimedia session, identified
by a common session description. A conference
can have zero or more members and includes the

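Figure 6: SIP and other Internet Protocols

[Diagram: protocol stack. The application protocols Gopher, Kerb, SMTP, Telnet, FTP, SNMP, RPC, and SIP sit above TCP and UDP, which run over IP and the LAN or WAN interface.]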

APPENDIX A


rwhois, LDAP, multicast-based protocols or
operating-system dependent mechanisms to
actively determine the end system where a user
might be reachable.

SIP Addressing

SIP uses traditional Internet names as addresses,
which consist of a user name and a domain name.
This is an important issue because it means that
the existing Internet naming, addressing, and
routing services can process SIP addresses without
modifications. Examples of SIP addresses include:

SIP:user01@bigcorp.com

SIP:user@25.16.10.8

SIP:1-212-555-1212@business.com

These addresses are similar to HTTP URL
addresses except that they start with SIP instead of
HTTP. The first example shows a user being
identified via a typical e-mail address. The second
example shows an address where the IP address
of the destination is known. The last example
shows how we could use a phone number-like
address under SIP. 
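
As a rough illustration of how little machinery this address format needs, the Python sketch below splits an address of the form SIP:user@host into its parts. It is a simplified sketch under stated assumptions: the names SipAddress and parse_sip_address are hypothetical, only the user@host[:port] form is handled (no passwords, URL parameters, or IPv6 literals), and a missing port falls back to 5060, the default SIP port.

```python
from typing import NamedTuple

DEFAULT_SIP_PORT = 5060  # default SIP port


class SipAddress(NamedTuple):
    user: str
    host: str
    port: int


def parse_sip_address(address: str) -> SipAddress:
    """Parse 'SIP:user@host' or 'SIP:user@host:port' (simplified form only)."""
    scheme, sep, rest = address.partition(":")
    if not sep or scheme.lower() != "sip":
        raise ValueError(f"not a SIP address: {address!r}")
    user, at, hostport = rest.rpartition("@")
    if not at or not user or not hostport:
        raise ValueError(f"expected user@host in {address!r}")
    host, colon, port_text = hostport.partition(":")
    port = int(port_text) if colon else DEFAULT_SIP_PORT
    return SipAddress(user, host, port)


if __name__ == "__main__":
    for example in ("SIP:user01@bigcorp.com",
                    "SIP:user@25.16.10.8",
                    "SIP:1-212-555-1212@business.com"):
        print(parse_sip_address(example))
```

The host part that comes out of the parser is an ordinary domain name or IP address, which is why existing Internet naming services can handle it unchanged.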

The major advantages of this addressing scheme are:

• It invents no new directory structure and can
be processed by existing IP servers.

• Users can use familiar e-mail or URL addresses
to make phone calls and have one less thing to
remember: the phone number.

Domain Name Services (DNS)

DNS is a standard Internet service that converts
domain names, e.g., the bigcorp.com in user01@bigcorp.com, into IP
addresses, e.g., 172.30.10.20, that can be used for
finding user locations and routing calls. Because
SIP uses standard IP naming and addressing, we
are able to use existing, standard DNS services for
SIP without any modification.


SIP Servers

Location Server

A location server is used to obtain information
about a callee’s possible location. A location is the
IP address of the domain where a user is located.
To locate a user, the name of the user is sent to
the location server and the location server returns
zero or more locations (IP addresses of
domains) where a callee may be found. If the
caller already knows the IP address of the
destination server, the caller can directly contact
the callee’s UAS. 

Proxy Servers

A proxy server is an intermediary program that
acts as both a server and a client for the purpose
of making requests on behalf of other clients.
Requests are serviced internally by a proxy server
or forwarded, possibly after translation, to other
servers. A proxy interprets and, if necessary,
rewrites a request message before forwarding it.

Redirect Server

A redirect server is a server that accepts a SIP
request, maps the address into zero or more new
addresses and returns these addresses to the client.
Unlike a proxy server, it does not initiate its own
SIP requests. Unlike a user agent server, it does
not accept calls.

Registrar

A Registrar is a server that accepts REGISTER
requests. A client uses the REGISTER request to let
a proxy or redirect server know the location
where the client can be reached. It provides a
means whereby users can register their locations
with a SIP server dynamically. As users move to
different locations, they can register their new
locations with the local location server.
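
The registration and lookup behavior described above can be pictured as a small table of bindings. The Python sketch below is an in-memory stand-in only; the class name LocationService and its methods are hypothetical, and a real registrar would also deal with authentication and expiring registrations, which are left out here.

```python
from collections import defaultdict
from typing import Dict, List


class LocationService:
    """Toy registrar plus location lookup kept in memory (illustrative only)."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[str]] = defaultdict(list)

    def register(self, user: str, contact: str) -> None:
        """Record that `user` can currently be reached at `contact`."""
        if contact not in self._bindings[user]:
            self._bindings[user].append(contact)

    def unregister(self, user: str, contact: str) -> None:
        """Forget a contact, for example when the user leaves that location."""
        if contact in self._bindings.get(user, []):
            self._bindings[user].remove(contact)

    def lookup(self, user: str) -> List[str]:
        """Return zero or more known locations for `user`."""
        return list(self._bindings.get(user, []))


if __name__ == "__main__":
    service = LocationService()
    service.register("Endpoint2@Site2", "Client2@Site2")
    service.register("Endpoint2@Site2", "Client2@Site3")
    print(service.lookup("Endpoint2@Site2"))
    print(service.lookup("unknown@Site2"))
```

Returning a list rather than a single value mirrors the text: a lookup may yield no location at all or several candidate locations.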

To supplement information obtained through
user registrations, a location server may also use
one or more TCP/IP protocols, such as finger,


When the callee sends a response to the INVITE
request agreeing to participate in the call, the
caller sends an ACK to confirm callee’s response.

Call Setup Using A Proxy Server

To initiate a SIP call, a caller first locates the
appropriate proxy server and then sends a SIP
invitation request to the proxy server. The location
of the proxy server is locally configured on the
user station. The proxy server can also be
discovered automatically by the caller using a
variety of mechanisms such as DHCP options, DNS
SRV and others. Instead of directly sending the call
to the intended callee, the proxy server may
redirect the SIP request or trigger a chain of new
SIP requests to other proxies or location servers. 

Figure 4 shows the detailed flow for SIP call setup
using a proxy server. The steps are described below:

1. Endpoint1@Site1 sends an INVITE request for
Endpoint2@Site2 to the proxy server.

2. The proxy server contacts the location service
for Endpoint2.

3. The proxy server receives a more precise
location for Endpoint2 as Client2@Site2 from
the location server.

4. The proxy server issues an INVITE request to
the address(es) returned by the location
service. The INVITE request carries a Call-ID.
(Upon receiving the INVITE request, the called
user agent alerts the user by generating a
phone ring.)

5. The called user agent returns a 100 Trying
response indicating that it is processing the
INVITE request.

6. The called user agent returns a 200 OK
response to indicate successful processing of
the INVITE request.


SIP Messages

SIP messages include SIP methods and
responses to the methods. These are listed in
tables 5 and 6.

SIP Message Encapsulation — MIME 

Multipurpose Internet Mail Extensions (MIME) is
the Internet standard for describing different types
of content on the Internet, including video and
image types. It is already used by HTTP for
composing Web pages and by e-mail systems for
encoding e-mail messages. SIP uses this well-
established standard for encoding information,
eliminating the need for inventing a new
technique for encoding voice and multimedia over
the Internet. 

SIP Call Setup

SIP is inherently capable of carrying voice,
video, and multimedia calls. In the examples
below, the setup flows remain the same
irrespective of the type of the call. In these
scenarios a call set up is illustrated where a caller
knows the name but not the IP address of a
callee, necessitating the use of a SIP server. If the
caller knew the IP address of the callee, the caller
would not need services from the SIP servers.
With a callee’s destination IP address known, the
caller’s user agent client only needs to select the
protocol (UDP by default), port (5060 by default)
and IP address of the SIP user agent server to
which the INVITE request should be sent.

A successful SIP call setup consists of two
requests, an INVITE followed by an ACK. The
INVITE request asks the callee to join a particular
conference or establish a two-party conversation.
It also includes information about the media types
and formats that are allowed for the session. If the
callee wishes to accept the call, it responds to the
invitation by returning a similar description listing
the media and format it wishes to use. 
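
The exchange of media descriptions can be pictured as a simple intersection: the caller lists what it can send and receive, and the callee answers with the subset it is willing to use. The Python sketch below illustrates that idea only. It does not use SDP syntax, the function name answer_offer is hypothetical, and the format names in the example are placeholders chosen for illustration.

```python
from typing import Dict, List, Set


def answer_offer(offered: Dict[str, List[str]],
                 supported: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per media type, the offered formats the callee also supports.

    The caller's preference order is preserved. Media types with no format
    in common are left out of the answer.
    """
    answer: Dict[str, List[str]] = {}
    for media, formats in offered.items():
        common = [fmt for fmt in formats if fmt in supported.get(media, set())]
        if common:
            answer[media] = common
    return answer


if __name__ == "__main__":
    # Placeholder format names; a real offer would carry full descriptions.
    offer = {"audio": ["PCMU", "G729", "GSM"], "video": ["H261"]}
    callee_supports = {"audio": {"GSM", "PCMU"}}
    print(answer_offer(offer, callee_supports))
```

In this example the callee has no video support, so the answer contains audio only, and the call proceeds as a voice call.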


3. The location server returns information that this
client can be found at Site3.

4. The redirect server forwards precise location
information to the calling user agent using a
302 Moved Temporarily message: Contact
Client2@Site3.

5. The calling user agent acknowledges the
information with an ACK.

6. The calling user agent sends an INVITE request
directly to the called user agent.

7. The called user agent returns a 100 Trying
response indicating that it is processing the
request.

8. The called user agent returns a 200 OK
response to indicate successful processing of
the INVITE request.

9. The calling user agent sends an ACK to
complete the handshake. The call is now in
place.
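
The redirect-mode exchange can be replayed as a short message trace. The Python sketch below is a simulation, not a SIP implementation: the function name redirect_call_setup, the "lookup", "found", and "no location" strings, and the 404 Not Found reply for an unknown callee are assumptions added for illustration. The remaining messages follow the nine redirect-mode steps.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, text)


def redirect_call_setup(caller: str, callee: str,
                        locations: Dict[str, str]) -> List[Message]:
    """Return the message trace for a call set up through a redirect server."""
    trace: List[Message] = [
        (caller, "redirect server", f"INVITE {callee}"),             # step 1
        ("redirect server", "location server", f"lookup {callee}"),  # step 2
    ]
    contact = locations.get(callee)
    if contact is None:
        # Assumed failure path; not part of the nine steps.
        trace.append(("location server", "redirect server", "no location"))
        trace.append(("redirect server", caller, "404 Not Found"))
        return trace
    trace += [
        ("location server", "redirect server", f"found {contact}"),  # step 3
        ("redirect server", caller,
         f"302 Moved Temporarily; Contact: {contact}"),              # step 4
        (caller, "redirect server", "ACK"),                          # step 5
        (caller, contact, f"INVITE {contact}"),                      # step 6
        (contact, caller, "100 Trying"),                             # step 7
        (contact, caller, "200 OK"),                                 # step 8
        (caller, contact, "ACK"),                                    # step 9
    ]
    return trace


if __name__ == "__main__":
    for message in redirect_call_setup("Endpoint1@Site1", "Endpoint2@Site2",
                                       {"Endpoint2@Site2": "Client2@Site3"}):
        print(message)
```

Note that the second INVITE is sent by the caller itself. The redirect server only answers requests and never originates one.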


7. The calling user agent sends an ACK to
complete the handshake. The call is now in
place. 
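
The seven proxy-mode steps can be replayed as a message trace in the same spirit. The Python sketch below is a simulation under stated assumptions: the function name proxy_call_setup, the "lookup", "found", and "no location" strings, and the 404 Not Found reply for an unknown callee are illustrative additions. Responses are shown end to end, and their relay back through the proxy is omitted for brevity.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, text)


def proxy_call_setup(caller: str, callee: str,
                     locations: Dict[str, List[str]]) -> List[Message]:
    """Return the message trace for a call set up through a proxy server."""
    trace: List[Message] = [
        (caller, "proxy", f"INVITE {callee}"),             # step 1
        ("proxy", "location server", f"lookup {callee}"),  # step 2
    ]
    contacts = locations.get(callee, [])
    if not contacts:
        # Assumed failure path; not part of the seven steps.
        trace.append(("location server", "proxy", "no location"))
        trace.append(("proxy", caller, "404 Not Found"))
        return trace
    target = contacts[0]  # this sketch tries only the first location
    trace += [
        ("location server", "proxy", f"found {target}"),   # step 3
        ("proxy", target, f"INVITE {target}"),             # step 4
        (target, caller, "100 Trying"),                    # step 5
        (target, caller, "200 OK"),                        # step 6
        (caller, target, "ACK"),                           # step 7
    ]
    return trace


if __name__ == "__main__":
    for message in proxy_call_setup("Endpoint1@Site1", "Endpoint2@Site2",
                                    {"Endpoint2@Site2": ["Client2@Site2"]}):
        print(message)
```

Here the proxy, not the caller, issues the second INVITE, so the caller never needs to learn the callee's precise location before the call is answered.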

Call Setup Using Redirect Server

Again we assume that the IP address of the
callee is not known to the caller’s agent, thereby
necessitating the services of the local SIP server, a
redirect server in this case. The key difference
compared to the proxy server is that the redirect
server cannot initiate an INVITE request.

The flow of requests and responses for Figure 8 is as follows:

1. Endpoint1@Site1 sends an INVITE request to the
redirect server for Endpoint2@Site2.

2. The redirect server contacts the location server
for location information about Endpoint2.

Figure 8: SIP Operation in Redirect Mode

[Diagram: Endpoint1@Site1 sends INVITE Endpoint2@Site2 to the redirect server at Site 2; the redirect server queries the location server about Endpoint 2 and learns Site 3; the redirect server returns 302 Moved Temporarily with Contact: Client2@Site3; the caller sends ACK, then INVITE Client2@Site3 directly; 100 Trying, 200 OK, and ACK follow.]



This Technology Guide is one in an ongoing series of
over 100 solutions-focused Guides. These Guides assist
IT professionals in making informed business decisions
about specific aspects of technology development and
strategic deployment.

The Technology Guide Series® offers a broad array of
titles, each presenting objective information and practical
guidance in a non-biased, “easy-to-understand” style
and tone. Our editorial writing team has many years of
experience in IT and communications technologies, and
is highly conversant in today’s emerging technologies.

The Technology Guide Series and techguide.com are
supported by a consortium of leading technology
providers. The Sponsor has lent its support to produce
and publish this Guide.

This Guide, as well as the entire Technology Guide
Series, is made available to view and print at no charge
by visiting techguide.com.

produced and published by

Over 100 Technology Guides in 
the following categories:

Network Management

Internet

Enterprise Solutions

Network Technology

Software Applications

Security

Convergence/CTI

Telecommunications