Next-Gen VoIP
Services and
Applications Using
SIP and Java
This Guide has been sponsored by
Visit techguide.com
The
Technology
Guide
Series
TECHNOLOGY GUIDE
Table of Contents
Abstract
Introduction
Architecture Models
Technology Enablers for Next Generation Voice Services and Applications
Next Generation IP Voice Services and Applications
Summary
Glossary
Appendix A: Session Initiation Protocol (SIP) Concepts and Operation
Editorial Writing Team
ATG’s Technology Guides and White Papers are produced according to a
structured methodology and proven process. Our editorial writing team
has years of experience in IT and communications technologies, and is
highly conversant in today’s emerging technologies.
The Guide format and main text of this Guide are the property of The
Applied Technologies Group, Inc. and are made available upon these terms
and conditions. The Applied Technologies Group reserves all rights herein.
Reproduction in whole or in part of the main text is only permitted with
the written consent of The Applied Technologies Group. The main text
shall be treated at all times as a proprietary document for internal use
only. The main text may not be duplicated in any way, except in the form
of brief excerpts or quotations for the purpose of review. In addition, the
information contained herein may not be duplicated in other books,
databases or any other medium. Making copies of this Guide, or any
portion for any purpose other than your own, is a violation of United
States Copyright Laws. The information contained in this Guide is
believed to be reliable but cannot be guaranteed to be complete or
correct. Any case studies or glossaries contained in this Guide or any
Guide are excluded from this copyright.
Copyright © 2001 by The Applied Technologies Group, Inc.
209 West Central Street, Suite 301, Natick, MA 01760
Tel: (508) 651-1155, Fax: (508) 651-1171
E-mail: info@techguide.com, Web site: http://www.techguide.com
techguide.com
Visit our Web site
to read, download,
and print all the
Technology Guides
in this series.
Over 100 Technology Guides in the
Following Categories:
Software Applications
Network Management
Enterprise Solutions
Network Technology
Telecommunications
Convergence/CTI
Internet
Security
caller ID, etc.), cannot provide the types of features
that are needed by a contemporary business in the
age of e-commerce. The traditional business
telephony solutions are complicated, for both the
service administrators and the users. Because of the
daunting complexity of PBX and CLASS/Centrex
user-interfaces, users typically know and use only a
fraction of the total feature set.
Now imagine telephony services in the context
of the current business need. The users would still
like to use a phone for making and receiving calls
and playing voice-mail messages. However, they
would also like to have the phone appliance
integrated with a browser-based PC for managing
phone books and seamlessly interfacing with other
applications, such as customer relationship
management (CRM), sales force automation (SFA),
supply chain management (SCM), time accounting,
etc. In other words, users want to perform the tasks most suitable
for the PC on the PC and those most suitable for
the telephone on a phone appliance, with
the two devices seamlessly integrated.
Today’s telephone just cannot deal with this new
business imperative.
In contrast, the Internet and Web-based
communications have revolutionized the business
environment and user personal life-styles by their
inexpensive, standards-based innovations. We already
have data, multimedia, video, and music applications
on the Internet. The Internet is already serving as the
underpinning of critical business and IT solutions.
In the last few years alone, the Internet and the
Web have generated more innovations than
traditional telephony has produced in its entire
history. The next frontier for the Web is to apply the
same degree of innovation to telephony.
Most market surveys have verified that IP
telephony is already supplementing traditional
telephony and it is expected that the IP telephony
architecture will ultimately replace the traditional
telephony model.
Abstract
This Technology Guide explains the unique
benefits of using the Web architectural model with
SIP and Java as the enabling technologies for next
generation IP voice services and applications.
Using the Web as a reference model for rapid
innovation, the Guide contrasts the limitations of
circuit-switched telephony and first generation
VoIP architectures with the Web model. It
summarizes limitations of centralized-processing
models such as traditional telephony, MGCP, and
Megaco as compared to peer-to-peer models such as
SIP and H.323.
This Technology Guide explains in more detail
the unique benefits of using SIP for call control
and Java for making phones intelligent. SIP is
compared with H.323 in terms of innovation,
scalability, simplicity, ease of deployment, and
standardization. The guide also includes an
explanation of SIP concepts and operation. A
description of Java features supporting new voice-
services and applications is also included.
The Guide concludes with examples of new
voice-services and applications made possible
exclusively by SIP and Java.
Introduction
Traditional telephony has hit a wall in terms of
innovation, ease of use, and cost reduction. The
core components of traditional telephony —
the terminal (telephone), PBX, the central office
switch, and the switching network — are struggling
and failing to keep up with the rate of innovations
on the Internet. The archaic telephony framework
with PBXs and Custom Local Area Signaling Services
(CLASS) switches providing Centrex and enhanced
residential services (call waiting, call forwarding,
Both models have all of their intelligence in a
centralized switch or server, which performs all of
the telephony functions such as call setup, call
forwarding, conference calling, etc. All requests,
responses, and state changes must be processed
by the central switch/server with the end-station
being a dumb terminal.
The following are the salient characteristics of
the traditional telephony environment:
• Archaic, Host-to-Dumb Terminal Architecture:
Voice service architecture has not changed for
generations. Today, PBX and Centrex services
are delivered using switches that contain all
application intelligence — just as mainframes
and minicomputers did for IBM 3270 or VT100
terminals in old computer systems.
• Dumb Terminal — The Telephone: Voice
service delivery assumes a dumb terminal in
telephony parlance — the telephone. The end-
Figure 1B: First-generation IP telephony architectures
(diagram labels: "call manager", IP Centrex, Softswitch, "gatekeeper", LAN PBX)
This Technology Guide explains the architecture
of the new IP telephony model using Session
Initiation Protocol (SIP) and Java. The Guide also
demonstrates the power of SIP and Java in terms
of scalability, ease of use, and innovative services
and applications.
Architecture Models
Circuit-Switched and First-Generation IP
Telephony Architectures
The traditional telephony architecture is based
on a centralized processing model. First-generation
IP telephony architectures use the Media Gateway
Control Protocol (MGCP), Megaco, or vendor-proprietary
protocols such as Cisco’s Skinny Client
Control Protocol (SCCP), all of which are centralized
architectures similar to traditional telephony.
Figure 1A: Traditional circuit-switched telephony architectures
(diagram labels: Centrex, CLASS 5 switch, PBX)
Web Architecture
The Web represents the most successful
application architecture in history. The Web
features many intelligent servers located
everywhere on the network and an intelligent,
browser-based client device (a PC or a low cost
Internet appliance). It is the client device, not the
server, that both initiates and controls all
communications with the server. When a user
simply clicks on an icon to access an application,
the browser pulls content in the form of HTML
and applications (Java, JavaScript, Flash, ActiveX,
etc.) from the server and runs them on the PC.
There is a complete disaggregation of services in
the Web model. Not only do the services come
from different servers, they may be provided by
different and multiple service providers. Some of
the examples (shown in figure 2) include Yahoo
for news; Amazon for shopping; MSN for instant
messaging; ASP services (such as Corio) for
customer relationship management (CRM), sales
force automation (SFA), enterprise resource
planning (ERP); and MP3.com for music. An
enterprise can outsource as few or as many
services as suits its business model.
Key characteristics of the Web architecture
include:
• Intelligent end devices (clients)
• Distributed, intelligent servers (no central
switch or server for services)
• An open architecture leading to innovation,
rapid application development, and lower costs
user interface for these services on the dumb
telephone requires non-intuitive flash
sequences and star codes. No options exist for
making telephony features easier to use and
increasing user productivity.
• Hardware Specific Software: The voice features
reside in software that is usually hardware-
specific and/or proprietary. This environment
requires highly-specialized software engineers
that are expensive and hard to find. Even
simple software modifications require the
extensive regression testing of feature
interaction.
• Limited Next-Generation Platforms: Next-
generation voice service platforms still fall short
of business needs. Most first-generation IP
telephony systems, for both service providers
and enterprises, do exploit IP for transport and
some feature a Java or XML software
environment. However, this “open”
environment is not easily made extensible by
anyone other than the vendor or possibly a
service provider; certainly not the enterprise or
an independent software vendor with a great
idea. These systems, consequently, still
perpetuate the same 1960s host-terminal
architecture with a dumb telephone as the
endpoint:
• The IP PBX is a host computer with all the
smarts driving dumb IP phones.
• VoIP gateways, softswitches, and their
feature servers are merely physically
distributed mainframes talking to dumb
terminals.
voice-world are solely defined and developed by
PBX and CLASS switch manufacturers, just as
mainframe applications were defined by the
vendors.
The PBX and CLASS switch vendors, their ideas,
their bureaucratic practices, and their business
motivations have held innovation in the voice-
world hostage. Voice features reside in software on
the switch that is hardware-specific and vendor-
specific. It is a proprietary environment that is not
openly extensible. Even modest new functions
require the onerous regression testing of feature
interaction.
The centralized, closed-software environment
offers no way for enterprises to add their own
innovations or enhancements to telephony
features, let alone individual users or software
developers with really good ideas. Some features
are impossible to implement because of the dumb-
telephone as the endpoint. Consequently,
innovation is and will remain dead, especially
when compared to the revolutions on the Web.
Web
Innovation on the Web occurs at the edges of
the network, where anyone, business or
individual, can create Web sites that are
immediately open for other users to interact with.
On the Web, in contrast to traditional telephony, a
new page or “feature” can be created in a few
minutes. More importantly, the Web page can be
conceived, created, delivered and personalized by
anyone: Yahoo, eBay, GE, a company, an
individual, their kids or their grandparents. Several
million Web sites are in existence today, up from a
few thousand in 1993. These sites satisfy
everyone’s personal and business needs for news,
buying, entertainment, chat, sports, sex, etc.
regardless of gender, race, religion, ethnic
background, industry and occupation.
Amazon.com would not have happened if the
world needed to rely on the data communications
Comparing the Architectures
The Web has revolutionized the world of
business. It has enabled a whole new business
paradigm in the form of e-business, portals,
e-tailers, and collaborative applications. The Web
has enabled businesses to reach business partners
and customers worldwide with a click of the
mouse. Telephony services must change
dramatically to become a functional member of
this business revolution. However, given their
limitations, it is virtually impossible for the current
telephony architectures to satisfy emerging
requirements.
Innovation
Traditional and First-Generation IP Telephony
The telephone was invented more than 125
years ago. Since then it has enabled people to talk
and do only a handful of other things, like use
voice mail. All of the features and services in the
Figure 2: Web application architecture
(diagram labels: Intelligent servers, Intelligent clients, CRM/SFA, MP3.com,
doubleclick.com, Virtualcart.com, MSN Instant Messenger, amazon.com,
yahoo.com, MP3, Java, Flash, Active X, HTML, Cookies)
browser’s graphical user interface means that users
do not have to memorize features as in the world
of telephony. The use of any Web site is an
intuitive discovery process, performed simply by
pointing and clicking at images and words.
Scalability and Capacity
Traditional and First Generation IP Telephony
In the telephony world, big centralized boxes
have all the smarts. Whenever the telephone, the
“terminal” in the parlance of telephone equipment
vendors, sends a flash sequence or * code, it’s the
PBX or CLASS switch that figures out what it
means. The PBX or the switch also must actively
manage each and every call. Consequently, it just
does not scale. Support for just one more user
may end up requiring a hugely expensive
replacement or addition.
Web
A Web site, however, can support millions of
users. Scalability is achieved through the
connectionless nature of IP and by adding more
and bigger servers to the Web site. Scalability is
also achieved by exploiting an intelligent endpoint,
the browser-based PC. In fact, it’s the browser
software that interprets Web objects and puts a
Web page together.
For example, in accessing a typical e-commerce
site, it’s the browser, not a server, that:
• Retrieves and displays the source HTML page
and embedded product images individually
• Retrieves and runs a Java applet, JavaScript,
Flash, ActiveX or other application
components
• Retrieves and displays a dynamic advertisement
from DoubleClick.com
• Retrieves shopping cart services from
ShoppingCart.com
vendors such as Alcatel, Cisco, Lucent, or Nortel to
invent the “service” and add the features to a
router or a switch.
Ease of Use
Traditional and First-Generation IP Telephony
For most telephone users, cryptic impossible-to-
remember flash sequences and * codes are the
interface to thousands of PBX and CLASS features.
For the fortunate few with block character
displays, even IBM 3270 and VT100 terminals
appear attractive.
Users don’t know what voice features exist and
if they do, they do not know how to use them.
While most voice service platforms such as PBX
and CLASS switches offer hundreds or thousands
of features (300-400 features in a typical PBX,
3000-4000 in a CLASS 5 switch), most users
typically don’t know any more than just a few —
transfer, hold, last number redial. In research
conducted by WorldCom, 9 out of 10 executives
could not even transfer a call without resorting to
the “help scream” — “Do I dial ‘flash’ first and
then the number, or the other way around?” Trying
to set up just a 3-party conference call over a PBX
is an even bigger nightmare. It’s no wonder that the
assisted conference calling businesses of AT&T,
Sprint and WorldCom are so big and profitable.
For many, the most difficult part of changing jobs
is learning a new phone system. “What do I dial
to get an outside line?” Consequently, for the vast
majority, ignorance is bliss, yet very expensive in
user productivity.
Web
On the Web, millions of sites with billions,
perhaps trillions, of pages can be easily navigated
by pointing and clicking at pictures or words
displayed on an intelligent, browser-based PC.
In contrast to telephony feature usage, anyone
from kids to their great grandparents can easily
discover and use any site on the Web. The
An enterprise has the option of providing PBX
services locally through a premises-based system
or outsourcing them to a network-based
service. The outsourced service not only
eliminates capital costs but may actually provide
richer services than those available from a PBX.
The figure also shows some illustrative services
such as unified messaging, presence messaging,
instant messaging, and CRM integration, all of which
can be provided by separate service providers
offering best-of-breed solutions for an enterprise’s or
even an individual user’s specific requirements.
PCs and other phones are simply resources on
the network that provide services to users. In this
model, the PC may provide services for the phone
such as integration with the desktop applications
or the phone may provide services for the PC such
as causing the phone to ring and automating
conference calls in Microsoft Outlook.
Figure 3: Web architecture for next-generation voice services and applications
(diagram labels: Intelligent servers, Intelligent clients, Audio, Auctions,
IP PBX, PSTN gateways, CRM/SFA, Presence & IM, Unified Messaging,
Phone-to-phone data & app exchange, Java, HTML, MP3, Hosted PBX service,
PC app integration)
• Stores cookies to identify users and maintain
states
• Encrypts credit card numbers
Manageability
Traditional and First-Generation IP Telephony
An expert — the equivalent of the proverbial
rocket scientist — must perform all maintenance
and management tasks for the PBX or the switch.
Tools for managing moves/adds/changes tend to
be horrendous and, consequently, administrators
learn only the basic coping skills. This makes it
extremely costly to administer the switch.
According to some estimates it can cost as much
as $300-$500 per PBX move/add/change. For a
Centrex line, it can take weeks for a change to be
implemented by the telephone company.
Web
Self-service by users is the normal operative
model here — for registration, buying things,
personalizing info, etc.
Every office device, including printers, copiers,
and now intelligent IP phones, has a built-in Web
server that enables remote configuration over the
net via a browser interface.
Every office device and home appliance is
becoming more intelligent and capable of running
automated diagnostics, reporting the findings, and
ordering replacements before service is disrupted.
Exploiting the Web Architecture for Next
Generation Voice Services and Applications
Figure 3 shows what telephony would look like
if migrated to a Web-like architecture. In this
model, services and applications are resources on
the network and are accessed and controlled by
the phone and not by a central-switch or a
gatekeeper. Nor does a central-switch or
gatekeeper control what the phone can do.
Phone Intelligence Technology
An ability to support small footprint applications
is the key for incorporating intelligence in phones.
A powerful yet easy to use programming language
used widely for Web-enabling Internet appliances
is required. In addition to rich functionality for
traditional Web applications, features developed
specifically for telephony and security are
mandatory. Lastly, the language must already be
used by hundreds of thousands of programmers
worldwide in order for innovation to happen
rapidly.
Extensible, Scalable Call Control Protocol
A call control protocol is used for call related
functions such as setting up, monitoring, and
terminating calls. However, in the new IP
telephony model, the call control protocol must
differ from traditional telephony and the first
generation IP telephony protocols. For maximum
scalability, the new call control protocol must
support peer-to-peer communications whereby
two or more phones can set up and communicate
directly without requiring anything more than
location services from a call control server. In
addition, the protocol must allow the peer-to-peer
exchange of applications and data in addition to
voice communications.
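The division of labor described above, in which the server supplies only location services and the phones do the rest, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not part of any SIP implementation: the class name LocationService, its methods, and the contact address 10.0.0.42 are hypothetical, and a real deployment would add expiry, authentication, and network transport.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the server only answers "where is this user right now?".
final class LocationService {
    // Maps a public address (e.g. sip:bob@abcorp.com) to the phone's current contact.
    private final Map<String, String> bindings = new ConcurrentHashMap<>();

    // A phone announces where it can currently be reached.
    void register(String address, String contact) {
        bindings.put(address, contact);
    }

    // A calling phone asks where the called party is; empty if not registered.
    Optional<String> lookup(String address) {
        return Optional.ofNullable(bindings.get(address));
    }

    public static void main(String[] args) {
        LocationService service = new LocationService();
        service.register("sip:bob@abcorp.com", "sip:bob@10.0.0.42");

        // The caller asks once, then sets up the call with the peer directly;
        // the server takes no further part in the call.
        String target = service.lookup("sip:bob@abcorp.com")
                .orElseThrow(() -> new IllegalStateException("user not registered"));
        System.out.println("Caller contacts peer directly at " + target);
    }
}
```

Because the server holds only a small address binding per user rather than per-call state, adding users in this sketch does not add call-processing load to the server.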
The call control protocol must support a wide
range of environments — from home-office to the
largest enterprise and from the smallest to the
largest services provider. Thus, the protocol must
be highly scalable as well as cost effective in a
diverse range of configurations. Since it is not
possible to predict all future applications of IP
telephony, the protocol must also be extensible in
order to accommodate unforeseen requirements.
Technology Enablers for
Next Generation Voice
Services and Applications
Clearly, while the model in figure 3 is quite
pedestrian in the Web world, it is quite
revolutionary in the context of traditional
telephony. The components needed to implement
this model for telephony are as follows:
Intelligent Servers
These are distributed resources that interact with
intelligent clients (PCs and phones). In terms of
hardware and software, these servers are standard
Unix, Linux, and Microsoft Windows platforms.
Compared to traditional PBXs, these servers offer
choices of multiple vendors and competitive
pricing with an open applications development
environment.
Intelligent Phones
These phones should provide much more than
incoming call ringing. In order to maintain their
independence from a central switch, they must
also provide local capabilities such as call hold,
transfer, forwarding, redial, caller ID, multi-party
conferencing, and many other traditional
telephony features.
The intelligent phones should be thin-client
computing devices that can interoperate with PCs
and servers on the network. These devices must
support dynamic loading and management of
applications such as Java applets. For ease of use,
they should incorporate functions such as
graphical and audio helpers to ease the use of
traditional and next generation applications.
H.323, the older of the protocols, was originally
designed for video conferencing over the LAN.
Since then it has been adapted and used to
support voice and video over the WAN as well.
SIP, however, was designed from the beginning for
multimedia sessions and conferences over the
WAN. Because of these differences in their design
objectives, SIP offers numerous compelling
advantages in the areas of extensibility, scalability,
and ease of deployment over H.323.
Today there are more products available
supporting H.323 than SIP. However, since its
introduction, SIP has been rapidly becoming the preferred
protocol. A January 2001 survey of Voice over IP
vendors in Network World found that while 75%
of the vendors offered products based on one of
the four H.323 versions, an approximately equal
number of them were already planning to offer
SIP-based products by June 2001. However, the
more telling statistic was that less than 25% of the
vendors were planning to upgrade their products
from H.323 Version 2 to Version 3 and even fewer
to Version 4, the latest version of H.323. According
to the same survey, most vendors expected H.323
to become a legacy protocol. In contrast, the list
of vendors supporting or planning to support SIP
is growing rapidly. Service providers embracing
SIP include WorldCom, Level 3, Net2Phone, Telia,
Webley, Ibasis, LipStream, and TalkingNets as of
March 2001 with many more anticipated.
The reasons for the rapid ascendancy of SIP
become obvious when we compare it with H.323
in the areas of innovation, scalability, ease of
deployment, manageability, and the standardization
process. Appendix A provides additional details on
SIP concepts, definitions, and operation.
SIP (Session Initiation Protocol) — The
Call Control Protocol
SIP introduces the benefits of the Web
architecture to IP telephony. It provides a
powerful, extensible, scalable, and easy-to-deploy
protocol for call control and media exchange.
Several standards are available for building IP
telephony solutions. These include the Session
Initiation Protocol (SIP) from the IETF; ITU-T
H.323, an ITU-T umbrella standard; Media
Gateway Control Protocol (MGCP) from IETF;
Media Gateway Control (Megaco), a joint protocol
by IETF and ITU-T; and proprietary protocols such
as Cisco’s Skinny Client Control Protocol (SCCP).
A high-level comparison of these protocols is
included in table 1.
Table 1: IP telephony standards

                     SIP                         H.323                       MGCP           MEGACO          PROPRIETARY
Architectural model  Peer-to-peer                Peer-to-peer                Master/slave   Master/slave    Master/slave
Media types          Voice, video, data          Voice, video, limited data  Voice          Voice, video    Voice
Network scope        Intra, Extra, and Internet  Intra, Extra, and Internet  Intranet only  Intranet only   Intranet only
Extensibility        High                        Low                         Medium         Medium          Low
Scalability          High                        Medium                      Low            Low             Low
Ease of deployment   High                        Low                         Medium         Medium          Medium
Standardization      IETF                        ITU-T                       IETF           IETF and ITU-T  None
Why SIP
Of the protocols listed in table 1, only SIP and
H.323 are peer-to-peer protocols. MGCP, Megaco
and Cisco’s proprietary SCCP represent the old
centralized model and suffer from this model’s
limitations discussed earlier. Thus, the real choice
for a protocol with Web-like benefits comes down
to one of the peer-to-peer protocols — H.323 or
SIP.
protocols within H.323. These include Registration,
Admission and Status (RAS), Q.931 for call control,
and H.245 for transmission of non-telephony
signals on the line. As shown in the tables, SIP has
a total of 6 methods (commands) and 6 classes of
response codes, while H.323 has 21 commands/messages
across the three protocols. SIP servers can be
implemented as stateless elements that do not need to
maintain any call state, which further increases the
scalability of SIP. SIP also shows a substantially higher
efficiency than H.323 during call set-up by using
approximately 50% fewer messages. Figures 4 and
5 show call set-up messages for H.323 and SIP,
respectively. While H.323 requires a total of 13
message exchanges, SIP requires only 7
exchanges.
SIP Methods and Response Codes
Table 2: SIP methods

SIP METHOD  DESCRIPTION
INVITE      User or service is being invited to participate in a session.
ACK         Client has received a final response to an INVITE request.
OPTIONS     Server being queried about capabilities.
BYE         User agent client indicates to server to release the call.
CANCEL      Cancels a pending request.
REGISTER    Client registers address with a SIP server.
Table 3: SIP response codes

CODE  DESCRIPTION
1xx   Informational: Request received, continuing to process request.
2xx   Success: Action successfully received, understood and accepted.
3xx   Redirection: Further action required to complete request.
4xx   Client Error: Request contains bad syntax or cannot be executed at server.
5xx   Server Error: Server failed to execute an apparently valid request.
6xx   Global Failure: Request cannot be executed at any server.
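The response classes in Table 3 depend only on the first digit of the status code, so a phone or server can act on the class without knowing every individual code. A minimal sketch of that idea follows; the class and method names are hypothetical and are not taken from any SIP library.

```java
// Hypothetical helper: maps a SIP status code to its response class from Table 3.
final class SipResponseClass {
    private SipResponseClass() {}

    static String classify(int statusCode) {
        if (statusCode < 100 || statusCode > 699) {
            throw new IllegalArgumentException("Not a SIP status code: " + statusCode);
        }
        switch (statusCode / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Global Failure"; // only 6xx can reach this branch
        }
    }

    // 1xx responses are provisional; 2xx through 6xx are final responses,
    // which is what the ACK method in Table 2 acknowledges for an INVITE.
    static boolean isFinal(int statusCode) {
        return statusCode >= 200 && statusCode <= 699;
    }

    public static void main(String[] args) {
        System.out.println(classify(180) + ", final=" + isFinal(180));
        System.out.println(classify(200) + ", final=" + isFinal(200));
    }
}
```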
Innovation
SIP enables new services and applications not
possible with H.323 (or other IP telephony
protocols) and easily empowers service providers,
application developers, and enterprises to create
unique, differentiated services and applications.
For example, SIP uses a simple text-based
encapsulation (based on the Internet standard
MIME) which enables it to transmit data and
application programs with the voice call, making it
easy to send business cards, photos, and/or MP3
encoded information during a call.
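To make the text-based, MIME-style encapsulation concrete, the sketch below assembles a clear-text request that carries a small business card as its body. It is an illustration under assumptions, not a complete SIP message: the class and method names are hypothetical, the user names are invented, and headers that a real request needs (such as Via, Call-ID, and CSeq) are left out for brevity. Which method would carry such a body in a given product is also outside this sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a text-encoded request with a MIME-typed body.
final class SipMimeSketch {
    private SipMimeSketch() {}

    static String buildRequest(String method, String requestUri, String from, String to,
                               String contentType, String body) {
        // Content-Length counts the bytes of the body, not its characters.
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return method + " " + requestUri + " SIP/2.0\r\n"
                + "From: <" + from + ">\r\n"
                + "To: <" + to + ">\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"               // blank line separates headers from body
                + body;
    }

    public static void main(String[] args) {
        String card = "BEGIN:VCARD\r\nFN:Alice Example\r\nEND:VCARD\r\n";
        System.out.print(buildRequest("INVITE", "sip:bob@abcorp.com",
                "sip:alice@abcorp.com", "sip:bob@abcorp.com", "text/vcard", card));
    }
}
```

Because both headers and body type are plain text, swapping the business card for another payload in this sketch only means changing the Content-Type value and the body.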
SIP also supports third-party call control through
simple applications to modify SIP messages and
enable functions such as sending office calls to a
home phone after 5:00 PM or forwarding video
calls to a PC. Lastly, SIP envisions the need to
accommodate extensions — new protocol headers,
methods, bodies and parameters, to implement
new and innovative applications. By design, not all
products are required to support these extensions,
only the endpoints (servers or phones) that want to
use them.
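The after-hours example above amounts to a small rule that picks a destination and rewrites the text of the request accordingly. A minimal sketch, assuming a hypothetical AfterHoursForwarding class and invented addresses, might look like this; it only handles the first line of a request and has no morning boundary, so it is a simplification rather than a working call control application.

```java
import java.time.LocalTime;

// Hypothetical sketch of a time-of-day forwarding rule applied to a request line.
final class AfterHoursForwarding {
    static final LocalTime CUTOFF = LocalTime.of(17, 0); // 5:00 PM

    private AfterHoursForwarding() {}

    // Calls arriving after the cutoff go to the home phone, otherwise to the office.
    static String chooseTarget(LocalTime now, String officeUri, String homeUri) {
        return now.isAfter(CUTOFF) ? homeUri : officeUri;
    }

    // Rewrites the destination in a request line such as "INVITE sip:... SIP/2.0".
    static String retarget(String requestLine, String newUri) {
        String[] parts = requestLine.split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected request line: " + requestLine);
        }
        return parts[0] + " " + newUri + " " + parts[2];
    }

    public static void main(String[] args) {
        String target = chooseTarget(LocalTime.now(),
                "sip:alice@office.abcorp.com", "sip:alice@home.example.net");
        System.out.println(retarget("INVITE sip:alice@abcorp.com SIP/2.0", target));
    }
}
```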
Scalability
Being peer-to-peer protocols, both SIP and
H.323 eliminate the need for central servers to
control everything. Peer-to-peer protocols reduce
costs of network and server infrastructure
equipment necessary to support a user population
of a given size.
Within peer-to-peer protocols, SIP is a much
more efficient and less complex protocol and is
therefore more scalable than H.323. H.323 is
actually an umbrella specification that includes
several protocols from other ITU-T standards.
Tables 4 – 6 cover three categories of such
Table 6: H.323/H.245 commands and responses

H.245 Command/Message         Function
Master-Slave Determination    Determines which terminal is the master and
                              which is the slave. Possible replies:
                              Acknowledge, Reject, Release (in case of a
                              time out).
Terminal Capability Set       Contains information about a terminal’s
                              capability to transmit and receive multimedia
                              streams. Possible replies: Acknowledge,
                              Reject, Release.
Open Logical Channel          Opens a logical channel for transport of
                              audiovisual and data information. Possible
                              replies: Acknowledge, Reject, Confirm.
Close Logical Channel         Closes a logical channel between two
                              endpoints. Possible replies: Acknowledge.
Request Mode                  Used by a receive terminal to request
                              particular modes of transmission from a
                              transmit terminal. General mode types include
                              VideoMode, AudioMode, DataMode, and
                              Encryption Mode. Possible replies:
                              Acknowledge, Reject, Release.
Send Terminal Capability Set  Commands the far-end terminal to indicate its
                              transmit and receive capabilities by sending
                              one or more Terminal Capability Sets.
End Session Command           Indicates the end of the H.245 session. After
                              transmission, the terminal will not send any
                              more H.245 messages.
Ease of Deployment
Deploying and supporting SIP is similar to deploying
and supporting HTTP. It uses standard protocols and functions,
which already exist in current IP networks and
are well understood by system administrators and
technical support personnel. SIP shares the following
characteristics with HTTP:
• Standard Internet addressing: SIP uses the
standard Internet addressing format for both names
and addresses, e.g., sip:username@abcorp.com
or sip:1.781.938.5306@abcorp.com
• Clear text protocol: SIP uses clear text for its
protocol encapsulation unlike H.323, which
uses binary encoding, making SIP easier to
diagnose and troubleshoot.
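Because SIP addresses follow the familiar user@host pattern and travel as clear text, they can be taken apart with ordinary string handling. The sketch below shows this for the two example addresses above. The SipAddress class is hypothetical and handles only the simple sip:user@host form; ports, parameters, and other variations of real SIP addresses are deliberately ignored.

```java
// Hypothetical sketch: splits a simple "sip:user@host" address into its parts.
final class SipAddress {
    final String user;
    final String host;

    private SipAddress(String user, String host) {
        this.user = user;
        this.host = host;
    }

    static SipAddress parse(String uri) {
        // The scheme comparison ignores case, so "SIP:" is accepted as well.
        if (uri == null || !uri.regionMatches(true, 0, "sip:", 0, 4)) {
            throw new IllegalArgumentException("Not a sip: address: " + uri);
        }
        String rest = uri.substring(4);
        int at = rest.indexOf('@');
        if (at <= 0 || at == rest.length() - 1) {
            throw new IllegalArgumentException("Expected user@host in: " + uri);
        }
        return new SipAddress(rest.substring(0, at), rest.substring(at + 1));
    }

    public static void main(String[] args) {
        SipAddress byName = parse("sip:username@abcorp.com");
        SipAddress byNumber = parse("sip:1.781.938.5306@abcorp.com");
        System.out.println(byName.user + " at " + byName.host);
        System.out.println(byNumber.user + " at " + byNumber.host);
    }
}
```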
H.323 Commands/Messages
Table 4: H.323 RAS commands and responses

RAS Command/Message          Function
RegistrationRequest (RRQ)    Request from a terminal or gateway to register
                             with a gatekeeper. Gatekeeper either confirms
                             or rejects (RCF or RRJ).
AdmissionRequest (ARQ)       Request for access to packet network from
                             terminal to gatekeeper. Gatekeeper either
                             confirms or rejects (ACF or ARJ).
BandwidthRequest (BRQ)       Request for changed bandwidth allocation,
                             from terminal to gatekeeper. Gatekeeper either
                             confirms or rejects (BCF or BRJ).
DisengageRequest (DRQ)       If sent from endpoint to gatekeeper, DRQ
                             informs gatekeeper that endpoint is being
                             dropped; if sent from gatekeeper to endpoint,
                             DRQ forces the call to be dropped. Gatekeeper
                             either confirms or rejects (DCF or DRJ). If DRQ
                             sent by gatekeeper, endpoint must reply with DCF.
InfoRequest (IRQ)            Request for status information from
                             gatekeeper to terminal.
InfoRequestResponse (IRR)    Response to IRQ. May be sent unsolicited by
                             terminal to gatekeeper at predetermined intervals.
RAS Timers and Request       Recommended default timeout values for
in Progress (RIP)            response to RAS messages and subsequent
                             retry counts if response is not received.
Table 5: H.323/Q.931 commands and responses
Q.931 Command/Message: Function
Alerting: Called user has been alerted (“phone is ringing”). Sent by called user.
Call Proceeding: Requested call establishment has been initiated and no more call establishment information will be accepted. Sent by called user.
Connect: Acceptance of call by called entity. Sent from called entity to calling entity.
Setup: Indicates a calling H.323 entity’s desire to set up a connection to the called entity.
Release Complete: Indicates release of call if H.225.0 (Q.931) call signaling channel is open. Afterwards, call reference value can be reused. Sent by a terminal.
Status: Responds to an unknown call signaling message or to a Status Inquiry message. Provides call state information.
Status Inquiry: Requests call status. Can be sent by endpoint or gatekeeper to another endpoint.
Standardization
The ITU-T, organized under the auspices of the
United Nations, defines traditional telephony and
H.323 standards. It is a slow-moving body with a
highly political process. Participation in ITU-T
activities is limited to paid members. Most of
Figure 5: H.323 Call set-up sequence
[Figure: message sequence between Endpoint 1, Gatekeeper, and Endpoint 2 in 13 numbered steps. RAS messages: Admission Request/Confirm and Disengage Request/Confirm. Q.931 messages: Setup, Call Proceeding, Alerting, Connect, Release Complete. H.245 messages: Terminal Capability Set, Master/Slave Determination, Open Logical Channel, Close Logical Channel, End Session Command, with their acknowledgements. Media flows over RTP.]
• Simple error messages: SIP uses familiar
HTTP-style numeric response codes, grouped into
classes such as 1xx, 2xx, etc.
• Leverages other Internet protocols: SIP uses
other familiar Internet protocols such as MIME
and Session Description Protocol (SDP), again
eliminating the need for new technical training
or expertise.
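Because SIP responses are clear text with HTTP-style status lines, they can be inspected with ordinary string handling. The following is a minimal sketch in Python, not a full SIP parser; the function and table names are hypothetical, and the class labels are the response classes SIP defines (1xx through 6xx).

```python
# Illustrative sketch: classify a SIP status line by its leading digit.
# Names here are hypothetical and not taken from any SIP library.

RESPONSE_CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
    "6": "Global Failure",
}


def parse_status_line(line):
    """Split a status line such as 'SIP/2.0 200 OK' into its three parts."""
    version, code, reason = line.strip().split(" ", 2)
    if not version.startswith("SIP/") or len(code) != 3 or not code.isdigit():
        raise ValueError(f"not a SIP status line: {line!r}")
    return version, int(code), reason


def response_class(code):
    """Return the class name for a three-digit status code."""
    return RESPONSE_CLASSES[str(code)[0]]


if __name__ == "__main__":
    for raw in ("SIP/2.0 100 Trying", "SIP/2.0 200 OK",
                "SIP/2.0 302 Moved Temporarily"):
        version, code, reason = parse_status_line(raw)
        print(code, reason, "->", response_class(code))
```

A technician reading a packet trace performs the same steps by eye, which is the practical benefit of a clear text protocol.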
Figure 4: SIP Operation in Proxy Mode
[Figure: seven numbered steps showing INVITE, 100 Trying, 200 OK, and Ack messages exchanged among Endpoint 1@Site 1, the proxy, the location server, and Client 2@Site 2.]
can run on minimalist appliances. Simple Java
applets can be developed in anywhere from a few
minutes to a few hours. Key features of Java
include:
Network Orientation
Java applications, called applets, run on thin
clients. Java applets are network-aware and can
open and access objects across the Internet via
URLs. The Remote Method Invocation (RMI)
feature of Java allows the building of distributed
applications. RMI-based applications can connect
to other Java applications as well as legacy
applications.
Java Naming and Directory Interface (JNDI)
provides a unified interface to multiple
heterogeneous naming and directory services
including LDAP directories. JNDI enables seamless
connectivity to these services. Developers can
build powerful and portable directory-enabled Java
applications using this industry-standard interface.
Java Database Connectivity (JDBC) is an application
programming interface (API) that provides cross-
DBMS connectivity to a wide range of SQL
databases. Using JDBC, an application can establish
connectivity with nearly any enterprise or service
provider database from a Java-enabled phone.
Java also features specifications and supporting
products that can automate the process of
distributing new versions of applications over the
network. These include the Java Management
Extensions (JMX) specification and the Java
Dynamic Management Kit (JDMK), Sun’s product
that implements this specification.
Powerful APIs for Telephony and Speech Applications
Java has two APIs specially designed for
telephony and speech applications:
• Java Telephony API (JTAPI) defines an interface to
access the following functional areas: call
control, telephone physical device control,
ITU-T documents are written using very dense
language, which makes it virtually impossible for
the uninitiated to fathom their intent. Most ITU-T
standards tend to be very complex. For example,
the H.323 specification with its co-requisite protocols
runs some 700 pages compared to about 150
pages for SIP. The ITU-T specifications are not
freely available and have to be purchased. As of
February 2001, you could not even buy the H.323
specifications from the ITU-T bookstore because
ITU-T still had not made them available for
purchase.
In contrast, the Internet standardization process is
geared toward rapid innovation. It has an open and
democratic process which draws architects from the
industry, academia, government, and individuals
who are experts in specific technology areas. All
Internet specifications are available for free to
anyone and can be simply downloaded from the
Internet. Lastly, Internet standardization is
rooted in proof of concept, i.e., a prototype
implementation must exist for a standard to
achieve approved status. The standards documents
often include sample code to illustrate the
standard. Additionally, the actual code for a
prototype implementation is almost always available
on the Internet for free download and use.
Java — the Applications Engine
A key element of the proposed architecture for
the next-generation IP voice services and
applications is an intelligent phone. Java is the
ideal application engine technology for intelligent
phones. Java has already proven itself as one of
the most innovative technologies fueling
Internet innovation, and Java applications are
at the core of contemporary Web pages.
Java applications do not reside permanently on
thin clients and thus do not consume any resources
on the phone when not needed. They are typically
designed with very small footprints so that they
processor that is running a Java runtime environment.
Consequently, a Java applet written for an IP phone
appliance can run without modification on a PC-
based softphone supporting Java.
Ease of Development
Sun makes developing applications quick and
easy with great tools in their Java Development
Kit. In addition, Java is supported by numerous
tools, components, and applications that are
available from many vendors. In fact, many are
available for free on the Internet. These tools
include application and user interface (UI)
components, authoring and workflow tools, and
integrated development environments. A wide
variety of Java training options ranging from
classrooms to web-based are also available. Lastly,
due to Java’s tremendous popularity, Java software
engineers are readily available on a permanent or
contract basis to assist in development.
Next Generation IP Voice
Services and Applications
SIP and Java also enable a whole new
generation of applications which are impossible
with other telephony architectures. These
applications can generally be divided into three
categories:
• Personal productivity applications
• Occupation-specific and industry-specific
applications
• Web-telephony integration (WTI) applications
Listed below are a few examples of each.
media services, and telephony administrative
services. JTAPI functions can be used with both
wired and wireless phones and its core
functions can be extended to build applications
such as call logging and tracking, auto-dialing,
screen-based telephone applications, call
routing applications, automated attendants,
Interactive Voice Response (IVR) systems, call
center management, voicemail, etc.
• Java Speech API (JSAPI) allows developers to
incorporate speech technology into the user
interfaces of their Java applets and applications.
This API specifies a cross-platform interface to
support command and control recognizers,
dictation systems and speech synthesizers.
Security
Java has a built-in security framework or
“sandbox” that can protect basic phone operation
like making and receiving calls from rogue or
misbehaving applets. Java enables the construction
of virus-free, tamper-free appliances like phones. It
also incorporates authentication techniques based
on public-key encryption. Java’s security features
also allow enterprises to control access to
resources via policy-based permissions.
Support for a Wide Variety of Devices and User
Interfaces
Java applets can run on virtually any platform
due to their platform independence. A Java applet
can be written once and run on virtually any
operating system including cell phone OS, HP UX,
IBM AIX, Palm OS, Sun Solaris, VxWorks, Microsoft
Windows, and various other varieties of Unix and
Linux systems. To enable a Java application to
execute anywhere on the network, the Java
compiler generates an architecture-neutral object file
and the compiled code is executable on any
Automated conference calling — create conference
call appointments in Microsoft Outlook. The
application would automatically set up the
conference call at the specified time.
Distinctive rings — play unique rings from any
sound file based on caller ID or personal directory
information. Separate rings could be set up for a
boss, spouse, kids, or anyone else.
Industry and Occupation-Specific
Applications
Telecommuters — get all office telephony
functionality at home — extension dialing, call
transfer, intranet intercom, call billing, etc.
Consultants — start the “clock” automatically for
time accounting or billing when picking up the
phone or dialing the number of a client using
caller ID or contact database information.
Sales reps — integrate voice and data information
collected during a call with sales force automation
applications such as ACT or Goldmine, or an ASP
like sales.com.
Public relations — click-to-dial personalized and
up-to-date press, analyst and vendor contact lists,
and track and report time on the phone by client
using a public relations ASP like mediamap.com.
Web-Telephony Integration (WTI)
Applications:
Auction site for purchasing agents of electronic
components — create a live audio auction for
excess DRAM inventory and use the “heat” of a
real-time event to pump-up prices and the
auctioneer’s commission. Use Java applets on the
Personal Productivity Applications
Electronic business cards — send an enriched
electronic virtual business card (vCard) including
photo and audio file automatically with every call
as caller ID information (or selectively in the
middle of a call). This information can be added
into any personal contact database such as
Microsoft Outlook, or a corporate CRM, or a
Supply Chain Management (SCM) database with
the push of a button.
Presence and instant messaging — use an instant
messenger service to determine when
geographically distributed colleagues are available
for a quick conference call with a customer.
Simply click or automatically “camp on” your
“buddy list” to create the conference call.
Call filters — have every call from that very
important customer ring at every phone —
business phone, cell phone, home phone, vacation
phone, etc. The call is completed on the first
device on which the user picks up.
Phone book — use multiple phone books —
corporate, personal, Internet, etc., on the phone
and simply point to an entry to make the call. The
phone books can be synchronized with the data
on a PC or any server.
Personalized music on-hold — play personalized
announcements or music from a favorite MP3
recording or Internet radio station while callers are
on hold.
Voice tag elimination — deliver customized
messages to people trying to contact busy contacts
and eliminate phone tag.
Summary
The Web has revolutionized the world of
business. Traditional telephony, however, cannot
fulfill the needs of the emergent e-business model.
The traditional telephony model is constrained by
an inflexible and inefficient architecture based on
centralized processing and the dumb terminal.
This environment inhibits innovation, is nearly
impossible to use, and simply perpetuates the old,
cumbersome, and limited functionality services.
IP telephony needs to embrace the Web
architectural model in order to achieve rapid and
cost effective innovation. Old definitions of
“enhanced” services and features do not come
anywhere near even the simplest applications made
possible by technologies such as SIP and Java.
SIP, coupled with Java, can bring the same
revolutionary innovations and mindset to the
world of IP telephony that the Web has brought to
IT and the data world.
phone to manage the bidding process and to track
who “raised a hand” to bid first, etc.
Virtual call center ASP — support the integrated
voice and data requirements of call center agents
working from their homes.
Airline reservations — use a Java applet to
visually display interactive voice response (IVR)
options rather than forcing users to wait through
very long recorded instructions and go through
multi-level menus requiring the use of a telephone
keypad.
IVR:
Interactive Voice Response, a system used for
generating voice prompts and menus and for
accepting and processing user responses.
JTAPI:
Java Telephony API, an extension to Java that
provides telephony functions such as call control.
JSAPI:
Java Speech API, an extension to Java that
provides functions for controlling dictation
systems and speech synthesizers.
JNDI:
Java Naming and Directory Interface, an
extension to Java that provides a unified
interface to multiple naming and directory
services.
Megaco:
Media Gateway Control, a VoIP protocol jointly
developed by ITU-T and IETF. It uses softswitches
and gatekeepers for central control of calls and
conferences.
MGCP:
Media Gateway Control Protocol, a VoIP protocol
developed by the IETF. It uses softswitches and
gatekeepers for central control of calls and
conferences.
MIME:
Multipurpose Internet Mail Extensions, an
Internet standard used for encapsulating e-mail
messages in clear text.
PBX:
Private Branch Exchange, a customer premises-based
telephone switch for intra-campus and
outside telephone calls.
PSTN:
Public Switched Telephone Network, a general
reference to telephone networks using circuit
switching and time division multiplexing.
Q.931:
An ITU-T Call control protocol for ISDN, also used
in H.323. It defines procedures for setting up and
clearing calls.
API:
Application Programming Interface, a set of
programming functions and calls supported by a
language or a software product. APIs are used by
software developers to develop programs in a
specific language or to enhance or extend the
capabilities of a product.
ASN.1:
Abstract Syntax Notation 1, an object-oriented
language used by various architectures such as
OSI, ITU-T, and SNMP to define objects including
data structures.
ASP:
Application Services Provider, a service provider
that provides applications over a network with a
usage-based fee.
CLASS:
Custom Local Area Signaling Services, services
such as caller ID and ring back provided by a
telephone company. Devices in the telephone
central office that provide such services are
called CLASS switches.
CPU:
Central Processing Unit, the arithmetic and logic
unit in a computer. Examples include the Intel
Pentium family, the AMD Athlon, and the IBM
RISC processors.
CRM:
Customer Relationship Management, software
such as ACT or Goldmine that is used to
keep track of customer contacts and sales
information.
H.323:
An ITU-T specification for multimedia
conferences over IP for LAN attached stations. It
is a peer-to-peer protocol as opposed to MGCP
and Megaco, which require central control.
HTTP:
Hyper Text Transfer Protocol, used for encoding
and transferring Web objects from Web servers
to Web browsers.
GLOSSARY
SIP:
Session Initiation Protocol, IETF standard for
peer-to-peer multimedia sessions and IP
telephony. An alternative to the ITU-T H.323
protocol.
VoIP:
Voice over IP, a general reference to several
technologies and protocols that allow voice
telephony implementation over IP networks.
Examples of components and technologies that
enable VoIP include codecs, IP PBXs,
softswitches, gateways, H.323, SIP, MGCP, and
Megaco.
RAS:
Registration, Admission, and Status, a component
of H.323, defines procedures whereby users can
register themselves with a gatekeeper as a
preliminary step to setting up a call.
RMI:
Remote Method Invocation, a component part of
Java, allows building of distributed applications
that can connect to other Java applications as
well as legacy applications.
RTCP:
RTP Control Protocol, control protocol for RTP
that allows multimedia session partners to
monitor the quality of their sessions.
RTP:
Real-time Transport Protocol, an IP standard for
encapsulating multimedia streams for
transmission over IP networks. It includes
information such as packet timestamps to help
implement quality of service for a session.
SCCP:
Skinny Client Control Protocol, a Cisco proprietary
protocol for voice over IP that uses central
control with gatekeeper-like functions.
SCM:
Supply Chain Management, used in reference to
application programs used for managing
purchases and suppliers.
SDP:
Session Description Protocol, an IETF standard to
advertise multimedia conferences. SDP is
intended for describing multimedia sessions for
the purposes of session announcement, session
invitation, and other forms of multimedia session
initiation.
SFA:
Sales Force Automation, used in references to
application programs used for managing sales
activities such as capturing customer contact
information, generating contracts, and generating
order forms.
cases of a multicast conference, a full-mesh
conference and a two-party “phone call”, as well
as combinations of these. Any number of calls can
be used to create a conference.
Call
A call consists of all participants in a conference
invited by a common source. A SIP call is
identified by a globally unique call-ID.
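One common way to obtain a globally unique call-ID is to combine a random token with the name of the originating host. The sketch below assumes that convention; the function name and the example host are hypothetical.

```python
# Illustrative sketch: build a Call-ID as '<random-token>@<host>'.
# The helper name and the host used below are assumptions for illustration.
import uuid


def new_call_id(host):
    """Return a Call-ID that is unique with overwhelming probability."""
    if not host:
        raise ValueError("host must be non-empty")
    # uuid4() yields 122 random bits, rendered here as 32 hex characters.
    return f"{uuid.uuid4().hex}@{host}"


if __name__ == "__main__":
    print("Call-ID:", new_call_id("site1.example.com"))
```

Every request and response belonging to the same call would carry this same value, which is how the parties match messages to a call.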
SIP Components
User Agent Clients and Servers
A user agent is a program that runs on a SIP
device (e.g., the phone). It contains a client
function and a server function.
The user agent client (UAC) is a program that
initiates SIP requests such as initiating a call. A
UAC is also known as the calling user agent.
A user agent server (UAS) is a program that
receives SIP requests such as an incoming call and
sends back responses to those requests. A UAS is
also known as the called user agent.
Figure 7: SIP clients and servers
[Figure: two user agents, each containing a User Agent Client and a User Agent Server, shown with the SIP servers: Proxy, Redirect, Location, and Registrar.]
Session Initiation Protocol
(SIP) Concepts and Operation
SIP is an Internet protocol defined in
Request for Comments 2543 (RFC 2543). SIP is not
just for voice communications — it supports data
and multimedia in its core specification.
In TCP/IP terminology, as shown in figure 6, SIP
is an application-level protocol that normally runs over UDP
but may also use TCP. SIP is based on existing and
well-understood Internet protocols and extends
them to support IP telephony.
SIP Concepts
Session
A SIP session is a multimedia session consisting
of a set of multimedia senders and receivers and
the data streams flowing from senders to receivers.
Session is the basic building block in SIP. All calls
and conferences are established by setting up
sessions among users.
Conference
A conference is a multimedia session, identified
by a common session description. A conference
can have zero or more members and includes the
Figure 6: SIP and other Internet Protocols
[Figure: protocol stack. Application protocols (Gopher, Kerb, SMTP, Telnet, FTP, SNMP, RPC, SIP) sit above TCP and UDP, which sit above IP and the LAN or WAN interface.]
APPENDIX A
rwhois, LDAP, multicast-based protocols or
operating-system dependent mechanisms to
actively determine the end system where a user
might be reachable.
SIP Addressing
SIP uses traditional Internet names as addresses,
which consist of a user name and a domain name.
This is an important issue because it means that
the existing Internet naming, addressing, and
routing services can process SIP addresses without
modifications. Examples of SIP addresses include:
SIP:user01@bigcorp.com
SIP:user@25.16.10.8
SIP:1-212-555-1212@business.com
These addresses are similar to HTTP URL
addresses except that they start with SIP instead of
HTTP. The first example shows a user being
identified via a typical e-mail address. The second
example shows an address where the IP address
of the destination is known. The last example
shows how we could use a phone number-like
address under SIP.
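The three example addresses share one shape: a scheme, a user part, and a host part. The following minimal sketch splits an address into those parts. It is not a complete SIP URL parser (ports, passwords, and parameters are ignored), and the type and function names are hypothetical.

```python
# Illustrative sketch: split a SIP address into user and host parts.
# SipAddress and parse_sip_address are hypothetical names.
from typing import NamedTuple


class SipAddress(NamedTuple):
    user: str
    host: str


def parse_sip_address(address):
    """Parse 'sip:user@host'; the scheme is matched case-insensitively."""
    scheme, colon, rest = address.partition(":")
    if not colon or scheme.lower() != "sip":
        raise ValueError(f"not a SIP address: {address!r}")
    user, at, host = rest.rpartition("@")
    if not at or not user or not host:
        raise ValueError(f"expected user@host in {address!r}")
    return SipAddress(user, host)


if __name__ == "__main__":
    for example in ("SIP:user01@bigcorp.com",
                    "SIP:user@25.16.10.8",
                    "SIP:1-212-555-1212@business.com"):
        print(example, "->", parse_sip_address(example))
```

The host part may be a domain name or an IP address, and the user part may look like an e-mail name or a phone number, matching the three cases described above.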
The major advantages of this addressing scheme
are:
• It invents no new directory structure and can
be processed by existing IP servers
• Users can use familiar e-mail or URL addresses
to make phone calls and have one less thing to
remember, the phone number.
Domain Name System (DNS)
DNS is a standard Internet service that converts
domain names, e.g., the bigcorp.com part of
user01@bigcorp.com, into IP addresses, e.g.,
172.30.10.20, that can be used for
finding user locations and routing calls. Because
SIP uses standard Internet naming and addressing, we
are able to use existing, standard DNS services for
SIP without any modification.
SIP Servers
Location Server
A location server is used to obtain information
about a callee’s possible location. A location is the
IP address or domain where a user can be reached.
To locate a user, the name of the user is sent to
the location server and the location server returns
zero or more locations (IP addresses or
domains) where a callee may be found. If the
caller already knows the IP address of the
destination server, the caller can directly contact
the callee’s UAS.
Proxy Servers
A proxy server is an intermediary program that
acts as both a server and a client for the purpose
of making requests on behalf of other clients.
Requests are serviced internally by a proxy server
or forwarded, possibly after translation, to other
servers. A proxy interprets and, if necessary,
rewrites a request message before forwarding it.
Redirect Server
A redirect server is a server that accepts a SIP
request, maps the address into zero or more new
addresses and returns these addresses to the client.
Unlike a proxy server, it does not initiate its own
SIP requests. Unlike a user agent server, it does
not accept calls.
Registrar
A Registrar is a server that accepts REGISTER
requests. A client uses the REGISTER request to let
a proxy or redirect server know the location
where the client can be reached. It provides a
means whereby users can register their locations
with a SIP server dynamically. As users move to
different locations, they can register their new
locations with the local location server.
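The relationship between registration and location lookup can be sketched as a small in-memory table. This is an illustration under simplifying assumptions (no expiry, no authentication, a single process); the class and method names are hypothetical.

```python
# Illustrative sketch: a registrar that records where users can be reached
# and a lookup that returns zero or more locations. All names are hypothetical.
from collections import defaultdict


class LocationService:
    def __init__(self):
        # Maps a user name to the list of contacts registered for it.
        self._bindings = defaultdict(list)

    def register(self, user, contact):
        """Handle a REGISTER request: remember one more contact for the user."""
        if contact not in self._bindings[user]:
            self._bindings[user].append(contact)

    def unregister(self, user, contact):
        """Forget a contact, for example when the user leaves a location."""
        if contact in self._bindings.get(user, []):
            self._bindings[user].remove(contact)

    def locate(self, user):
        """Return zero or more contacts where the user may be found."""
        return list(self._bindings.get(user, []))


if __name__ == "__main__":
    service = LocationService()
    service.register("Endpoint2@Site2", "Client2@Site2")
    service.register("Endpoint2@Site2", "Client2@Site3")
    print(service.locate("Endpoint2@Site2"))
    print(service.locate("unknown@Site2"))
```

A proxy or redirect server would call `locate` when an INVITE arrives; an empty result means the user has no known location.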
To supplement information obtained through
user registrations, a location server may also use
one or more TCP/IP protocols, such as finger,
When the callee sends a response to the INVITE
request agreeing to participate in the call, the
caller sends an ACK to confirm the callee’s response.
Call Setup Using A Proxy Server
To initiate a SIP call, a caller first locates the
appropriate proxy server and then sends a SIP
invitation request to the proxy server. The location
of the proxy server is locally configured on the
user station. The proxy server can also be
discovered automatically by the caller using a
variety of mechanisms such as DHCP options, DNS
SRV and others. Instead of directly sending the call
to the intended callee, the proxy server may
redirect the SIP request or trigger a chain of new
SIP requests to other proxies or location servers.
Figure 4 shows the detailed flows for SIP call setup
using a proxy server, which are described below:
1. Endpoint1@Site1 sends an INVITE request for
Endpoint2@Site2 to the proxy server.
2. The proxy server contacts the location service
for Endpoint2.
3. The proxy server receives a more precise
location for Endpoint2 as Client2@Site2 from
the location server.
4. The proxy server issues an INVITE request to
the address(es) returned by the location
service. The INVITE request carries a Call-ID.
(Upon receiving the INVITE request, the called
user-agent alerts the user by generating a
phone ring).
5. The called user agent returns a 100 Trying
response indicating that it is processing the
INVITE request.
6. The called user agent returns a 200 OK
response to indicate successful processing of
the INVITE request.
SIP Messages
SIP messages include SIP methods and
responses to the methods. These are listed in
tables 5 and 6.
SIP Message Encapsulation — MIME
Multipurpose Internet Mail Extensions (MIME) is
the Internet standard for describing different types
of content on the Internet, including video and
image types. It is already used by HTTP for
composing Web pages and by e-mail systems for
encoding e-mail messages. SIP uses this well-
established standard for encoding information,
eliminating the need for inventing a new
technique for encoding voice and multimedia over
the Internet.
SIP Call Setup
SIP is inherently capable of carrying voice,
video, and multimedia calls. In the examples
below, the setup flows remain the same
irrespective of the type of the call. In these
scenarios a call set up is illustrated where a caller
knows the name but not the IP address of a
callee, necessitating the use of a SIP server. If the
caller knew the IP address of the callee, the caller
would not need services from the SIP servers.
With a callee’s destination IP address known, the
caller’s user agent client only needs to select the
protocol (UDP by default), port (5060 by default)
and IP address of the SIP user agent server to
which the INVITE request should be sent.
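The defaults named above (UDP and port 5060) can be applied mechanically when the callee's address is already known. The sketch below assumes a plain `host` or `host:port` string and does not handle IPv6 literals; the function and constant names are hypothetical.

```python
# Illustrative sketch: choose transport, host, and port for an INVITE,
# falling back to the defaults named in the text. Names are hypothetical.

DEFAULT_TRANSPORT = "udp"
DEFAULT_PORT = 5060


def select_target(hostport, transport=None):
    """Return (transport, host, port) for 'host' or 'host:port'."""
    host, colon, port = hostport.partition(":")
    if not host:
        raise ValueError("host is required")
    chosen_port = int(port) if colon else DEFAULT_PORT
    chosen_transport = (transport or DEFAULT_TRANSPORT).lower()
    return chosen_transport, host, chosen_port


if __name__ == "__main__":
    print(select_target("25.16.10.8"))
    print(select_target("25.16.10.8:5070", "TCP"))
```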
A successful SIP call setup consists of two
requests, an INVITE followed by an ACK. The
INVITE request asks the callee to join a particular
conference or establish a two-party conversation.
It also includes information about the media types
and formats that are allowed for the session. If the
callee wishes to accept the call, it responds to the
invitation by returning a similar description listing
the media and format it wishes to use.
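The exchange of media descriptions can be pictured as the callee picking, for each offered media type, one format it also supports. The sketch below is deliberately simplified: it uses plain dictionaries rather than real SDP syntax, and the function name and the codec labels are illustrative assumptions.

```python
# Illustrative sketch: answer a media offer by choosing one shared format
# per media type. Plain dictionaries stand in for real session descriptions.


def answer_offer(offer, supported):
    """Return {media: chosen_format} for media the callee can accept.

    offer:     maps a media type to the caller's formats, in preference order.
    supported: maps a media type to the set of formats the callee supports.
    """
    answer = {}
    for media, formats in offer.items():
        common = [fmt for fmt in formats if fmt in supported.get(media, ())]
        if common:
            # Honor the caller's preference order among shared formats.
            answer[media] = common[0]
    return answer


if __name__ == "__main__":
    offer = {"audio": ["PCMU", "GSM"], "video": ["H261"]}
    supported = {"audio": {"GSM", "PCMU"}}
    print(answer_offer(offer, supported))
```

In this sketch a media type with no shared format is simply left out of the answer, so a callee without video support would accept an audio-only call.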
3. The location server returns information that this
client can be found at Site3.
4. The redirect server forwards precise location
information to the calling user agent using a
302 Moved Temporarily message: Contact
Client2@Site3.
5. The calling user agent acknowledges the
information with an ACK.
6. The calling user agent sends an INVITE request
directly to the called user agent.
7. The called user agent returns a 100 Trying
response indicating that it is processing the
request.
8. The called user agent returns a 200 OK
response to indicate successful processing of
the INVITE request.
9. The calling user agent sends an ACK to
complete the handshake. The call is now in
place.
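The nine redirect-mode steps can be traced as a list of (sender, receiver, message) entries. This sketch models only the order of messages, not real network traffic; the function name and the `locate` callback are hypothetical.

```python
# Illustrative sketch: trace the redirect-mode call setup as a message list.
# The redirect server only answers with new addresses; the caller sends the
# second INVITE itself. Names and the locate() callback are hypothetical.


def redirect_call_flow(caller, callee, locate):
    """Return the nine messages of a successful redirect-mode setup."""
    contacts = locate(callee)
    if not contacts:
        raise LookupError(f"no location known for {callee}")
    target = contacts[0]
    return [
        (caller, "redirect server", f"INVITE {callee}"),
        ("redirect server", "location server", f"lookup {callee}"),
        ("location server", "redirect server", f"found {target}"),
        ("redirect server", caller, f"302 Moved Temporarily; Contact: {target}"),
        (caller, "redirect server", "ACK"),
        (caller, target, f"INVITE {target}"),
        (target, caller, "100 Trying"),
        (target, caller, "200 OK"),
        (caller, target, "ACK"),
    ]


if __name__ == "__main__":
    flow = redirect_call_flow("Endpoint1@Site1", "Endpoint2@Site2",
                              lambda user: ["Client2@Site3"])
    for step, (sender, receiver, message) in enumerate(flow, start=1):
        print(f"{step}. {sender} -> {receiver}: {message}")
```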
7. The calling user agent sends an ACK to
complete the handshake. The call is now in
place.
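The seven proxy-mode steps can be traced the same way, as a list of (sender, receiver, message) entries. This sketch models only the order of messages; the function name and the `locate` callback are hypothetical, and responses are shown end to end even though a proxy in the path may relay them.

```python
# Illustrative sketch: trace the proxy-mode call setup as a message list.
# Unlike a redirect server, the proxy issues the second INVITE itself.
# Names and the locate() callback are hypothetical.


def proxy_call_flow(caller, callee, locate):
    """Return the seven messages of a successful proxy-mode setup."""
    contacts = locate(callee)
    if not contacts:
        raise LookupError(f"no location known for {callee}")
    target = contacts[0]
    return [
        (caller, "proxy", f"INVITE {callee}"),
        ("proxy", "location server", f"lookup {callee}"),
        ("location server", "proxy", f"found {target}"),
        ("proxy", target, f"INVITE {target}"),
        (target, caller, "100 Trying"),
        (target, caller, "200 OK"),
        (caller, target, "ACK"),
    ]


if __name__ == "__main__":
    flow = proxy_call_flow("Endpoint1@Site1", "Endpoint2@Site2",
                           lambda user: ["Client2@Site2"])
    for step, (sender, receiver, message) in enumerate(flow, start=1):
        print(f"{step}. {sender} -> {receiver}: {message}")
```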
Call Setup Using Redirect Server
Again we assume that the IP address of the
callee is not known to the caller’s agent, thereby
necessitating the services of the local SIP server, a
redirect server in this case. The key difference
compared to the proxy server is that the redirect
server cannot initiate an INVITE request.
The flow of requests and responses for figure 8
is as follows:
1. Endpoint1@Site1 sends an INVITE request to the
redirect server for Endpoint2@Site2.
2. The redirect server contacts the location server
for location information about Endpoint2.
Figure 8: SIP Operation in Redirect Mode
[Figure: message sequence among Endpoint 1@Site 1, the redirect server, the location server, and Client 2@Site 3: INVITE Endpoint 2@Site 2; 302 Moved Temporarily, Contact: Client 2@Site 3; Ack; INVITE Client 2@Site 3; 100 Trying; 200 OK; Ack.]
This Technology Guide is one in an ongoing series of
over 100 solutions-focused Guides. These Guides assist
IT professionals in making informed business decisions
about specific aspects of technology development and
strategic deployment.
The Technology Guide Series® offers a broad array of
titles, each presenting objective information and practical
guidance in a non-biased, “easy-to-understand” style
and tone. Our editorial writing team has many years of
experience in IT and communications technologies, and
is highly conversant in today’s emerging technologies.
The Technology Guide Series and techguide.com are
supported by a consortium of leading technology
providers. The Sponsor has lent its support to produce
and publish this Guide.
This Guide, as well as the entire Technology Guide
Series, is made available to view and print at no charge
by visiting techguide.com.
produced and published by
Over 100 Technology Guides in
the following categories:
Network Management
Internet
Enterprise Solutions
Network Technology
Software Applications
Security
Convergence/CTI
Telecommunications