
June 1999
A GUIDE TO MODULAR IP TELEPHONY SYSTEMS
BY MICHAEL BAYER
If you're a regular reader of CTI, by now you're probably convinced that IP telephony
lies somewhere in your future. Or perhaps you've already deployed voice over IP (VoIP)
technology or have begun working on an IP-based telephone system. Regardless, you have
doubtless encountered the bewildering array of terminology, acronyms, standards,
specifications, and product categories that surround IP telephony. The goal of this guide
is to provide a framework for understanding and designing IP-based telephone systems.
With the advent of computer telephony and open architectures based on off-the-shelf
computer components, customers can now build their own telephone systems using individual
products from different vendors. Interoperability among the products making up a complete
IP telephony solution should be the ultimate focus for all vendors and customers working
in this area.
TELEPHONE SYSTEMS
In order to identify the elements that make up a telephone system and the interoperability
boundaries between them, we must first be clear about what a telephone system is. A
telephone system is a collection of subsystems that provides end-to-end telephone services
and/or access to an external telephone network. Technically, a telephone system can be any
subset of the worldwide (public and private) telephone network. Pragmatically though, when
we contemplate a particular telephone system we're interested in the collection of
components that belong to a particular customer, or system owner. The size and scope of a
particular system - that is, the boundary between the functional elements inside the
telephone system and the networks external to the system - are determined by the system
owner's area of responsibility.
For example, the telecom manager for a branch office or a small business might define
their telephone system in terms of all the telephone equipment at a single location with
all services being external. On the other hand, an individual employee might view a
desktop telephone as their telephone system. And a VP might view the telephone equipment
at all corporate sites and the private network connecting them as the telephone system of
interest.
While size and scope contribute to the overall complexity of a given telephone system,
functional requirements and components determine the relevant specifications and
standards. Regardless of its complexity, a system's components fall into four functional
areas:
Switching Fabric: Responsible for moving media stream and signaling data
between the endpoints of a given telephone network. Switching is the heart of any
telephone system. Without a switching fabric there can be no telephone system.
Call Control: Responsible for managing the switching fabric and determining
how commands from various sources should be carried out. Call control is responsible for
implementing all the functions we associate with telephony: making and dropping calls,
holding, parking, forwarding, routing, and other supplementary services and features. Call
control is the "brain" of a telephone system and may be very simple or very
sophisticated.
Media Services: Responsible for terminating and processing media streams
delivered by the switching fabric. Media services include tone/pulse detection and
generation, recording and playback of media streams, manipulation of modulated and digital
data streams (such as fax and video), and functions such as text-to-speech and automatic
speech recognition. As with call control, media services encompass the resources that
perform these functions as well as software that manipulates these resources and all the
interfaces and layers in between.
Administration: Includes support for system configuration (moves/adds/changes
and customizing operation), fault monitoring, accounting and logging functions,
performance management, and security. While the simplest of telephone systems may be
"factory administered," requiring no owner customization and having no reporting
functions, the vast majority of telephone systems allow for extensive administrative
control.
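As a rough illustration of this four-way split, the sketch below tags each component of a system with the functional areas it covers and reports any area left uncovered. The component names and the missing_areas helper are hypothetical and are not drawn from any specification.

```python
from dataclasses import dataclass
from enum import Enum


class FunctionalArea(Enum):
    SWITCHING_FABRIC = "switching fabric"
    CALL_CONTROL = "call control"
    MEDIA_SERVICES = "media services"
    ADMINISTRATION = "administration"


@dataclass(frozen=True)
class Component:
    name: str          # hypothetical product name
    areas: frozenset   # functional areas this product covers


def missing_areas(components):
    """Return the functional areas not covered by any component."""
    covered = set()
    for component in components:
        covered |= component.areas
    return set(FunctionalArea) - covered


# A hypothetical three-product system with no administration component.
system = [
    Component("ip-gateway", frozenset({FunctionalArea.SWITCHING_FABRIC})),
    Component("call-server", frozenset({FunctionalArea.CALL_CONTROL})),
    Component("voice-mail", frozenset({FunctionalArea.MEDIA_SERVICES})),
]
uncovered = missing_areas(system)
```

Here uncovered contains only the administration area, which is the kind of gap a system owner would want to spot before committing to a multivendor design.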
EXPOSING BOUNDARIES
Computer telephony has made it possible for off-the-shelf computer technologies to be used
in implementing telephone system components. This ability has resulted in a shift from
monolithic telephone systems to highly modular systems in which components come from
disparate vendors and where system owners can build telephone systems that are uniquely
suited to their requirements.
Modularity is not a direct result of computer telephony technology itself. However,
computer telephony permits the definition and implementation of interoperability
specifications that determine the boundaries between modular products. IP telephony owes
its rapid evolution to this process: off-the-shelf protocol stacks, drivers, operating
systems, add-in boards, backplanes, and similar components make it possible to implement
and innovate quickly.
DECOMPOSITION
A key benefit of modular computer telephony is the way in which it allows the
decomposition of traditional telephone system architectures. System components that once
used to be available only as "black boxes" can be replaced with independent and
individually upgradable hardware and software components. This means:
- Vendors are able to specialize in the areas where they offer best-of-class
implementations and are able to bring their products to market more quickly.
- Resellers and systems integrators are able to add value by integrating multiple vendors'
products rather than simply reselling closed systems.
- System owners are able to deploy and grow telephone systems with the features and
characteristics they need when they need them.
The IP-based switching fabric also represents a key milestone in the evolution of
computer telephony solutions. Until recently much of the openness in computer telephony
was limited to the periphery of traditional telephone systems because at their core was a
proprietary switching fabric. Call control, media services, and administration products
could be integrated with a legacy telephone system such as a PBX, but the PBX itself was
not necessarily open.
IP-based switching fabrics typically have the potential to be open and thus allow the
complete decomposition of telephone systems while simultaneously merging telephony and
data networks. Call control, media services, and administrative functions are distributed
across an IP network, as are the applications that are able to access them. The IP-based
switching fabric utilizes this same network to connect the various endpoints of the
telephone system, including network access resources, media access resources, and
telephone stations (both station servers and PC-based phones).
A fully decomposed IP telephone system can only be implemented to the extent that:
- Interoperability specifications covering each and every component boundary (protocol,
API, and bus interface) are defined.
- Product vendors implement their product in such a way that the appropriate boundaries
are exposed.
Competing vendors are likely to design IP telephony products in many different ways.
Some vendors will build best-of-class products in a single category. Others will build
products that span a number of areas, closing the interfaces between these components
while exposing the appropriate external interfaces. Still others will build suites of
modular products and allow customers to mix and match however they like. However, before a
vendor can choose whether or not to support modularity over a particular boundary, the
appropriate specifications must be agreed upon by the industry.
STANDARDS AND SPECIFICATIONS
Specifications defining interoperability are typically published in one of three ways:
- An individual vendor attempts to use its market presence to establish a de facto
standard unilaterally or with cooperation from a handful of partners.
- A recognized standards body - such as the ISO (International Organization for
Standardization) or ITU (International Telecommunication Union) - publishes the
specification as a standard.
- An industry organization or forum, such as the ECTF (Enterprise Computer Telephony
Forum), publishes an interoperability agreement or other specification document that has
been approved by its membership.
IP telephony product vendors must identify and choose among applicable specifications
from all of these sources. As IP telephony involves the implementation of telephone
systems using an underlying IP-based switching fabric, a complete collection of
specifications is required to cover interoperability between components handling the
switching fabric, call control, media services, and administration.
Interconnection And Telephony Standards
The telephony industry was born in 1876 with Alexander Graham Bell's invention. While
innovation in the world of telephony has been fast and furious ever since, most of the
international standards governing telephony are fairly primitive, dealing primarily with
the switching fabric for the public portions of the telephone network. (The POTS or analog
service required for fax machines, computer modems, and analog phones still represents the
vast majority of telephone network connections, and it is fundamentally unchanged over the
last hundred years.)
Switching fabric standards were initially required to allow the interconnection of the
telephone networks from various countries and the interconnection of customer premise
equipment (CPE) and private telephone systems (such as PBXs) to the public network.
However, virtually everything beyond the level of switching fabric interconnection was
considered closed and thus not subject to standardization. The evolution of telephony
standards has thus been driven by the lowest common denominator requirements of public
network operators and not by the requirements of CPE owners which are significantly more
sophisticated.
To date, the formal standardization of IP telephony switching fabrics has
followed a similar path. The ITU has published the H.323 family of standards as the
collection of specifications applicable to IP telephony (Figure 3). However, this standard
is really just a starting point for vendors of commercial IP telephony products because it
only deals with a switching fabric for IP telephony and it only supports functionality
comparable to POTS.
Vendor Initiatives
Many vendors have initiated the development of specifications on their own to address the
gaps left by the official standards. Telcordia Technologies (formerly Bellcore) and Level
3 independently developed protocols for modularizing network interface functions and call
control implementations. Telcordia defined the Simple Gateway Control Protocol (SGCP), and
Level 3 defined the Internet Protocol Device Control (IPDC). Later these two companies
worked together to merge these into the Media Gateway Control Protocol (MGCP).
Another example comes from the division of Cisco that was formerly Selsius Systems,
where there was a need for a protocol to allow telephone stations to be managed by call
control. In response, Selsius (now part of Cisco) developed and published the Stateless
Client Messaging protocol (SCM).
While unilaterally defined specifications often reflect current market requirements and
the specific short term needs of their authors, they are often lacking in a few elements
that would make them broadly applicable to the industry. As a result, they tend to be
controversial and are rarely adopted widely.
Cooperative Standards
Organizations with membership spanning all segments of a given industry are optimal both
for responding to market requirements in a timely fashion and for developing
specifications that meet the needs of the whole industry. In the computer telephony
industry, the organization playing this role is the ECTF. Its members include the leading
computer manufacturers, operating system vendors, PBX vendors, wireless telephone vendors,
and computer telephony resource and application developers.
The ECTF's principal role has been to build a comprehensive framework for
interoperability in computer telephony systems and to incorporate and adapt existing
specifications to complete this framework rather than to generate new specifications and
interfaces from scratch (Figure 4). The ECTF's specifications represent interoperability
agreements among the industry's vendors. They determine which models, behaviors, APIs, and
protocols are to be used.
LOGICAL ABSTRACTIONS
One last challenge for both IP telephony vendors and customers is to sort out the
differences between the logical abstractions invented to analyze interoperability issues
and the tangible architectures applicable to actual products.
For example, the H.323 standard is defined using a model consisting of
"terminals," "gateways," "gatekeepers," and "multipoint
control units" or "MCUs." H.323 defines the functional roles of each of
these components and the protocols used for certain interactions between them. But these
components are abstractions and don't necessarily correspond to actual products or
modules. H.323 terminal functionality might be implemented within a telephone, an
application running on a PC, or a software driver on a media server.
Mixing references to logical components with references to tangible components can lead
to confusion among both vendors, who must name and define the capabilities of a particular
product, and customers, who must understand how the product interoperates with other
products.
SO WHAT GOES WHERE?
At this point, we can begin to analyze each of the four functional areas in a bit more
detail, and perhaps gain a better understanding of the appropriate protocols for each area
and how they work together.
Switching Fabric Specifications
Connecting To The PSTN
Protocols for connecting an IP telephone system to public common carrier telephone
networks are well defined, as these standards represent the traditional domain of the
national and international telecommunications standards bodies.
The system components used to connect an IP telephone system to external telephone
networks are known in the H.323 specification as gateways. In practice, a piece of gateway
software is installed and runs on a machine that is designated as a gateway and that has
appropriate hardware and software for interfacing with all of the networks for which it is
acting as a gateway.
With respect to gateway connections to the circuit-switched networks, the standards for
both media streams and signaling are well defined. They include T1, ISDN, and analog
trunks, as well as SS7 signaling. With respect to VoIP connectivity over external
networks, the ITU's H.323 specification is certainly adequate and is likely to be the
dominant protocol.
On the other hand, switching fabric options for use within an IP telephone system are
still under development. Vendors must choose both the protocols to be used to deliver
media streams between telephone system endpoints and the corresponding signaling
protocols.
IP Media Streams
The key to media stream protocols over IP is achieving reliable, low-latency delivery of
media stream data between endpoints while simultaneously assuring that consumption of
bandwidth on the underlying IP network does not result in denial of service to any
endpoints. H.323 specifies the Real-time Transport Protocol (RTP) and the Real-time
Transport Control Protocol (RTCP) as the transport layer for audio data compressed using
one of the G.xxx series audio codecs. It also defines the Registration, Admissions, and
Status (RAS) protocol as a mechanism for regulating network use in which logical endpoints
must receive permission from logical gatekeepers to utilize network resources for a given
connection.
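The admission idea can be sketched in a few lines. The toy gatekeeper below grants bandwidth against a fixed budget and rejects requests that do not fit. It is an illustration only: the class, its method names, and the kilobit figures are assumptions, and no RAS message encoding is attempted.

```python
class Gatekeeper:
    """Toy admission control: grants bandwidth until a fixed budget is spent."""

    def __init__(self, total_kbps):
        self.total_kbps = total_kbps
        self.granted = {}  # call id -> kbps currently granted

    def available_kbps(self):
        return self.total_kbps - sum(self.granted.values())

    def admit(self, call_id, requested_kbps):
        """Return True (confirm) if the request fits, else False (reject)."""
        if call_id in self.granted or requested_kbps <= 0:
            return False
        if requested_kbps > self.available_kbps():
            return False
        self.granted[call_id] = requested_kbps
        return True

    def disengage(self, call_id):
        """Release the bandwidth held by a finished call."""
        self.granted.pop(call_id, None)


gk = Gatekeeper(total_kbps=128)
first = gk.admit("call-1", 64)
second = gk.admit("call-2", 64)
third = gk.admit("call-3", 64)   # rejected: the budget is exhausted
gk.disengage("call-1")
retry = gk.admit("call-3", 64)   # admitted once call-1 releases its share
```

A real gatekeeper would base the budget on measured or reserved network capacity rather than a constant, but the permission-before-use pattern is the same.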
New QoS mechanisms for IP telephony continue to be developed by individual vendors and
other organizations; however, the protocols specified by H.323 are sufficient for the
first generation of IP telephony products and are likely to be widely deployed.
Developments to watch include new audio codecs and bandwidth reservation schemes to be
used as the basis for the gatekeeper's bandwidth management functions, and the
SIP/SAP/RTSP media transport session management mechanisms defined by the IETF (Internet
Engineering Task Force).
IP Endpoints
IP telephone systems may include telephone station endpoints of three basic types:
- PC phone stations with audio input and output in which the IP telephony endpoint is a
collection of PC-based software and hardware. The PC truly becomes the telephone and all
audio streams pass through the PC itself.
- IP telephone stations that connect directly to the IP switching fabric (e.g., through an
Ethernet connection) and are comparable to any legacy telephone station.
- Legacy telephone stations that are interfaced through station servers that act as
proxies, or gateways, between the analog, ISDN, or other wired or wireless circuits they
use and the IP telephony switching fabric.
Despite the many ongoing initiatives in the world of QoS, choices for media stream
implementation are much clearer than for signaling and low-level control. The endpoint
signaling defined by the released versions of H.323 supports the functionality commonly
found on basic analog telephones. Unfortunately this falls well short of the requirements
for most potential IP telephone system customers. As a result, there is no clear choice or
recommendation in this area.
Telephone stations are telephone system peripherals that must be controlled by the
telephone system based on call activity, commands issued by CTI applications, and direct
manipulation by users. A key feature is the ability for software (running on a PC adjacent
to the telephone, for example) to be able to initiate activity (such as an outbound call)
on any telephone station. This capability is sometimes called "third-party
control."
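A minimal sketch of third-party control follows, assuming a hypothetical CallControl class and made-up state names. The point it illustrates is that the request to start a call arrives from an application rather than from the station that ends up placing the call.

```python
class Station:
    """A telephone station as seen by call control."""

    def __init__(self, extension):
        self.extension = extension
        self.state = "idle"
        self.connected_to = None


class CallControl:
    def __init__(self):
        self.stations = {}  # extension -> Station

    def register(self, extension):
        self.stations[extension] = Station(extension)

    def make_call(self, originator, destination):
        """Start a call on behalf of `originator`.

        The requester may be any application, not the station itself.
        """
        caller = self.stations[originator]
        callee = self.stations[destination]
        if caller.state != "idle" or callee.state != "idle":
            return False
        caller.state, caller.connected_to = "dialing", destination
        callee.state, callee.connected_to = "ringing", originator
        return True


cc = CallControl()
cc.register("2001")
cc.register("2002")
# A desktop application asks call control to place the call for station 2001.
ok = cc.make_call("2001", "2002")
```

After the request, station 2001 is dialing and station 2002 is ringing even though nobody touched either handset. A second request for the same stations is refused because neither is idle.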
Telephone station features include displays of various sizes, fixed and programmable
function buttons, multiple hookswitch/speaker/microphone combinations, and lamps/LEDs of
various colors that can be set to various states. A functional standard signaling protocol
must support full control over an arbitrary telephone set. Otherwise, any IP telephone
system implementation must resort to proprietary mechanisms to support these fundamental
capabilities. In addition, gateways and station servers have additional attributes that
must be controllable.
In addition to proprietary protocols such as Cisco's SCM, prospects in this area
include an initiative of TIA's TR41.3 to define specifications for IP telephones, and
possible future enhancement of H.323.
Gateway Control
Another category of switching fabric specifications deals with control of gateway
facilities, including station servers, that are able to interconnect media streams on
circuit-switched and packet-based switching fabrics. As noted earlier, current versions of
the H.323 specifications are silent on this subject so individual vendors developed their
own. As a result, vendors must choose among the older IPDC and SGCP and the current
MGCP. In addition, the ITU has announced that it is working on an addition to H.323
called "H.GCP," which will provide comparable capability.
Bus Management
Does IP telephony make the traditional TDM backplane obsolete? Not at all. Modular
gateways, station servers, and media servers will typically be built around TDM
backplanes. While some vendors will build closed products in these categories, the
majority are likely to be based on the ECTF H.100 and ECTF H.110 TDM bus specifications.
Media Services Specifications
While an IP-based switching fabric is at the foundation of an IP telephone system, there
is much more to a telephone system than that. A telephone system must be capable of
generating, delivering, and terminating the media streams through the switching fabric. At
a minimum, the telephone system requires media resources to detect and generate tones. The
ability to record and play back audio data is also required to support functionality such
as auto-attendant and voice mail. By implementing open interfaces to support media
services, vendors not only streamline their own development but also allow system owners
to customize the telephone system by adding their own media resources and applications.
The ECTF S.xxx series specification defines the media services framework for computer
telephony systems. The framework abstracts implementation details of call processing
hardware, switch fabrics, and configuration topology to support location-independent media
services components. This family of ECTF specifications currently includes S.100, S.200,
S.300, and S.410.
The S.100 interface is a platform-independent application programming interface (API)
that provides application software access to computer telephony media services. S.410 is a
complementary Java language interface known as JTAPI Media. These interfaces can be used
by any software component within an IP telephone system.
For example, call control implementations use these services to detect tones indicating
commands and to generate tones to provide call progress feedback. Voice mail software uses
these services to play greetings and record messages, and auto-attendant software uses all
of these functions to interact with callers and identify a desired call destination.
System owners can mix and match applications and develop their own. The media services
architecture allows new resources to be added to the telephone system's pool of computer
telephony resources as needed.
The ECTF S.200 specification is an operating system- and transport-independent
protocol that complements S.100 and S.410. Where S.100 and S.410 define APIs for getting
access to computer telephony media services, S.200 defines the corresponding protocol for
access to these same services. S.200 allows independent development of all system
components that either provide or use media services.
The ECTF S.300 specification defines a service provider interface (SPI) that allows
individual computer telephony resources (both software-only and hardware-based) to be
added into a system as components. Systems built using S.300-based products can be
enhanced, scaled, extended, and upgraded as required.
Call Control Specifications
Given an operational switching fabric, the call control implementation used within a
particular IP-based telephone system determines the telephony functionality of the overall
system. It is the "brain" of the system and is therefore arguably the most
important. Call control manages the switching fabric and creates and tracks calls and all
call-associated information. In H.323-based systems, the logical gatekeeper component is
closely associated with the call control implementation.
In the world of traditional telephone systems there are as many call control
implementations as there are systems. However, the ECTF C.001 specification provides a
single comprehensive call model that serves as the basis for mapping legacy call control
implementations to CTI protocols and APIs, and for developing new call control
implementations. Vendors that are faced with developing call control software for IP-based
telephone systems are using C.001 as the basis for their development in order to maximize
product interoperability and shorten their time to market.
A significant aspect of every modern call control implementation is support for
applications that extend and customize the core functionality provided. Such applications
represent the most valuable and significant aspect of computer telephony in general and IP
telephony in particular. With access to a rich CTI interface, there is no limit to a
system owner's ability to customize a telephone system. Ultimately this ability to realize
the potential of a given telephone system will be the primary motivation for system owners
to invest in new telephony technology.
In practical terms, call control vendors must determine their support for call control
interfaces on two fronts: protocols and APIs. CTI protocols are used in IP telephone
systems to deliver call control messages across TCP/IP. CTI APIs are used to deliver these
messages between two software components on the same machine.
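The distinction between the two fronts can be illustrated with a small sketch in which the same request reaches call control either as a direct function call (the API path) or as encoded bytes that could be carried across TCP/IP (the protocol path). The JSON encoding and every name here are assumptions made for illustration; this is not the Versit or CSTA encoding.

```python
import json


def handle_request(request):
    """Call control entry point; `request` is a plain dict."""
    if request.get("op") == "make_call":
        return {"ok": True, "from": request["from"], "to": request["to"]}
    return {"ok": False, "error": "unsupported operation"}


# API path: caller and call control share a machine, so a function call suffices.
def api_make_call(originator, destination):
    return handle_request({"op": "make_call", "from": originator, "to": destination})


# Protocol path: the same request is encoded to bytes that could cross a network.
def encode_request(request):
    return json.dumps(request, sort_keys=True).encode("utf-8")


def serve_bytes(payload):
    """What a server might do with bytes read from a socket."""
    request = json.loads(payload.decode("utf-8"))
    return encode_request(handle_request(request))


local = api_make_call("2001", "2002")
wire = encode_request({"op": "make_call", "from": "2001", "to": "2002"})
remote = json.loads(serve_bytes(wire).decode("utf-8"))
```

Both paths yield the same result because they share one handler. A service provider or driver on a client platform plays the role of api_make_call while speaking the wire format to a remote call control server.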
Published CTI protocol specifications currently available include the three Versit
protocols and ECMA CSTA Phase III (ECMA-285). The Versit protocols are all based directly
on C.001 and are designed to work with all mainstream APIs and existing call control
implementations. All three protocols have identical functionality but vary in how they are
encoded.
Versit Protocol 1 is optimized for communication between two servers (traditionally
between a PBX and a CTI server). Versit Protocol 2 is optimized for typical client-server
applications, and Versit Protocol 3 is optimized for products such as pay phones, cell
phones, desk phones, and PDAs. The CSTA Phase III protocol is a variant of Versit Protocol
1.
There are many different CTI APIs for vendors to choose from. However, the majority are
operating system specific. The leading options are:
- TAPI, the telephony API for the Windows operating system.
- JTAPI (corresponds to ECTF C.100), the telephony API for the Java language.
- TSAPI, which is available for all popular operating systems.
Call control implementations may expose a CTI interface through any combination of
published APIs and protocols. Call control implementations offer API support through
service providers (or drivers) for each client platform that communicates with the call
control server using either a proprietary or open protocol. By implementing support for
published protocols, a vendor simplifies support for multiple client types.
Administration Specifications
Administrative specifications are arguably the weakest area of standardization in IP
telephony. However, a number of specifications do provide significant coverage.
The principal protocol used in managing resources on an IP network is the Simple
Network Management Protocol (SNMP). The ECTF M.500 specification defines a complete
management information base (MIB) to be used by computer telephony products operating on
an IP network.
The ECTF M.100 specification defines an API that complements S.100 and provides
management of configuration data, management of services, safe startup and shutdown of
computer telephony servers, access to information about service providers, and handling of
generic administration commands. Using the M.100 API (and S.200 as the corresponding
protocol) allows system owners to better centralize administration of a multivendor
solution and upgrade their administrative software independent of the rest of their
system.
There are currently no specifications specifically for standardizing configuration of
call control functions, but many vendors are opting to support administration through Web
browser interfaces. This involves using HTTP to deliver Web pages containing HTML and
possibly JavaScript or Java applets.
Developments to watch for in this area include: protocols for managing all forms of IP
telephony stations, and protocols for collection of call detail recording and other
billing and auditing information from gateways and call control implementations.
CONCLUSION
The advent of IP telephony will be revolutionary for the telephony industry. In
combination with open specifications, IP telephony allows for the decomposition of
telephone systems into modular networks of multivendor products.
IP telephony is a beachhead for this new approach to telephone system architecture.
Other switching fabric options that allow for this type of architecture are also likely to
emerge.
At this stage the potential of IP telephony has yet to be realized, and its success
depends upon the rate at which vendors embrace interoperability specifications.
Michael Bayer is author of CTI Solutions and Systems (published by McGraw-Hill) and
president of Computer Telephony Solutions, a consulting firm dedicated to helping vendors
of computer telephony products maximize the return on their R&D investments. For more
information, please visit his Web site at www.ctexpert.com.
For more information on the specifications mentioned in this article, please visit the
following Web sites: www.ectf.org, www.itu.int, www.ietf.org,
and www.ecma.ch.