In the early 1990s, voice over Internet Protocol
(VoIP) first emerged as an Internet hobbyist application for carrying voice
conversations over the Internet. The technology was crude, and the quality
of the voice was poor. Over the past decade, the technology has improved
dramatically, and standards have been created to support commercial use of
what is generally referred to as voice over packet, where the packets can
be IP packets, ATM cells, or frame relay frames. Most recently, technology
improvements and a better economy have prompted a resurgence and substantial
deployments of VoIP.
There are some very basic reasons why legacy circuit-switched networks are
being replaced by voice-enabled, packet-switched networks. As a matter of
pure momentum, data networks are expanding at a much faster rate than
traditional circuit-switched voice networks. However, voice network services
carry a much higher premium than data network services. Along with this
expanding data network capacity comes the development of very sophisticated
digital voice processing technologies for critical functions such as voice
compression and echo cancellation. With the data networking bandwidth and
voice processing technology in place, tremendous efficiencies can be gained
from converging voice and data (and video) using tightly managed networking
protocols. These efficiencies include greater bandwidth utilization,
consolidated management systems, and greater sharing of hardware resources.
However, the most compelling reason for the growth in voice over packet
networks is the ability to quickly and cheaply deploy revenue-generating
services by leveraging existing packet-switched networks and the powerful
application deployment environment. The so-called service creation layer,
made up of open, layered network protocols such as IP and ATM and voice
application protocols such as Media Gateway Control Protocol (MGCP) and
Session Initiation Protocol (SIP), affords this deployment environment.
Under the legacy public switched telephone network (PSTN), voice
communications services have been limited because of the proprietary,
connection-oriented nature of the CLASS 5 and Private Branch Exchange (PBX)
controlled networks. However, softswitched IP-based voice networks open the
doors to innovative voice applications and services. For example, today,
enterprises purchase expensive PBXs to switch calls and provide features
such as voice mail, conferencing, and auto attendant. With a softswitched IP
voice network, service providers can deliver the same services and more to
enterprise subscribers and eliminate the need for a large capital investment
in hardware, along with the associated maintenance expenses.
A critical component of the converged network is the media gateway, which
allows heterogeneous packet voice networks utilizing different protocols and
voice encoding formats to interoperate with the traditional TDM voice
network.
VOICE PROCESSING FUNCTIONS
Vocoding
Voice over packet processing encompasses several functions. Voice coding (or
vocoding) is the sophisticated process of changing the digital
representation of voice from one format to another. Nearly all vocoders
convert voice from the 64 kb/s Pulse Code Modulation (PCM) format used on
the PSTN to a lower bit-rate format. This is called "encoding," and it is
typically how digital voice is represented on a converged voice over packet
network. When digital voice must be converted back to the PCM format, the
decoder portion of the vocoder is utilized. Decoding is performed either
when the voice is "at the end of the line" and must be converted into an
audible analog signal, or as the first half of transcoding, which is the
process of converting from one compressed vocoder format to another.
The four most common vocoders used in voice over packet networks are G.711,
G.726, G.723.1 and G.729A. There are also a number of vocoders commonly used
in wireless networks, such as Enhanced Full Rate (EFR), Adaptive Multi-Rate
(AMR), and Enhanced Variable Rate CODEC (EVRC). Different media gateway
applications require different vocoders depending upon network bandwidth and
standards specifications.
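To make the encode and decode steps concrete, here is a minimal sketch of G.711 companding in Python. It assumes the mu-law variant of G.711 and 16-bit linear input samples, and it follows the widely circulated table-free formulation with a bias of 132 and a clip level of 32635. The function names are illustrative and are not taken from any product discussed here.

```python
BIAS = 0x84   # 132, added before the logarithmic segment search
CLIP = 32635  # largest magnitude that can be biased without overflow


def linear_to_ulaw(sample):
    """Compress one 16-bit linear sample into one 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # Find the segment (exponent): position of the highest set bit, 7 down to 0.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 mu-law transmits the bitwise complement of the code word.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte):
    """Expand one 8-bit mu-law byte back to a linear sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

At 8,000 samples per second, one byte per sample gives the 64 kb/s rate of G.711. The lower bit-rate vocoders listed above are far more involved and are not sketched here.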
Echo Cancellation
"Line echo" is an annoying side effect caused by impedance mismatches in the
two-to-four-wire hybrids present in CLASS 5 switch local loop interfaces. If
you've made an international phone call and heard an echo of your own voice,
you've experienced line echo. To eliminate this, echo cancellation software
buffers PCM voice data in one direction to cancel the echo coming back in
the other direction. The amount of PCM data that must be buffered is
determined by the echo's distance from the media gateway, measured in
milliseconds. This is defined as the echo delay path, which is the round-trip
time it takes for the echo to come back to the media gateway. The longer the
echo delay path that must be covered, the more memory is required and, in
some cases, the more processing cycles are required, thus lowering the
achievable port density.
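The relationship between echo delay path, buffer size, and processing load can be sketched in Python. The article does not name an algorithm; the normalized least-mean-squares (NLMS) adaptive filter used below is a common textbook approach and is an assumption here, as are the function names, the simulated hybrid, and the step size. The sketch assumes 8 kHz telephony sampling.

```python
import random


def taps_for_tail(tail_ms, sample_rate_hz=8000):
    """Far-end samples that must be buffered to cover an echo delay path."""
    return tail_ms * sample_rate_hz // 1000


def simulate_echo(far_end, delay, gain):
    """Hypothetical hybrid: the far-end signal, delayed and attenuated."""
    return [gain * far_end[n - delay] if n >= delay else 0.0
            for n in range(len(far_end))]


def nlms_cancel(far_end, near_end, taps, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo from the near-end signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # buffered far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        energy = sum(h * h for h in history) + eps
        step = mu * error / energy
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


def demo(samples=3000, taps=16, delay=5, gain=0.5, seed=1):
    """Run the canceller on white noise and report energy in the last 10%."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(samples)]
    echo = simulate_echo(far_end, delay, gain)
    residual, weights = nlms_cancel(far_end, echo, taps)
    tail = samples // 10
    echo_energy = sum(e * e for e in echo[-tail:])
    residual_energy = sum(e * e for e in residual[-tail:])
    return echo_energy, residual_energy, weights
```

Each additional millisecond of echo delay path adds eight taps at 8 kHz, and both the buffer and the per-sample multiply-accumulate work grow in proportion to the tap count. That is the memory and cycle cost described above.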
Jitter Buffer Management
This critical function compensates for "jitter," or variations in
interpacket arrival times introduced by connectionless and statistically
multiplexed packet networks that provide data communications without
guaranteed bandwidth allocation. While effective jitter buffer management
can compensate for the jitter effect of the network, it comes at a cost:
increased latency. The greater the amount of jitter, the larger the jitter
buffer must be to remove it effectively, and consequently, the more latency
is introduced. The trick to efficient and effective jitter buffer management
is to adapt the size of the jitter buffer as the jitter itself improves or
worsens.
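One way to adapt the buffer is sketched below in Python. The jitter estimate is the running interarrival jitter formula from the RTP specification (RFC 3550). Everything else is an assumption made for illustration: the function names, the multiplier that turns the estimate into a buffer depth, and the floor and ceiling values.

```python
def update_jitter(jitter_ms, prev_transit_ms, transit_ms):
    """One step of the RFC 3550 running estimate: J += (|D| - J) / 16."""
    deviation = abs(transit_ms - prev_transit_ms)
    return jitter_ms + (deviation - jitter_ms) / 16.0


def target_depth_ms(jitter_ms, multiplier=4.0, floor_ms=20.0, ceiling_ms=200.0):
    """Hypothetical policy: buffer a multiple of the jitter, within limits."""
    return max(floor_ms, min(ceiling_ms, multiplier * jitter_ms))


def track_depth(transits_ms):
    """Return the target buffer depth after each packet.

    transits_ms holds, per packet, arrival time minus sender timestamp,
    both expressed in milliseconds.
    """
    jitter = 0.0
    depths = []
    for prev, cur in zip(transits_ms, transits_ms[1:]):
        jitter = update_jitter(jitter, prev, cur)
        depths.append(target_depth_ms(jitter))
    return depths
```

With a steady transit time the depth stays at the floor, so latency is minimal. When transit times swing by 40 ms from packet to packet, the depth climbs toward 160 ms, and it falls back once the network settles.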
Tone Generation & Detection
Tones are the legacy method for phone equipment to communicate. There are
several classes of tones that must be supported for generation and/or
detection. Dual Tone Multi-Frequency (DTMF) tones were introduced in the
1960s to allow telephones to transmit phone number digits and service
requests to the CLASS 5 switch, which provides local phone service. These
tones were subsequently also used for end-user signaling to interactive
voice response (IVR) and voice mail systems. Detection and generation of
these tones are basic requirements for voice over packet processing
solutions.
Another class of tones, multi-frequency (MF) tones, was created for
signaling on trunk lines. Detection and, in some cases, generation of these
tones are required for voice over packet processing solutions. The familiar
call progress tones indicate to a caller the status of a call and include
dial tone, busy tone, call waiting tone, and the like.
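DTMF generation and detection can be sketched in a few lines of Python. The row and column frequencies are the standard DTMF pairs. The Goertzel algorithm used for detection is a common choice but is an assumption here, as are the function names, the 40 ms tone length, and the simple energy threshold. A production detector would add twist, timing, and talk-off checks.

```python
import math

ROWS = (697, 770, 852, 941)        # low-group frequencies in Hz
COLS = (1209, 1336, 1477, 1633)    # high-group frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")
KEY_FREQS = {key: (ROWS[r], COLS[c])
             for r, row in enumerate(KEYS) for c, key in enumerate(row)}


def generate_dtmf(key, duration_ms=40, rate=8000):
    """Return samples of the two-tone signal for one key."""
    low, high = KEY_FREQS[key]
    count = rate * duration_ms // 1000
    return [0.5 * math.sin(2 * math.pi * low * n / rate) +
            0.5 * math.sin(2 * math.pi * high * n / rate)
            for n in range(count)]


def goertzel_power(samples, freq, rate=8000):
    """Squared magnitude of the signal at one frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples, rate=8000, threshold=0.1):
    """Return the detected key, or None if no clear tone pair is present."""
    total = sum(x * x for x in samples)
    if total == 0.0:
        return None
    row_power = {f: goertzel_power(samples, f, rate) for f in ROWS}
    col_power = {f: goertzel_power(samples, f, rate) for f in COLS}
    row = max(row_power, key=row_power.get)
    col = max(col_power, key=col_power.get)
    # Both tones must carry a meaningful share of the total signal energy.
    floor = threshold * len(samples) * total
    if row_power[row] < floor or col_power[col] < floor:
        return None
    return KEYS[ROWS.index(row)][COLS.index(col)]
```

For example, `detect_dtmf(generate_dtmf("5"))` recovers the key from its own tone pair, while a block of silence yields `None`.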
Modem Transmissions
Besides voice and signaling tones, the other very common signal carried on
voice circuits is the modem signal. Modem signals come in three basic
flavors: dial-up data modem signals, fax modem signals, and caller
identification signals. Modem signals, particularly those of high-speed data
modems and fax modems, cannot be vocoded, especially with the lower bit-rate
parametric vocoders.
COMMON PACKET FORMATS
Real-Time Transport Protocol
The best-known packet format and transport protocol for transmitting
real-time data on IP networks is the Real-Time Transport Protocol (RTP). RTP
is used to encapsulate voice or other real-time data and provide information
about that data such as timestamp, sequencing, and payload format. RTP can be
used to encapsulate any one of a number of vocoder formats including G.711,
the native vocoder format of the PSTN.
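The timestamp, sequencing, and payload format information sits in a fixed 12-byte header at the start of every RTP packet. The Python sketch below packs and unpacks that header as laid out in RFC 3550. It is a minimal illustration: the function names are made up, and padding, header extensions, and contributing-source lists are assumed to be absent.

```python
import struct

RTP_VERSION = 2
PT_PCMU = 0  # static payload type for G.711 mu-law (RFC 3551)


def pack_rtp_header(sequence, timestamp, ssrc, payload_type=PT_PCMU, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def unpack_rtp_header(packet):
    """Parse the fixed header from the first 12 bytes of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

For a 20 ms G.711 packet, the header would be followed by 160 payload bytes. The sequence number advances by 1 per packet, and the timestamp advances by 160 because the RTP clock for G.711 runs at 8 kHz.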
RTP can also be used to encapsulate and transport DTMF, MF, and call
progress tones as well as call supervision signaling (e.g., ABCD CAS bits).
RTP tone signaling is utilized when a low bit-rate vocoder is in operation
that would otherwise distort the original audio representation of the tone
and reduce or eliminate the ability of IVR, voice mail, and auto attendant
systems to detect those tones. RTP tone signaling, also called tone relay,
is typically used with tone detection and generation capabilities.
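The article does not say which tone relay payload it has in mind. One widely deployed format is the telephone-event payload of RFC 2833 (since updated by RFC 4733), and the Python sketch below assumes that format. The function name and default volume are illustrative. The payload type used for these packets is negotiated dynamically and is not shown.

```python
import struct

# Event codes for DTMF: digits 0-9, then *, #, and A-D.
EVENT_CODES = {str(d): d for d in range(10)}
EVENT_CODES.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})


def pack_tone_event(key, duration, volume=10, end=False):
    """Build the 4-byte telephone-event payload for one DTMF key.

    duration is in RTP timestamp units (8 per millisecond at 8 kHz).
    volume is the tone level in -dBm0, from 0 to 63.
    end marks the final packet of the event.
    """
    flags = (int(end) << 7) | (volume & 0x3F)  # E bit, reserved bit 0, volume
    return struct.pack("!BBH", EVENT_CODES[key], flags, duration & 0xFFFF)
```

The sending gateway detects the tone, suppresses it from the vocoded audio, and sends events like this one. The receiving gateway regenerates the tone locally, which is why tone relay relies on both detection and generation.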
ATM Adaptation Layer 2
In addition to RTP, another very popular packet format for transmitting
real-time, compressed voice is the ATM Adaptation Layer 2 (AAL2). AAL2 packets
are multiplexed in ATM cells. AAL2 provides many of the same capabilities as
RTP and leverages the inherent quality of service (QoS) capabilities of ATM
that are particularly useful for voice over packet networking. AAL2 supports
a number of vocoder formats in different combinations called "profiles." It
also supports tone relay and fax relay and specific control protocols for
loop emulation applications.
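The benefit of multiplexing several voice channels into shared ATM cells can be estimated with a short Python calculation. The framing sizes below follow the usual description of AAL2 (a 3-byte header on each voice packet and a 1-byte start field in each 48-byte cell payload) and should be checked against ITU-T I.363.2 before being relied on. The sketch also assumes that all packets produced in one packetization interval are packed tightly and sent together. The function names are illustrative.

```python
ATM_CELL_BYTES = 53          # 5-byte cell header plus 48-byte payload
ATM_PAYLOAD_BYTES = 48
AAL2_START_FIELD_BYTES = 1   # one per cell, ahead of the multiplexed packets
CPS_HEADER_BYTES = 3         # one per AAL2 voice packet


def aal2_cells_per_interval(channels, payload_bytes):
    """Cells needed to carry one packet from each channel, packed tightly."""
    total = channels * (payload_bytes + CPS_HEADER_BYTES)
    usable = ATM_PAYLOAD_BYTES - AAL2_START_FIELD_BYTES
    return -(-total // usable)  # ceiling division


def aal2_bandwidth_bps(channels, payload_bytes, interval_ms):
    """Line rate consumed, in bits per second, for the given channel count."""
    cells = aal2_cells_per_interval(channels, payload_bytes)
    return cells * ATM_CELL_BYTES * 8 * 1000 // interval_ms
```

Under these assumptions, a single 8 kb/s channel sending 20 bytes every 20 ms occupies one mostly empty cell per interval. Four such channels share two cells, which halves the per-channel cost.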
COMMERCIAL APPLICATIONS
Toll Bypass
Shortly after VoIP went through its hobbyist phase, arbitrage specialists
realized that data networks, like the Internet, were not regulated and
tariffed the way the PSTN was. The toll bypass application
was created to take advantage of this situation. Toll bypass utilizes
private VoIP networks to bypass the PSTN, particularly into certain
countries with monopolistic telecom industries. For example, a typical toll
bypass service provider would set up media gateways in the United States
connected to media gateways in other countries by a private packet network.
Subscribers to the service would call one of the media gateways through an
IVR system. The gateway would initiate packet communication with the closest
destination gateway, which would dial locally to the destination phone
number. Because the toll bypass application was a niche market, it could
tolerate a voice over packet processing solution with low port density, high
cost, and lack of true PSTN feature support.
CLASS 4/Tandem Switch Replacement
The concept of toll bypass has since been adopted by long distance phone and
network providers under the category of the CLASS 4/tandem switch
replacement application. This application utilizes a voice over packet
network and media gateways to replace traditional inter-machine trunks (IMTs)
and the CLASS 4 and tandem switches that connect calls between local CLASS 5
switches. Because this is a more mainstream application, port density is a
much more important issue, as are voice quality and support for in-band
signaling methods such as MF and the relay of those signals through RTP. To maintain
an acceptable QoS level and to be able to adhere to service level agreements
(SLAs), these service providers utilize ATM backbone networks for their
voice over packet application. Therefore, the gateways require a voice over
packet processing solution that provides conversion between TDM and RTP/UDP/IP
running over ATM.
Loop Emulation Service
Up to this point, the voice over packet applications that have been
discussed in this article implement trunking or "backhaul" capabilities.
However, with the further deregulation of the telecom industry, the prospect
of providing local phone service became attractive to broadband service
providers. With broadband connections, service providers could layer on
voice capabilities and become competitive local exchange carriers (CLECs).
Thus, the voice over broadband gateway application came into existence. The
first generation of voice over broadband gateways is based on the digital
loop carrier (DLC) or remote terminal (RT) model and utilizes the same
interfaces to the CLASS 5 switch. These gateways support multiple integrated
access devices (IADs), which are located at the customer premises and
support the convergence of voice, data, and video on the same DSL or cable
broadband connection. For a voice over
DSL broadband gateway, the voice over packet processing solution must
support processing between an interface to the CLASS 5 switch and the ATM
network.
In a local loop emulation application, the voice over packet processing
solution is often required to support functions such as DTMF tone detection,
programmable call progress tone generation, caller ID generation, tone
relay, and call conferencing. Many of these functions are required to
support local loop features and requirements such as 911, call waiting, and
wiretapping.
Second-generation voice over broadband gateways will replace or supplement
the backend connection to a legacy CLASS 5 switch with a connection to a
softswitched IP voice network. This will allow local exchange carriers to
eliminate expensive and proprietary legacy switches and replace them with a
networked approach to providing phone services. With a softswitched
architecture providing call control functions and a basis for service
creation, carriers open the door to the deployment of innovative new
services such as PBX application services.
PORT DENSITY: CLAIMS & REALITY
Port density is a key requirement to reduce the cost of providing voice
services to the mass market. Port density based on G.711 vocoding seems to
have been adopted by the industry as the de facto metric. However, much like
other simple metrics, "your actual density may vary."
There are two issues to consider when evaluating the port density of a voice
over packet processing solution: echo delay path and multiple vocoder usage.
From the standpoint of port density and application complexity, employing
G.711 vocoding is extremely desirable. However, one of the primary goals of
voice over packet processing applications is to stuff as many voice
conversations as possible into a given bandwidth. This is achieved through
the use of low bit-rate vocoders. In a typical application, network
bandwidth utilization is monitored and voice ports are switched from G.711
to low bit-rate vocoders to make room for more calls. As bandwidth becomes
available, active ports are switched back to G.711. Thus, for most
applications, port density based just upon G.711 is not realistic.
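The bandwidth incentive behind this switching can be put into rough numbers with a short Python calculation. The bit rates are the nominal rates of each vocoder (G.726 is shown at 32 kb/s and G.723.1 at 6.3 kb/s, each one of several defined rates). The packetization intervals are typical values assumed for illustration, and the function names are made up. The figures count only RTP, UDP, and IPv4 headers, so link-layer overhead and any silence suppression are ignored.

```python
RTP_UDP_IP_OVERHEAD_BYTES = 12 + 8 + 20  # RTP + UDP + IPv4 headers

# name: (nominal bit rate in b/s, assumed packetization interval in ms)
VOCODERS = {
    "G.711": (64000, 20),
    "G.726": (32000, 20),
    "G.729A": (8000, 20),
    "G.723.1": (6300, 30),
}


def ip_bandwidth_bps(bit_rate, packet_ms):
    """Bandwidth of one voice stream at the IP layer, one direction."""
    payload_bytes = -(-bit_rate * packet_ms // 8000)  # ceiling division
    packet_bytes = payload_bytes + RTP_UDP_IP_OVERHEAD_BYTES
    return packet_bytes * 8 * 1000 / packet_ms


def calls_per_link(link_bps, vocoder):
    """Whole calls that fit in a link of the given capacity."""
    return int(link_bps // ip_bandwidth_bps(*VOCODERS[vocoder]))
```

Under these assumptions a G.711 call consumes 80 kb/s at the IP layer and a G.729A call consumes 24 kb/s. A 1.536 Mb/s link therefore carries 19 calls in the first case and 64 in the second, which is why ports are moved to low bit-rate vocoders when bandwidth runs short.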
The real question becomes how port density is affected as some of the ports
are switched to low bit-rate vocoders. The answer lies in how processing
resources are allocated for each port. In the traditional voice over packet
processing technology architecture, processing resources and memory
resources are allocated on a worst-case basis to simplify design. In the
case of processing resources allocated to vocoding, the processing resources
are typically time-sliced based on the needs of the most compute-intensive
vocoder configured for a port. This means that a port configured to support
either G.711 or G.723.1 will be allocated the processing resources required
to run G.723.1 (even if the G.711 vocoder is in operation). In addition, the
fixed time-slicing arrangement forces the largest packet length on all the
vocoders, which has the unpleasant side effect of adding either jitter or
delay.
Next-generation voice over packet processing solutions employ more
sophisticated methods for resource allocation, which overcome the
limitations of fixed resource allocation and deliver higher port densities
across ports running multiple types of vocoders without sacrificing voice
quality or network performance.
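The difference between the two allocation strategies can be illustrated with a toy Python model. The cost figures are placeholder units chosen only to show the arithmetic. They are not measurements of any processor or vocoder implementation, and the function names are hypothetical.

```python
# Placeholder processing cost per port, in arbitrary units (not measured data).
COST_UNITS = {"G.711": 1, "G.726": 4, "G.729A": 10, "G.723.1": 12}


def ports_worst_case(budget, configured):
    """Fixed allocation: every port reserves its most expensive vocoder."""
    per_port = max(COST_UNITS[name] for name in configured)
    return budget // per_port


def ports_by_active_mix(budget, mix):
    """Dynamic allocation: ports are charged for the vocoder actually running.

    mix maps each vocoder name to the fraction of ports currently using it.
    """
    average = sum(COST_UNITS[name] * share for name, share in mix.items())
    return int(budget // average)
```

With a budget of 1,200 units and ports configured for G.711 or G.723.1, the fixed scheme supports 100 ports regardless of what is running. If three quarters of the ports are actually running G.711, the dynamic scheme supports 320 in this model.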
CONCLUSION
The voice over packet processing function is a critical component of any
network convergence application. Solutions providing this function have
evolved from piecemeal algorithms and general-purpose DSPs, which sufficed
for first-generation network convergence, to highly specialized voice over
packet processor solutions that offer many key benefits for equipment
manufacturers looking to develop packet-based infrastructure and service
creation platforms.
Tim Resker is Senior Product Manager, Voice Technology Group at
Performance Technologies. Performance Technologies develops platforms,
components, and software solutions for the world's evolving communications
infrastructure. To learn more about the company, visit them online at
www.pt.com.