
Publisher's Outlook
July 2004

The Evolution Of Network Convergence Requirements For Voice Over Packet Processing


In the early 1990s, voice over Internet Protocol (VoIP) first emerged as an Internet hobbyist application for carrying voice conversations over the Internet. The technology was crude, and the quality of the voice was poor. Over the past decade, the technology has improved dramatically, and standards have been created to support commercial use of what is generally referred to as voice over packet, where the packets could be IP packets, ATM cells, or frame relay frames. Most recently, technology improvements and a better economy have prompted a resurgence and substantial deployments of VoIP.

There are some very basic reasons why legacy circuit-switched networks are being replaced by voice-enabled, packet-switched networks. As a matter of pure momentum, data networks are expanding at a much faster rate than traditional circuit-switched voice networks. However, voice network services carry a much higher premium than data network services. Along with this expanding data network capacity comes the development of very sophisticated digital voice processing technologies for critical functions such as voice compression and echo cancellation. With the data networking bandwidth and voice processing technology in place, converging voice and data (and video) over tightly managed networking protocols yields tremendous efficiencies, such as greater bandwidth utilization, consolidated management systems, and greater sharing of hardware resources.

However, the most compelling reason for the growth in voice over packet networks is the ability to quickly and cheaply deploy revenue-generating services by leveraging existing packet-switched networks and the powerful application deployment environment. The so-called service creation layer, made up of open, layered network protocols such as IP and ATM and voice application protocols such as Media Gateway Control Protocol (MGCP) and Session Initiation Protocol (SIP), affords this deployment environment. Under the legacy public-switched telephone network (PSTN), voice communications services have been limited because of the proprietary, connection-oriented nature of the CLASS 5 and Private Branch Exchange (PBX) controlled networks. However, softswitched IP-based voice networks open the doors to innovative voice applications and services. For example, today, enterprises purchase expensive PBXs to switch calls and provide features such as voice mail, conferencing, and auto attendant. With a softswitched IP voice network, service providers can deliver the same services and more to enterprise subscribers and eliminate the need for a large capital investment in hardware and maintenance expenses.

A critical component of the converged network is the media gateway, which allows heterogeneous packet voice networks utilizing different protocols and voice encoding formats to interoperate with the traditional TDM voice network.



Voice over packet processing encompasses several functions. Voice coding (or vocoding) is the sophisticated process of changing the digital representation of voice from one format to another. Nearly all vocoders modify the digital representation of voice from a linear 64kb/s Pulse Code Modulation (PCM) format to a lower bit-rate format. This is called "encoding," and it is typically how digital voice is represented on a converged voice over packet network. When digital voice must be converted back to the linear PCM format, the decoder portion of the vocoder is utilized. Decoding is performed either when the voice is "at the end of the line" and must be converted into an audible analog signal, or as the first half of transcoding, which is the process of converting from one compressed vocoder format to another.

The four most common vocoders used in voice over packet networks are G.711, G.726, G.723.1 and G.729A. There are also a number of vocoders commonly used in wireless networks, such as Enhanced Full Rate (EFR), Adaptive Multi-Rate (AMR), and Enhanced Variable Rate CODEC (EVRC). Different media gateway applications require different vocoders depending upon network bandwidth and standards specifications.
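To make encoding and decoding concrete, the sketch below implements the mu-law variant of G.711, the simplest of the vocoders listed above. It is a minimal illustration that assumes 16-bit signed linear PCM input and uses the widely published bias-and-segment method; the function names are illustrative and are not taken from any particular product.

```python
BIAS = 0x84    # offset added before the segment search
CLIP = 32635   # largest magnitude that can be encoded

def linear_to_ulaw(sample):
    """Encode one 16-bit signed linear PCM sample as an 8-bit mu-law code word."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): the position of the highest set bit.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # Code words are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

def ulaw_to_linear(code):
    """Decode one 8-bit mu-law code word back to a linear PCM sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude

if __name__ == "__main__":
    for pcm in (0, 1000, -1000, 20000):
        code = linear_to_ulaw(pcm)
        print(pcm, hex(code), ulaw_to_linear(code))
```

The round trip is lossy by design: quiet samples are reproduced almost exactly, while loud samples are quantized more coarsely. Transcoding between two compressed formats chains a decoder like this one with a different encoder.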

Echo Cancellation
"Line echo" is an annoying side effect caused by impedance mismatches in the two-to-four wire hybrids present in CLASS 5 switch local loop interfaces. If you've made an international phone call and heard an echo of your own voice, you've experienced line echo. To eliminate this, echo cancellation software buffers PCM voice data in one direction to cancel the echo coming back in the other direction. The amount of PCM data that must be buffered is determined by the echo's distance from the media gateway, measured in milliseconds. This is defined as the echo delay path, which is the roundtrip time it takes for the echo to come back to the media gateway. The longer the echo delay path that must be covered, the more memory is required and, in some cases, the more processing cycles are required, thus lowering the achievable port density.
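The link between echo delay path and memory can be estimated with simple arithmetic. The sketch below assumes 8 kHz telephony sampling and an adaptive FIR canceller that stores one reference sample and one filter coefficient per tap, each as a 16-bit value. Those design details are assumptions made for illustration and are not stated in this article.

```python
SAMPLE_RATE_HZ = 8000  # standard telephony PCM sampling rate

def echo_canceller_footprint(tail_ms, bytes_per_value=2):
    """Return (taps, bytes) needed to cover an echo delay path of tail_ms."""
    taps = SAMPLE_RATE_HZ * tail_ms // 1000
    # Assumed layout: one reference-history sample plus one coefficient per tap.
    return taps, 2 * taps * bytes_per_value

if __name__ == "__main__":
    for tail_ms in (32, 64, 128):
        taps, size = echo_canceller_footprint(tail_ms)
        print(f"{tail_ms:>3} ms tail: {taps} taps, {size} bytes per port")
```

Under these assumptions, doubling the echo delay path doubles the per-port memory, and the per-sample filter work grows with the tap count as well.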

Jitter Buffer Management
This critical function compensates for "jitter," or variations in interpacket arrival times introduced by connectionless and statistically multiplexed packet networks that provide data communications without guaranteed bandwidth allocation. While effective jitter buffer management can compensate for the jitter effect of the network, it comes at a cost: increased latency. The greater the amount of jitter, the larger the jitter buffer must be to remove it, and consequently, the more latency is introduced. The trick to efficient and effective jitter buffer management is to adapt the size of the jitter buffer as the jitter itself improves or worsens.
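One way to adapt the buffer is sketched below. It assumes the running interarrival jitter estimate defined for RTP in RFC 3550 (a 1/16 smoothing gain) and sets the playout delay to a multiple of that estimate. The class name, the safety factor of 4, and the 20 ms and 200 ms limits are hypothetical choices for illustration, not values from this article.

```python
class AdaptiveJitterBuffer:
    """Tracks interarrival jitter and derives a target playout delay."""

    def __init__(self, min_delay_ms=20.0, max_delay_ms=200.0, safety_factor=4.0):
        self.min_delay_ms = min_delay_ms
        self.max_delay_ms = max_delay_ms
        self.safety_factor = safety_factor  # assumed multiple of the jitter estimate
        self.jitter_ms = 0.0
        self._last_transit_ms = None

    def on_packet(self, send_time_ms, arrival_time_ms):
        """Update the jitter estimate for one packet; return the new target delay."""
        transit = arrival_time_ms - send_time_ms
        if self._last_transit_ms is not None:
            deviation = abs(transit - self._last_transit_ms)
            # Exponential smoothing with gain 1/16, as in the RTP jitter estimator.
            self.jitter_ms += (deviation - self.jitter_ms) / 16.0
        self._last_transit_ms = transit
        return self.target_delay_ms()

    def target_delay_ms(self):
        wanted = self.safety_factor * self.jitter_ms
        return max(self.min_delay_ms, min(self.max_delay_ms, wanted))

if __name__ == "__main__":
    buf = AdaptiveJitterBuffer()
    for i in range(100):  # packets sent every 20 ms, every other one 30 ms late
        buf.on_packet(i * 20, i * 20 + 40 + (30 if i % 2 else 0))
    print(round(buf.target_delay_ms(), 1))
```

The buffer deepens while the network is jittery and shrinks back toward its minimum once arrivals steady, which is the latency trade-off described above.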

Tone Generation & Detection
Tones are the legacy method for phone equipment to communicate. There are several classes of tones that must be supported for generation and/or detection. Dual Tone Multi-Frequency (DTMF) tones were introduced in the 1960s to allow telephones to transmit phone number digits and service requests to the CLASS 5 switch, which provides local phone service. These tones were subsequently also used for end-user signaling to interactive voice response (IVR) and voice mail systems. Detecting and generating these tones is a basic requirement for voice over packet processing solutions.
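DTMF generation and detection can be sketched in a few lines. The example below uses the standard DTMF frequency grid and the Goertzel algorithm, a common choice for measuring energy at a handful of known frequencies. The article does not say which detection method any given product uses, so the algorithm, the 40 ms frame, and the 0.1 energy-share threshold are all assumptions for illustration.

```python
import math

SAMPLE_RATE_HZ = 8000  # telephony PCM sampling rate
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

def generate_dtmf(key, duration_ms=40, amplitude=0.5):
    """Return samples of the row tone plus the column tone for one key."""
    for r, row in enumerate(KEYPAD):
        if len(key) == 1 and key in row:
            c = row.index(key)
            n = SAMPLE_RATE_HZ * duration_ms // 1000
            w_row = 2.0 * math.pi * ROW_HZ[r] / SAMPLE_RATE_HZ
            w_col = 2.0 * math.pi * COL_HZ[c] / SAMPLE_RATE_HZ
            return [amplitude * (math.sin(w_row * i) + math.sin(w_col * i))
                    for i in range(n)]
    raise ValueError(f"not a DTMF key: {key!r}")

def goertzel_power(samples, freq_hz):
    """Squared magnitude of the signal's spectrum at freq_hz."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / SAMPLE_RATE_HZ)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def detect_dtmf(samples, min_share=0.1):
    """Return the detected key, or None if no valid tone pair is present."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return None
    row_power = [goertzel_power(samples, f) for f in ROW_HZ]
    col_power = [goertzel_power(samples, f) for f in COL_HZ]
    r = row_power.index(max(row_power))
    c = col_power.index(max(col_power))
    # For a clean tone pair each tone scores roughly 0.25 on this normalized scale.
    scale = energy * len(samples)
    if row_power[r] / scale < min_share or col_power[c] / scale < min_share:
        return None
    return KEYPAD[r][c]

if __name__ == "__main__":
    print(detect_dtmf(generate_dtmf("5")))
```

A production detector would add further checks, such as limits on the level difference between the two tones and minimum tone duration, which are omitted here.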

Another class of tones, multi-frequency (MF) tones, was created for the purposes of signaling on trunk lines. Detection and in some cases generation of these tones is required for voice over packet processing solutions. A third class, the familiar call progress tones, indicates the status of a call to the caller and includes dial tone, busy tone, call waiting tone, and the like.

Modem Transmissions
Besides voice and signaling tones, the other very common signal carried on voice circuits is the modem signal. Modem signals come in three basic flavors: dial-up data modem signals, fax modem signals, and caller identification signals. Modem signals, particularly those of high-speed data modems and fax modems, cannot be reliably vocoded, especially with the lower bit-rate parametric vocoders.

Real-Time Transport Protocol
The best-known packet format and transport protocol for transmitting real-time data on IP networks is the Real-Time Transport Protocol (RTP). RTP is used to encapsulate voice or other real-time data and provide information about that data such as timestamp, sequencing, and payload format. RTP can be used to encapsulate any one of a number of vocoder formats including G.711, the native vocoder format of the PSTN.
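The timestamp, sequencing, and payload format information mentioned above lives in a fixed 12-byte RTP header. The sketch below builds and parses that header for a G.711 mu-law payload (static payload type 0). The function names are illustrative, and the sketch assumes no padding, header extension, or contributing-source list.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE_PCMU = 0  # static RTP payload type for G.711 mu-law

def build_rtp_packet(payload, sequence, timestamp, ssrc,
                     payload_type=PAYLOAD_TYPE_PCMU, marker=False):
    """Prefix payload bytes with a minimal 12-byte RTP header."""
    byte0 = RTP_VERSION << 6  # padding, extension, and CSRC count all zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, sequence & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

def parse_rtp_header(packet):
    """Return the fixed header fields of an RTP packet as a dict."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

if __name__ == "__main__":
    voice = b"\xff" * 160  # 20 ms of mu-law silence at 8 kHz
    packet = build_rtp_packet(voice, sequence=1, timestamp=160, ssrc=0x1234)
    print(len(packet), parse_rtp_header(packet))
```

For 20 ms G.711 packets, the sequence number advances by 1 and the timestamp by 160 samples with each packet, which lets the receiver detect loss and reorder packets before playout.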

RTP can also be used to encapsulate and transport DTMF, MF, and call progress tones as well as call supervision signaling (e.g., ABCD CAS bits). RTP tone signaling is utilized when a low bit-rate vocoder is in operation that would otherwise distort the original audio representation of the tone and reduce or eliminate the ability of IVR, voice mail, and auto attendant systems to detect those tones. RTP tone signaling, also called tone relay, is typically used with tone detection and generation capabilities.
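This article does not name a specific tone relay payload. One widely used format is the "telephone-event" payload defined in RFC 2833, and the sketch below assumes that layout: an 8-bit event code, an end flag, a 6-bit volume, and a 16-bit duration counted in timestamp units. The function names are illustrative.

```python
import struct

# DTMF event codes in the telephone-event payload: 0-9, then *, #, A-D.
DTMF_EVENT_CODES = {key: code for code, key in enumerate("0123456789*#ABCD")}
DTMF_EVENT_KEYS = {code: key for key, code in DTMF_EVENT_CODES.items()}

def encode_telephone_event(key, duration_samples, volume=10, end=False):
    """Pack one DTMF event into the 4-byte telephone-event payload."""
    event = DTMF_EVENT_CODES[key]
    # Bit 7 is the end flag, bit 6 is reserved (zero), bits 5-0 carry the volume.
    flags_and_volume = (int(end) << 7) | (volume & 0x3F)
    return struct.pack("!BBH", event, flags_and_volume, duration_samples & 0xFFFF)

def decode_telephone_event(payload):
    """Return (key, end, volume, duration_samples) from a 4-byte payload."""
    event, flags_and_volume, duration = struct.unpack("!BBH", payload[:4])
    return (DTMF_EVENT_KEYS[event], bool(flags_and_volume & 0x80),
            flags_and_volume & 0x3F, duration)

if __name__ == "__main__":
    payload = encode_telephone_event("5", duration_samples=800, end=True)
    print(payload.hex(), decode_telephone_event(payload))
```

The sending gateway detects the tone, removes it from the voice path, and sends events like this one; the receiving gateway regenerates a clean tone from the event, so low bit-rate vocoding never touches it.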

ATM Adaptation Layer 2
In addition to RTP, another very popular packet format for transmitting real-time, compressed voice is the ATM Adaptation Layer 2 (AAL2). AAL2 packets are multiplexed in ATM cells. AAL2 provides many of the same capabilities as RTP and leverages the inherent quality of service (QoS) capabilities of ATM that are particularly useful for voice over packet networking. AAL2 supports a number of vocoder formats in different combinations called "profiles." It also supports tone relay and fax relay and specific control protocols for loop emulation applications.


Toll Bypass
Shortly after VoIP went through its hobbyist phase, arbitrage specialists realized that the data networks, like the Internet, were not regulated and subsequently tariffed like the PSTN networks. The toll bypass application was created to take advantage of this situation. Toll bypass utilizes private VoIP networks to bypass the PSTN, particularly into certain countries with monopolistic telecom industries. For example, a typical toll bypass service provider would set up media gateways in the United States connected to media gateways in other countries by a private packet network.

Subscribers to the service would call one of the media gateways through an IVR system. The gateway would initiate packet communication with the closest destination gateway, which would dial locally to the destination phone number. Because the toll bypass application was a niche market it could tolerate a voice over packet processing solution with low port density, high cost, and lack of true PSTN feature support.

CLASS 4/Tandem Switch Replacement
The concept of toll bypass has since been adopted by long distance phone and network providers under the category of the CLASS 4/tandem switch replacement application. This application utilizes a voice over packet network and media gateways to replace traditional inter-machine trunks (IMTs) and the CLASS 4 and tandem switches that connect calls between local CLASS 5 switches. Because this is a more mainstream application, port density is a much more important issue, as are voice quality and support for in-band signaling methods such as MF and the relay of those signals through RTP. To maintain an acceptable QoS level and to be able to adhere to service level agreements (SLAs), these service providers utilize ATM backbone networks for their voice over packet application. Therefore, the gateways require a voice over packet processing solution that provides conversion between TDM and RTP/UDP/IP running over ATM.

Loop Emulation Service
Up to this point, the voice over packet applications that have been discussed in this article implement trunking or "back haul" capabilities. However, with the further deregulation of the telecom industry, the prospect of providing local phone service became attractive to broadband service providers. With broadband connections, service providers could layer on voice capabilities and become CLECs. Thus the voice over broadband gateway application came into existence. The first generation of voice over broadband gateways is based on the digital loop carrier (DLC) or remote terminal (RT) model and utilizes the same interfaces to the CLASS 5 switch. These gateways support multiple integrated access devices (IADs), which are located at the customer premises and support the convergence of voice, data, and video on the same DSL or cable broadband connection. For a voice over DSL broadband gateway, the voice over packet processing solution must support processing between an interface to the CLASS 5 switch and the ATM network.


As a local loop emulation application, the voice over packet processing solution is often required to support functions such as DTMF tone detection, programmable call progress tone generation, caller ID generation, tone relay, and call conferencing. Many of these functions are required to support local loop features and requirements such as 911, call waiting, and wiretapping. Second generation voice over broadband gateways will replace or supplement the backend connection to a legacy CLASS 5 switch with a connection to a softswitched IP voice network.

This will allow local exchange carriers to eliminate expensive and proprietary legacy switches and replace them with a networked approach to providing phone services. With softswitched architecture providing call control functions and a basis for service creation, carriers open the door to the deployment of innovative new services such as PBX application services.


Port density is a key requirement to reduce the cost of providing voice services to the mass market. Port density based on G.711 vocoding seems to have been adopted by the industry as the de facto metric; however, much like other simple metrics, "your actual density may vary."

There are two issues to consider when evaluating the port density of a voice over packet processing solution: echo delay path and multiple vocoder usage. From the standpoint of port density and application complexity, employing G.711 vocoding is extremely desirable. However, one of the primary goals of voice over packet processing applications is to stuff as many voice conversations as possible into a given bandwidth. This is achieved through the use of low bit-rate vocoders. In a typical application, network bandwidth utilization is monitored and voice ports are switched from G.711 to low bit-rate vocoders to make room for more calls. As bandwidth becomes available, active ports are switched back to G.711. Thus for most applications, port density based just upon G.711 is not realistic.
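The bandwidth gain from switching vocoders can be estimated with a short calculation. The sketch below counts one direction of a call carried as RTP over UDP over IPv4 (40 header bytes per packet) and ignores link-layer overhead such as ATM cell headers. The packetization intervals and the 1.536 Mb/s link are assumptions chosen for illustration, not figures from this article.

```python
HEADER_BYTES = 12 + 8 + 20  # RTP + UDP + IPv4 headers per packet

# codec -> (payload bytes per packet, packet interval in ms); intervals are assumed
CODECS = {
    "G.711": (160, 20),              # 64 kb/s
    "G.726 (32 kb/s)": (80, 20),
    "G.729A": (20, 20),              # 8 kb/s, two 10 ms frames per packet
    "G.723.1 (6.3 kb/s)": (24, 30),  # one 30 ms frame per packet
}

def ip_bandwidth_bps(codec):
    """One-way IP bandwidth of a single call, headers included."""
    payload_bytes, interval_ms = CODECS[codec]
    return (payload_bytes + HEADER_BYTES) * 8 * 1000 / interval_ms

def calls_per_link(link_bps, codec):
    """Whole calls that fit in link_bps in one direction (integer arithmetic)."""
    payload_bytes, interval_ms = CODECS[codec]
    bits_per_packet = (payload_bytes + HEADER_BYTES) * 8
    return link_bps * interval_ms // (bits_per_packet * 1000)

if __name__ == "__main__":
    link_bps = 1_536_000  # assumed example link
    for codec in CODECS:
        print(f"{codec:<20} {ip_bandwidth_bps(codec):>9.0f} b/s per call, "
              f"{calls_per_link(link_bps, codec)} calls")
```

Note that header overhead narrows the gap between vocoders: under these assumptions G.729A has one eighth of the G.711 bit rate but still needs about 30 percent of its IP bandwidth.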


The real question becomes how port density is affected as some of the ports are switched to low bit-rate vocoders. The answer lies in how processing resources are allocated for each port. In the traditional voice over packet processing technology architecture, processing resources and memory resources are allocated on a worst-case basis to simplify design. In the case of processing resources allocated to vocoding, the processing resources are typically time-sliced based on the needs of the most compute intensive vocoder configured for a port. This means that a port configured to support either G.711 or G.723.1 will be allocated the processing resources required to run G.723.1 (even if the G.711 vocoder is in operation). In addition, the fixed time slicing arrangement forces the largest packet length sizes for all the vocoders, which has the unpleasant side effect of either incremental jitter or delay.
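The effect of worst-case allocation on port density can be illustrated with a toy model. In the sketch below, the per-vocoder processing costs and the processor budget are invented numbers in arbitrary units, chosen only to show the shape of the trade-off. They are not measurements of any real vocoder or DSP, and the function names are hypothetical.

```python
# Hypothetical per-port processing cost in arbitrary units (not measured data).
COST = {"G.711": 1, "G.726": 8, "G.729A": 12, "G.723.1": 16}

def worst_case_ports(budget, configured_codecs):
    """Ports supported when every port reserves its most expensive codec."""
    return budget // max(COST[c] for c in configured_codecs)

def dynamic_ports(budget, active_mix_percent):
    """Ports supported when each port consumes only what its active codec needs.

    active_mix_percent maps codec -> percentage of ports running it (sums to 100).
    """
    if sum(active_mix_percent.values()) != 100:
        raise ValueError("percentages must sum to 100")
    cost_per_100_ports = sum(COST[c] * pct for c, pct in active_mix_percent.items())
    return budget * 100 // cost_per_100_ports

if __name__ == "__main__":
    budget = 400  # hypothetical processor capacity in the same arbitrary units
    print(worst_case_ports(budget, ["G.711", "G.723.1"]))
    print(dynamic_ports(budget, {"G.711": 80, "G.723.1": 20}))
```

With these made-up figures, reserving for G.723.1 on every port yields 25 ports, while charging each port only for its active vocoder yields 100 ports when 80 percent of calls are running G.711.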

Next-generation voice over packet processing solutions employ more sophisticated methods for resource allocation, which overcome the limitations of fixed resource allocation and deliver higher port densities across ports running multiple types of vocoders without sacrificing voice quality or network performance.

The voice over packet processing function is a critical component of any network convergence application. Solutions providing this function have evolved from piecemeal algorithms and general-purpose DSPs, which sufficed for first-generation network convergence, to highly specialized voice over packet processor solutions that offer many key benefits for equipment manufacturers looking to develop packet-based infrastructure and service creation platforms.

Tim Resker is Senior Product Manager, Voice Technology Group at Performance Technologies. Performance Technologies develops platforms, components, and software solutions for the world's evolving communications infrastructure. To learn more about the company, visit them online at www.pt.com.

