TMCnet News

Adaptive Jitter Buffer Management for Voice over IP
[May 12, 2004]

This white paper will introduce important concepts surrounding Jitter Buffer Management. First, it will discuss the reasons for a migration towards voice over Internet Protocol (VoIP) and the challenges faced in implementing this service. Then, several industry-wide Jitter Buffer Management techniques will be explained in detail.

Why Service Providers and Enterprises are Migrating to VoIP

In order to compete effectively in today’s telecommunications market, service providers and enterprises are turning towards VoIP technology. VoIP allows service providers to quickly roll out new services and efficiently utilize their existing bandwidth to enhance customer satisfaction, retain customers and increase revenue. VoIP allows enterprises to provide enhanced services to their customers and employees and to better utilize their existing IP network infrastructure.


The tough competition in telecommunications has slashed revenue margins on basic data and voice services. In order to stay profitable, service providers must introduce new revenue-producing services to retain their customers and cut costs simultaneously. Only through integrating their circuit and packet networks can service providers take advantage of the universality of the public-switched telephone network (PSTN) and leverage the inherent flexibility in the open architecture of packet networks. Packet-based network equipment is easier to upgrade and configure, enabling quick time-to-market. For example, unified messaging - where a user accesses voice mail, faxes and e-mail from a phone or personal computer - is only practical using packet-based networks.

Enterprises pioneered the use of VoIP. Initially, their goal was to better utilize internal packet networks and avoid long-distance toll charges. However, these same enterprises are now evaluating services such as unified messaging, extended PBX services to remote offices and Integrated Access Devices (IAD). The efficiency of the Internet has prompted companies to evaluate the various linkages between corporate Web sites and Interactive Voice Response (IVR). To summarize, the key drivers for migrating voice traffic to packet-based switching from circuit-based switching are:

Service providers need to quickly roll out new revenue-generating services
• The software-based architecture of packet-based devices is leveraged to quickly develop new services
• The standards-based interfaces of packet-based devices are used to quickly configure or upgrade devices to support new services

Both service providers and enterprises need to effectively manage bandwidth demands and costs
• Packet-based networks handle voice, fax and data more efficiently and require less bandwidth than circuit-based networks
• Packet-based devices are easier to manage and less expensive
• Packet-based devices are standards-based and enable service providers to be vendor independent



Greatest Challenge Implementing VoIP

There are many compelling reasons to migrate voice traffic to VoIP, but certain challenges must first be overcome. Voice traffic is real-time in nature, and circuit-switched networks, like the current PSTN, were specifically designed to handle real-time traffic. With dedicated circuits, callers are assured that their calls have the dedicated bandwidth necessary to be toll-quality with little delay. However, this system is not efficient for processing data and fax traffic.

Packet-switched networks like IP were designed to handle data traffic. Voice, video and data traffic can all be packetized and sent over an IP network. IP traffic flows are designed to be very fluid, where individual packets can travel along many different routes to reach their final destination. The IP network will re-route traffic flow if an individual node fails, or if a router determines that a bottleneck has developed further downstream.

Dynamically changing packet routing causes a number of problems for real-time voice traffic:
Network Delay
Voice quality is affected by the latency or delay of IP networks. If you have ever placed a call over a satellite-linked network, you may have experienced an annoying delay. It may take several seconds to hear what the other person has said. Users of a poorly implemented VoIP solution will hear a similar delay. A typical IP packet will transit through multiple hops or routing devices, with each device imposing a variable amount of delay. The network delay is the total accumulated delay.

The Internet Engineering Task Force (IETF) has attempted to address this issue with the Real-time Transport Protocol (RTP). Network managers can use various bandwidth reservation schemes in conjunction with RTP to minimize the overall end-to-end delay. However, these schemes are based on emerging technology that is not available across all networks.
Packet Sequence
In an IP network, individual packets may take different routes to a specific destination. Since they can arrive at different times, the packets may be out of sequence. In Figure 1, traffic leaving gateway A can enter the IP network through either router 1 or 2 and exit the network through router 3 or 4. At any point in time, one path can be either faster or slower than the other paths. Voice requires that these packets be placed into the correct sequence.

Lost Packets
An individual packet can be lost within the IP network and never reach its final destination. This behavior is acceptable for data, because data can wait: data traffic relies upon upper-layer protocols to resend anything that is dropped. Lost voice packets, however, translate into breaks in the speech and can result in an overall loss of speech intelligibility.
Duplicate Packets
As a packet makes its way through the IP network, it can be sent out along two paths towards its destination. This results in the target node receiving two copies of the same packet marked with the same timestamp.
VoIP Challenges Summary
Packet re-routing is caused by a number of factors, including different methods of routing, the latency generated by queuing inside each IP device and different levels of network utilization. These factors can cause great differences in the amount of time required to transmit a packet, which causes jitter. To illustrate this point, imagine two packets as racecars. The first racecar leaves the gate one second before the second. If the first car takes another route or drives faster than the second car, it will most likely reach the finish line first, but if it does not reach the finish line exactly one second before the other racecar, then the amount by which their arrival spacing deviates from that one second is jitter. In an IP network, jitter is defined as the variability in inter-packet arrival times between the RTP packets.
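The definition of jitter above can be made concrete with a short calculation. The sketch below uses the running interarrival jitter estimate that RFC 3550 specifies for RTP; it is offered as one common way to quantify jitter, not as the method used by any product described in this paper. The function name and the choice of milliseconds as the unit are assumptions made for illustration.

```python
def interarrival_jitter(send_times, arrival_times):
    """Return the running jitter estimate after the last packet.

    For consecutive packets i-1 and i, with send times S and arrival times R:
        D = (R_i - R_{i-1}) - (S_i - S_{i-1})
        J = J + (|D| - J) / 16
    This is the estimator described in RFC 3550; times here are in ms.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        arrival_gap = arrival_times[i] - arrival_times[i - 1]
        send_gap = send_times[i] - send_times[i - 1]
        d = arrival_gap - send_gap  # how far the spacing drifted in transit
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Packets sent 20 ms apart that also arrive 20 ms apart show no jitter.
print(interarrival_jitter([0, 20, 40], [50, 70, 90]))  # 0.0
# A packet that arrives 10 ms later than its spacing implies raises the estimate.
print(interarrival_jitter([0, 20], [50, 80]))  # 0.625
```

Note that a constant network delay (50 ms in the first call) contributes nothing to the estimate; only the change in spacing between packets does.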

How Is The Jitter Problem Addressed?

The major concern of both service providers and enterprises when migrating to VoIP is the need to maintain the same voice quality as that offered by their current circuit-switched network. When an RTP voice packet reaches a voice gateway, the preparation and conversion required for transmitting over the PSTN can be broken down into three major steps: storage, sorting and decoding/playing.
Storage
Mitigating the effect of jitter on voice communication is one of the major challenges facing VoIP vendors. Removing jitter requires collecting packets and storing them long enough to allow the slowest packets to arrive in order to be played in the correct sequence. The storage area used by those devices is known as the “Jitter Buffer”. The network device increases the delay as it waits for the slowest packet to arrive.

In order to achieve voice quality, the vendor must balance the need to minimize delay with the need to remove jitter. The bigger the buffer, the more delay, but if the buffer is too small, then voice quality can be compromised. Therefore, the ideal solution would adapt to the characteristics of the network, storing only the required amount of buffered voice traffic. This feature is known as “Jitter Buffer Management.”

Most vendors use one of the following two methods to manage the size of the jitter buffer:

• Packet time variations in the jitter buffer are measured over a period of time, and the buffer size is incrementally adapted to match the calculated jitter.
• The number of packets that arrive too late to be processed is counted and compared to the number of packets that were successfully processed. This ratio is then used to adjust the jitter buffer to target a predetermined allowable late packet ratio.
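The two buffer-sizing methods just described can be sketched as follows. This is a minimal illustration only: the function names, the step size, the safety margin, the size limits and the 1% allowable late ratio are all assumed values chosen for the example, not figures from any vendor.

```python
def adapt_by_jitter(current_ms, measured_jitter_ms, step_ms=5,
                    margin=2.0, min_ms=20, max_ms=200):
    """Method 1: step the buffer toward a target derived from measured jitter.

    The target (margin * jitter) is an assumed rule of thumb.
    """
    target = margin * measured_jitter_ms
    if current_ms < target:
        current_ms += step_ms          # buffer too small for observed jitter
    elif current_ms > target + step_ms:
        current_ms -= step_ms          # buffer larger than needed
    return max(min_ms, min(max_ms, current_ms))


def adapt_by_late_ratio(current_ms, late, played, allowed_ratio=0.01,
                        step_ms=5, min_ms=20, max_ms=200):
    """Method 2: steer the ratio of late to played packets toward a target."""
    if late == 0 and played == 0:
        return current_ms              # no measurements yet, leave unchanged
    ratio = late / played if played else float("inf")
    if ratio > allowed_ratio:
        current_ms += step_ms          # too many late packets: add delay
    elif ratio < allowed_ratio / 2:
        current_ms -= step_ms          # comfortably under target: cut delay
    return max(min_ms, min(max_ms, current_ms))


print(adapt_by_jitter(40, 30))          # 45
print(adapt_by_late_ratio(40, 5, 95))   # 45
print(adapt_by_late_ratio(40, 0, 100))  # 35
```

Both functions change the buffer by one small step per measurement period, which reflects the incremental adaptation described above and avoids abrupt jumps in playout delay.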
Sorting
As the data is stored, it must also be sorted into the original sequence to accurately reproduce the original audio. As Figure 2 illustrates, RTP packets can arrive in any sequence, at any time or not at all. The Jitter Buffer Manager sorts the voice frames according to a sequence number supplied in the RTP packet. The manager leaves open slots for those packets that have not yet arrived. The voice sampling size used by the voice coder determines the size of the slots. The Jitter Buffer Manager also determines the average holding time of a packet and thus the jitter buffer size.
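The slot-based sorting described above can be sketched as a small window of slots indexed by RTP sequence number. The class below is a hypothetical helper written for illustration; it assumes one voice frame per packet and relies on the fact that RTP sequence numbers are 16-bit values that wrap around.

```python
class SlotBuffer:
    """Window of playout slots indexed by RTP sequence number (illustrative)."""

    SEQ_MODULUS = 65536  # RTP sequence numbers are 16 bits wide

    def __init__(self, first_seq, num_slots):
        self.first_seq = first_seq          # sequence number due next for playout
        self.slots = [None] * num_slots     # None marks an open slot

    def insert(self, seq, frame):
        """Store a frame in its slot.

        Returns False when the packet is too late, too far ahead of the
        window, or a duplicate of a frame that is already stored.
        """
        index = (seq - self.first_seq) % self.SEQ_MODULUS
        if index >= len(self.slots) or self.slots[index] is not None:
            return False
        self.slots[index] = frame
        return True

    def pop_next(self):
        """Remove and return the frame due for playout, or None if it is missing."""
        frame = self.slots.pop(0)
        self.slots.append(None)             # open a new slot at the far end
        self.first_seq = (self.first_seq + 1) % self.SEQ_MODULUS
        return frame


buf = SlotBuffer(first_seq=100, num_slots=4)
buf.insert(102, "frame-102")   # arrives first, but is held in the third slot
buf.insert(100, "frame-100")
print(buf.slots)               # ['frame-100', None, 'frame-102', None]
```

A packet that arrives after its slot has already been played out maps to an index outside the window and is rejected, which is the "arrived too late" case counted by the late-packet method of buffer sizing.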

Decoding and Playing
Another source of delay is the decoding of voice packets followed by the playout of audio to the receiving party. In order to decode the packets, the device must support whichever voice coder was used to encode the packets. The playout is the actual playing of the voice sample.

Closely related to the Jitter Buffer Manager is “frame erasure,” which is used to compensate when a packet has not arrived in time for playout. Because frame erasure can degrade voice quality, the best case is one in which it is never needed. The goal of a good Jitter Buffer Manager is to use as little frame erasure as possible.

Most vendors use one of the following schemes for frame erasure:

• The voice packet preceding the missing packet is replayed during the interval when the lost packet would have been played. This method is effective when the packet loss is infrequent.
• Both the nth packet and the (n+1)th packet are transmitted every time, which results in every packet being sent twice. This method greatly increases the chance that the data will be received but uses a tremendous amount of bandwidth.
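The first scheme above, replaying the preceding packet, can be sketched in a few lines. The function name is hypothetical, missing frames are represented by None, and the empty-bytes "silence" placeholder for a loss at the very start of a stream is an assumption of this example.

```python
def conceal_by_repetition(frames, silence=b""):
    """Replace each missing frame (None) with the most recent frame that arrived.

    If the very first frame is missing there is nothing to repeat, so the
    `silence` placeholder is played instead.
    """
    output = []
    last_good = silence
    for frame in frames:
        if frame is None:
            output.append(last_good)   # replay the preceding frame
        else:
            output.append(frame)
            last_good = frame
    return output


print(conceal_by_repetition([b"a", None, b"c"]))  # [b'a', b'a', b'c']
```

When several frames in a row are missing, this sketch repeats the same frame each time, which illustrates why the method is described as effective only when packet loss is infrequent.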

Voice/Data/Fax Network Technologies Difference

Performance Technologies’ patent-pending Adaptive Jitter Buffer Management scheme compensates for the variances in inter-packet delay, while adding as little latency as possible.

The Adaptive Jitter Buffer Management software tracks each packet from initial arrival through final playout, determining the mean holding time for packets in the jitter buffer. This holding time is used to set the optimal jitter buffer size, which minimizes latency while preserving voice quality. Adaptive Jitter Buffer Management is dynamic, thereby enabling voice/data/fax network technologies enhanced products to adapt to changes in network delay during peak network hours.

The software also uses a number of smoothing algorithms to ensure that a small number of delayed packets do not skew the average holding time and produce buffer hold periods that are too long. Adaptive Jitter Buffer Management utilizes an algorithm that enables the RISC processor to wait until the last possible millisecond before it must notify the DSP that frame erasure is required. By lengthening the period in which late voice frames can still be processed without increasing the buffer size, Adaptive Jitter Buffer Management maintains high voice quality even on networks with large variations in delay. This ‘just in time processing’ is possible because the design divides processing duties between RISC processors and Digital Signal Processors (DSPs).
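The paper does not disclose which smoothing algorithms are used. Purely as an illustration of the idea that a few badly delayed packets should not skew the average holding time, the sketch below combines an exponentially weighted moving average with a cap on outlying samples. The function name, the smoothing weight and the outlier factor are all assumptions, and this is not the patent-pending method.

```python
def smoothed_holding_time(samples_ms, alpha=0.125, outlier_factor=3.0):
    """Return a smoothed average of packet holding times in milliseconds.

    Each new sample is capped at `outlier_factor` times the current average
    before being blended in with weight `alpha`, so one very late packet
    moves the average only slightly. Returns None if there are no samples.
    """
    average = None
    for sample in samples_ms:
        if average is None:
            average = float(sample)    # first sample seeds the average
            continue
        cap = outlier_factor * average if average > 0 else sample
        capped = min(sample, cap)
        average += alpha * (capped - average)
    return average


# One 1000 ms straggler after a 20 ms sample: a plain mean would give 510 ms.
print(smoothed_holding_time([20, 1000]))  # 25.0
```

A buffer sized from the smoothed value stays close to the typical holding time, while a buffer sized from the plain mean would hold every packet far longer than necessary.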

As the Adaptive Jitter Buffer Management software monitors packet flow, it establishes an overall latency baseline and then attempts to smooth any network traffic variations by inserting or deleting audio data as necessary. If the actual jitter goes significantly above the latency baseline, then the goal is dynamically re-adjusted to adapt to the change. Vendors or operators using voice/data/fax network technologies equipped VoIP devices can fine-tune the balance between delay and voice quality by setting the “loss tolerance” (the acceptable proportion of packets dropped due to late arrival, relative to the packets that arrive in time for playout). The dynamic buffer adjusts the balance between speech quality and packet-delivery reliability by increasing the buffer size if it is too short (many packets are arriving too late for playout) or decreasing the size if it is too long (playout is delayed so long that the delay becomes intolerable).

Service providers are migrating to VoIP in response to today’s highly competitive business environment, in order to quickly introduce new services and efficiently utilize their existing bandwidth. Enterprises are using VoIP to provide better services to their customers and employees, along with better utilization of their existing IP network infrastructure. In order to accomplish these goals, both organizations must choose vendors that effectively address issues like jitter that compromise voice quality.

Developing algorithms that determine the optimal buffer size is complex, and yet without the proper algorithms, toll-quality voice over IP cannot be achieved. It is for this reason that vendors have selected VoIP solutions like Adaptive Jitter Buffer Management. Voice/data/fax network technologies ensure high voice quality even on networks with large amounts of jitter by efficiently removing jitter while adding as little delay as possible.

For more information, visit Performance Technologies’ Web site at www.pt.com or contact [email protected].

To learn about voice/data/fax network technologies, visit: http://www.pt.com/products/prodgroup_dsp.html
