Lessons from The Edge: VoIP In The WAN
BY TONY RYBCZYNSKI & MATTHEW F. MICHELS
IP Telephony is an uncontested direction across the industry. What
end users want is relatively well understood: high-quality voice,
"always-on" dial tone, easy-to-use features, and new applications.
Technically, this translates into "five nines" reliability, delays of less
than 150 msec, short variable delays or jitter, and zero packet loss.
Meeting these requirements across the WAN is complicated by the range of WAN
connectivity options and their price/performance attributes. While multiple
options exist including IP VPNs and optical Ethernet, the workhorses for
inter-site connectivity are leased lines and Frame Relay. So how do we
deliver quality IP telephony over these two technologies?
Bandwidth in the WAN has to be carefully engineered because it costs money
-- how much depends on where the two endpoints are. In the TDM world, every
voice call represents a 64 kbps connection (or 32 kbps with ADPCM). IP
telephony uses G.711 and G.729, whereby voice is coded at 64 and 8 kbps
respectively. But it's not that simple. To explain this, it is important to
understand the functions and operations of the protocol stack for VoIP. At a
high level, each layer of the protocol stack has specific functions and adds
header and trailer information to each "Protocol Data Unit" (PDU) to
accomplish its functions. For the VoIP speech path, the relevant protocols
(layers) are, starting from the top, RTP (Real-time Transport Protocol), UDP (User
Datagram Protocol), IP (Internet Protocol), and the Link Layer protocol
(Point-to-Point Protocol or PPP for leased lines, and Frame Relay).
In order to calculate bandwidth demand for a particular link, we need
several pieces of information. We need to know:
• S, the voice sampling interval; packets are typically formed every 20
msec (a configuration option of many coding schemes and a tradeoff between
delay and packet overhead);
• P, the number of packets generated per second, typically 50 pps;
• C, the bit rate of the voice coding scheme (most commonly G.711 at 64
kbps or G.729 at 8 kbps).
As a result, we can calculate:
• V, the voice payload in each packet, calculated as C * S, or typically
160 and 20 bytes for G.711 and G.729 respectively;
• I, the IP/UDP/RTP packet overhead of 40 bytes, including 20 for IP,
8 for UDP and 12 for RTP (without RTP header compression);
• L, the link layer overhead for the specific link type (PPP for leased
lines, and Frame Relay).
The link layer overhead, L, is unique to each specific link type.
The responsibility of the Link layer is frame delineation, initiation,
control, multiplexing, and error detection. WAN links, such as PPP and Frame
Relay, are based on HDLC, whereby each frame consists of an opening one-byte
flag, a two-byte HDLC control field, the PPP or Frame Relay header, the
packet payload, a two-byte checksum and a closing flag. The PPP and Frame
Relay headers are four and two bytes respectively.
The bandwidth requirement for each call is P * (V + I + L) * 8 bits per
second. For G.729, 8 kbps voice results in a bandwidth need of
approximately 27 kbps. Use of speech activity
detection (or equivalently silence suppression) can result in fewer packets
per call, but it would not be wise to engineer less bandwidth for voice on
the basis of this operation for a small number of calls. The bandwidth per
call has to be multiplied by the maximum number of active voice calls on the
WAN link, determined using traditional telephony engineering.
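The per-call arithmetic above can be expressed as a short sketch in Python. The 8-byte link overhead assumed here corresponds to the Frame Relay framing described in the text (flags, control field, two-byte header, checksum); PPP on leased lines would be about 10 bytes.

```python
# Worked sketch of the per-call bandwidth formula P * (V + I + L), converted
# to bits. The 8-byte link overhead is an assumption matching the Frame
# Relay frame structure described in the text; PPP would be ~10 bytes.

def voip_bandwidth_kbps(codec_kbps, sample_ms=20, link_overhead_bytes=8):
    """Approximate one-way bandwidth per call, in kbps."""
    p = 1000 / sample_ms              # P: packets per second (typically 50)
    v = codec_kbps / 8 * sample_ms    # V: voice payload bytes per packet
    i = 20 + 8 + 12                   # I: IP + UDP + RTP headers, 40 bytes
    return p * (v + i + link_overhead_bytes) * 8 / 1000

print(voip_bandwidth_kbps(64))  # G.711: well above the raw 64 kbps
print(voip_bandwidth_kbps(8))   # G.729: roughly 27 kbps, as noted above
```

Multiplying the per-call figure by the maximum number of simultaneous calls gives the voice bandwidth to engineer on the link.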
VoIP ON PRIVATE LINES
If a particular link only supported VoIP, engineering the bandwidth of the
link would be straightforward and QoS would not add any value. But network
convergence is all about bringing together voice and data on one network.
Controlling delays and ensuring zero packet loss are key requirements.
The primary purposes for QoS are to:
• Minimize end-to-end delay through the network;
• Minimize the variability in end-to-end delay (jitter);
• Prevent packet loss.
QoS mechanisms, such as IP Differentiated Services (DiffServ), can work
effectively over properly engineered point-to-point leased lines. Proper
engineering must address the "large packets on a slow link" problem. Let's
explain. The typical default maximum frame size on a PPP link is 1,500
bytes. Every frame sent down a wire consumes a finite amount of time; this
is called serialization delay and is a function of the number of bytes in
the frame, and the speed of the link.
QoS mechanisms are not pre-emptive. Therefore, a VoIP packet must wait for
the complete transmission of a data packet that started to be transmitted
before the arrival of the VoIP packet. If the data frame is relatively small
or the link speed is fast (e.g., above 1 Mbps), this is not a problem.
However, large frames on slow links require large serialization times, which
result in significant delay for VoIP packets -- whether QoS is implemented
or not. This is essentially a jitter problem. The recommended solution for
large frames on slow links is Layer 2 fragmentation. Fragmentation breaks up
the large frames and allows for VoIP frames to be transmitted sooner, and
more consistently. A simple rule of thumb is to select the link fragment
size (in bytes) equal to the numerical value of the link speed (in kbps).
For example, select 512 bytes for 512 kbps links. This rule always results
in a maximum serialization delay of 8 ms.
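The serialization arithmetic behind this rule is easy to check. The sketch below uses the 1,500-byte default frame size from the text and a hypothetical 128 kbps link as the example.

```python
# Serialization delay is frame size over link speed. With the fragment size
# (bytes) set numerically equal to the link speed (kbps), the worst-case
# wait behind one fragment is always 8 ms, since 1 byte = 8 bits.

def serialization_delay_ms(frame_bytes, link_kbps):
    return frame_bytes * 8 / link_kbps   # bits / (kbits/s) gives ms

print(serialization_delay_ms(1500, 128))  # full default frame: ~94 ms
print(serialization_delay_ms(128, 128))   # rule-of-thumb fragment: 8 ms
```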
There are four sources for packet loss that must be dealt with. Firstly,
packet loss can be due to transmission errors. Link clock slipping, when the
end device isn't properly synchronized to the network, also causes packet
loss at regular intervals. Fortunately, low error rate links are
generally available domestically and over many international links.
Secondly, packets can be lost during link and node failures. The solution
rests in highly reliable switch design and in fully leveraging networking
techniques to minimize the impacts of failures. Thirdly, packets can be lost
due to network congestion, which can result if unplanned traffic patterns
are generated by users and applications. Lastly, VoIP packets that arrive at
their destination too late to be useful are simply thrown away -- this is
an underflow condition, and is sometimes referred to as "jitter buffer
packet loss." QoS is a solution to these last two areas. In addition, the
specific implementation of the receiver's jitter buffer has a strong bearing
on whether a particular packet is usable or "just too late." From the end
user's perspective, there is no functional difference between packets lost
in the network and packets discarded by the jitter buffer. The difference
is important, however, when trying to troubleshoot the network and fix the
problem. In any case, packet loss, unless very infrequent, is a
VoIP-killing impairment.
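A minimal sketch of the receive-side decision just described, whether a packet arrives in time to be played or is "just too late," follows; the 60 ms initial playout delay and the function names are illustrative assumptions, not taken from any particular implementation.

```python
# Illustrative jitter-buffer accept/discard decision. A packet is usable
# only if it arrives before its scheduled playout time; otherwise it is
# discarded and, to the listener, counts as packet loss.

PACKET_INTERVAL_MS = 20   # one packet every 20 ms, as engineered above

def classify(seq, arrival_ms, initial_playout_ms=60):
    # initial_playout_ms stands in for the jitter-buffer depth (assumed).
    deadline = initial_playout_ms + seq * PACKET_INTERVAL_MS
    return "play" if arrival_ms <= deadline else "discard"

print(classify(0, 55))   # arrives before its 60 ms deadline: play
print(classify(3, 130))  # deadline was 60 + 3*20 = 120 ms: discard
```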
VoIP OVER FRAME RELAY
Since Frame Relay is seldom used as purely a point-to-point link (hub and
spoke configurations are much more common), and since Frame Relay QoS
services have limited deployment, running VoIP over Frame Relay is somewhat
problematic. The "large packets on a slow link" issue can only be addressed
by constraining the frame size on particular virtual circuits on an
end-to-end basis. The multiplexed nature of Frame Relay and the limited
availability of QoS means that a remote site with large data traffic can
take bandwidth away from, and thus impact, VoIP traffic from another remote
site -- the only common factor is the shared access link at the central site
location. QoS can be used by the transmitting devices at the edge of the
network with positive results, but this does not address QoS prioritization
from the network towards the user, nor across the network.
Frame Relay services are based on assigning Committed Information Rates (CIRs)
to each virtual circuit. The CIR is a guaranteed throughput; traffic sent
in excess of the CIR can be discarded by the network. The principle is that the
enterprise can implement traffic shaping (by delaying non-voice traffic) to
smooth the transmitted traffic within the CIR -- but this is rarely done.
Enterprises have learned to use Frame Relay very effectively for data, and
often subscribe to zero CIR, on the basis that the service provider
over-engineers the backbone network. Even when CIR is used, a common
practice is to over-subscribe (i.e., under-engineer) access links at the
central site. This is driven by the desire to save money, since CIR is a
tariffed item. The laws of large numbers and temporal spacing of demand
often make this a safe and wise game to play -- for data. The above Frame
Relay realities can be a harsh environment for VoIP, because the voice
frames have no way of being protected. But all is not lost.
The ideal method of supporting VoIP over Frame Relay is to overlay a
separate set of virtual circuits for VoIP, with CIR set for each to support
the voice traffic, and QoS (if available) enabled on a per virtual circuit
basis. The enterprise would turn on traffic shaping, giving VoIP priority
over data traffic. Local packet fragmentation (now defined but not yet
widely available) could be used. An even more sophisticated bandwidth
management approach uses traffic pacing, whereby the transmitter spreads out
the transmission of data over a longer, measured time interval to prevent
data traffic from bunching up at the remote slower speed location.
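As an illustration of the pacing idea, a transmitter can schedule frame departures so the offered rate never exceeds the CIR. This sketch (the names and units are ours) computes those departure times.

```python
# Illustrative traffic pacing: spread frame departures out in time so the
# rate offered to a virtual circuit stays within the CIR, instead of
# bursting at access-link speed and bunching up at the slow remote site.

def pace_departures(frame_sizes_bytes, cir_kbps):
    """Departure times (ms) that hold the flow at or below the CIR."""
    times, t = [], 0.0
    for size in frame_sizes_bytes:
        times.append(t)
        t += size * 8 / cir_kbps   # ms this frame "occupies" at the CIR
    return times

# Four 500-byte frames shaped to a 64 kbps CIR depart 62.5 ms apart.
print(pace_departures([500] * 4, 64))
```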
So there are some viable approaches for running VoIP over Frame Relay. And
the market continues to evolve. Service providers such as Sprint are
offering IP-enabled Frame Relay services, which replace a partial mesh of
virtual circuits by a single virtual circuit per site into a QoS-enabled
IP-routed cloud. Such a service would provide an interesting, likely more
cost effective, converged network solution.
The bottom line on running VoIP over Frame Relay is to engineer the network carefully,
leveraging access link speeds, CIR, traffic shaping and pacing, and
IP-enabled Frame Relay services, QoS, and fragmentation, where available.
Enterprises deploying IP telephony for business advantage need IP
telephony-grade infrastructures to deliver the expected voice quality and
reliability. In WAN environments, this means engineering the wide area
physical and virtual connections to provide the basic bandwidth required for
voice and critical data traffic, and to map QoS deployed across the LAN into
WAN capabilities. In this way, enterprises can leverage their leased line
and Frame Relay WANs to gain the advantage of converged networking. They can
also leverage IP VPNs for telecommuters and road warriors with broadband
access. In the longer term, extending their campus networks to metro sites
through Optical Ethernet services is a very attractive opportunity.
Tony Rybczynski is Director of Strategic Enterprise Technologies in Nortel
Networks. He has over 30 years of experience in the application of packet
network technology. Matthew F. Michels is a Senior Consulting Engineer in
Nortel Networks Succession Network Design and Systems Engineering group. For
more information visit
To The October 2003 Table Of Contents ]