March 2008 | Volume 11/ Number 3
Voice Quality Monitoring — Supercharged Quality of Service
By: Todd Lattanzi, ADTRAN
As voice and data networks converge into a single communications network, Quality of Service (QoS) is of growing importance. QoS allows a network to provide better service to selected traffic and enables the network to handle both mission-critical and best-effort traffic on the same infrastructure. But what happens when there are problems with voice quality on this new converged network?
Today, when an end user complains of voice quality problems, an administrator has to check a number of areas like QoS configurations, interface errors and network utilization to find the cause. For calls across a Wide Area Network (WAN) connection between sites, there is no way to know if the problem is on the remote Local Area Network (LAN), the local LAN, or the WAN in between.
The solution lies in the ability to more closely examine the problem and determine where in the data stream the problem is occurring, why it is occurring and what can be done to correct it. QoS alone is not the answer. The answer lies in Voice Quality Monitoring (VQM).
VQM moves beyond QoS and can save time in the troubleshooting process for network administrators, value-added resellers (VARs), managed service providers and carriers. With VQM, the time involved in troubleshooting the problem, determining the cause and working toward a resolution is significantly reduced. The result is a rapid return to a high-quality service experience for the VoIP user.
In essence, VQM allows the network administrator to examine the full data stream and identify problem areas down to the packet level in an easy-to-use graphical interface. To better understand how VQM works and the benefits it provides, let’s examine some common problems encountered on a VoIP network.
The first parameter often examined when there is less than desirable performance on a VoIP network is the Mean Opinion Score, or MOS. The MOS provides a numerical measure of the quality of human speech at the destination end of the circuit. The scheme uses subjective tests (opinion scores) that are mathematically averaged to obtain a quantitative indicator of the system performance. To determine MOS, a number of listeners rate the quality of test sentences read aloud over the communications circuit by male and female speakers. A listener gives each sentence a rating as follows: (1) bad; (2) poor; (3) fair; (4) good; (5) excellent. The MOS is the arithmetic mean of all the individual scores and can range from one to five. Acceptable MOS scores are four and above. When an unacceptable MOS is noted, the cause must be determined, because speech on the call may be hard to understand.
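The averaging described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic mean on the 1-to-5 scale; the function name and the sample ratings are hypothetical and do not come from any VQM product.

```python
from statistics import mean

def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating {rating!r} is outside the 1-5 scale")
    return mean(ratings)

# Hypothetical ratings from eight listeners for one test sentence.
sample_ratings = [4, 5, 4, 3, 4, 5, 4, 4]
mos = mean_opinion_score(sample_ratings)

# The article treats scores of four and above as acceptable.
print("MOS:", mos, "acceptable" if mos >= 4 else "unacceptable")
```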
In addition to MOS, there are three primary areas that need to be examined to properly determine the cause of the unacceptable voice quality. These are delay, jitter, and packet loss.
Delay, simply stated, is the time a packet takes to travel from the speaker's device to the listener's, and it is critical to voice quality. So much so that ITU-T Recommendation G.114 recommends a one-way delay of no more than 150 milliseconds in order to meet the measure of a “high-quality” voice experience. The packet network must be engineered to meet this value at worst, and preferably to beat it by a good margin.
Next is jitter. A VoIP conversation is made up of many thousands of packets. Because an IP network also carries many other applications, it will inherently affect individual packets of a conversation differently, so the total delay varies from one packet to the next. That variation is called jitter. Too much jitter is a bad thing. VoIP devices have mechanisms called jitter buffers that compensate for a small amount of jitter, but if the variations are too extreme, even the best jitter buffers will not be able to overcome them.
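One common way to put a number on jitter is the running estimator defined for RTP in RFC 3550, which smooths the change in transit time between consecutive packets with a gain of 1/16. The article does not say how any particular VQM implementation computes jitter, so the sketch below is an assumption for illustration; the function name and the millisecond timestamps are hypothetical.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Estimate jitter with the RFC 3550 smoothing rule J += (|D| - J) / 16.

    D is the difference in transit time between a packet and the one
    before it. Times are in milliseconds here for readability.
    """
    jitter = 0.0
    previous_transit = None
    for sent, received in zip(send_times_ms, recv_times_ms):
        transit = received - sent
        if previous_transit is not None:
            delta = abs(transit - previous_transit)
            jitter += (delta - jitter) / 16.0
        previous_transit = transit
    return jitter

# Hypothetical stream: packets sent every 20 ms, third one delayed by 5 ms.
sent = [0, 20, 40, 60]
received = [50, 70, 95, 110]
print("estimated jitter (ms):", interarrival_jitter(sent, received))
```

A stream with perfectly constant delay yields an estimate of zero, regardless of how large that delay is, which is why delay and jitter are examined separately.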
Packet loss can be a major issue in VoIP networks. The amount of loss experienced by a packet flow is affected by buffer exhaustion and intentional packet drops.
Buffer exhaustion is a result of congestion caused by oversubscription or rate decoupling. For example, if too many Gigabit feeder connections converge on a single Gigabit uplink, the result is oversubscription. The switch has only a limited amount of buffer space to hold excess packets while it tries to transmit on the single uplink. Under nominal operating conditions, these links are typically not a problem. In a high-traffic situation, however, it is easy to see why some traffic must be dropped.
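A toy simulation makes the oversubscription case concrete. The sketch below assumes a simple tail-drop queue measured in whole packets per time tick; real switches use more elaborate buffering, and every name and number here is hypothetical.

```python
def simulate_tail_drop(arrivals_per_tick, service_per_tick, buffer_slots):
    """Return (forwarded, dropped, still_queued) for a tail-drop queue.

    Each tick, arriving packets join the queue, the uplink transmits up to
    service_per_tick packets, and anything beyond buffer_slots is dropped.
    """
    queued = forwarded = dropped = 0
    for arriving in arrivals_per_tick:
        queued += arriving
        sent = min(queued, service_per_tick)
        queued -= sent
        forwarded += sent
        overflow = max(0, queued - buffer_slots)
        dropped += overflow
        queued -= overflow
    return forwarded, dropped, queued

# Nominal load: arrivals match what the uplink can carry, nothing is lost.
print(simulate_tail_drop([1, 1, 1, 1], service_per_tick=1, buffer_slots=4))

# Oversubscribed: three feeders' worth of traffic for one uplink.
print(simulate_tail_drop([3, 3, 3, 3], service_per_tick=1, buffer_slots=4))
```

In the oversubscribed run the buffer absorbs the first two ticks and then fills, after which every tick drops the excess.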
Rate decoupling is a common occurrence. In this instance, a high-speed LAN meets a much lower-speed WAN connection. For instance, a Gigabit link from the LAN feeds traffic to a router that has only a 45-megabit connection to the WAN, which is less than one-twentieth of the LAN's bandwidth. It is easy to see how the router would run out of buffer space.
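The arithmetic behind this is short enough to work through. The sketch below assumes a sustained burst at full Gigabit rate and a 1 MB buffer; the buffer size is a made-up figure for illustration, not a value from the article or from any specific router.

```python
def seconds_until_buffer_full(ingress_bps, egress_bps, buffer_bytes):
    """Return how long a sustained burst takes to fill the buffer.

    Returns None when the egress link keeps up and the buffer never fills.
    """
    if ingress_bps <= egress_bps:
        return None
    return buffer_bytes * 8 / (ingress_bps - egress_bps)

GIGABIT_LAN_BPS = 1_000_000_000
WAN_45_MEGABIT_BPS = 45_000_000
ASSUMED_BUFFER_BYTES = 1_000_000  # hypothetical buffer size

fill_time = seconds_until_buffer_full(
    GIGABIT_LAN_BPS, WAN_45_MEGABIT_BPS, ASSUMED_BUFFER_BYTES
)
print(f"WAN/LAN bandwidth ratio: {WAN_45_MEGABIT_BPS / GIGABIT_LAN_BPS:.3f}")
print(f"buffer fills after about {fill_time * 1000:.1f} ms of line-rate input")
```

Under these assumptions the buffer lasts only a few milliseconds, which is why bursts from the LAN side translate so quickly into drops at the WAN edge.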
Another parameter that is sometimes examined is out-of-order packets. Packets arriving out of order are not, by themselves, a problem, because the receiver can put them back in sequence. Problems arise when there are issues on the receiving end of the stream and packets are not reassembled in the correct order.
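RTP packets carry a sequence number, so a monitor can count out-of-order arrivals by comparing each packet with the highest sequence number seen so far. The sketch below is a simplified illustration: it ignores the 16-bit wraparound that real RTP sequence numbers undergo, and the function name is hypothetical.

```python
def count_out_of_order(sequence_numbers):
    """Count packets that arrive after a later-numbered packet.

    Simplification: sequence numbers are assumed not to wrap around.
    Exact duplicates of the highest number seen are not counted.
    """
    highest_seen = None
    out_of_order = 0
    for seq in sequence_numbers:
        if highest_seen is None or seq > highest_seen:
            highest_seen = seq
        elif seq < highest_seen:
            out_of_order += 1
    return out_of_order

# Packet 3 arrives after packet 4, so one arrival is out of order.
print(count_out_of_order([1, 2, 4, 3, 5]))
```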
The examination of these parameters is not new. In fact, there are standalone hardware and software packages that can be used to derive this data. However, the data is typically presented in a rudimentary spreadsheet table that requires a great deal of work to make it easily understandable. Historically, standalone systems that provide the data in a manner that is easier to use have carried a hefty price tag. Recently, some equipment manufacturers have begun offering VQM as a feature of their operating systems. While this is a relatively new development, it has received a warm welcome from network administrators. The VQM technology in these systems provides a user-friendly, easy-to-understand graphical interface that allows administrators to quickly identify problems upstream, downstream and even within an individual unit.
At a glance, VQM provides a look at all IP interfaces and details, in a chart format, the number of calls that were rated good, fair or poor quality. It also provides the ability to look at both past and active calls. Most users don’t know what kind of quality problem they’re experiencing during a bad call; they just know it sounds bad. VQM allows network administrators to quickly see the type of problem and investigate the most likely sources of trouble accordingly.
Let’s Walk Through Some Real World Scenarios:
First consider a scenario where all users at a site complain about poor audio on all calls. Using VQM, the network administrator determines that there is a high level of packet loss in the network. Packet loss is often accompanied by interface errors, and these errors can point to what is wrong. In our example, the network administrator sees that all of the switch ports connected to phones have detected alignment and CRC errors. These errors generally result from duplex mismatches. Looking deeper, the network administrator sees that the switch ports are statically configured for 100Mbps / Full Duplex operation. The phones, however, have been configured for auto-negotiation. Because the switch is statically configured, it does not take part in negotiation, so the auto-negotiating phones cannot learn its duplex setting and typically fall back to half duplex. The result is a duplex mismatch, leading to interface errors. These errors cause packet loss, and lost packets cause poor audio.
In Figure 1, the network administrator watches an active call and sees that there are short cycles of packet loss on the audio stream coming from extension 3015. This likely means there are errors on a network interface somewhere between the phone and the router.
Looking at other past calls, the administrator sees that this is not the first call to experience packet loss. The problem exists somewhere off of Ethernet 0/2. (See Figure 2.)
When the network administrator looks at the switch port connected to the extension 3015, he notices some errors. The key errors are alignment errors and CRC errors. (See Figure 3.) These problems suggest a duplex mismatch — in particular, they suggest that the local link is configured for full duplex operation (as also indicated in the stats), but the other side is running half duplex. The problem is that the switch has been configured for static speed and duplex settings, whereas the phone is auto-negotiating speed and duplex.
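The mismatch logic in this scenario can be captured in a small check. The sketch below assumes the commonly described behavior in which an auto-negotiating port falls back to half duplex when its link partner is statically configured and does not negotiate, and it assumes both ports are capable of full duplex. The function names and setting strings are hypothetical, not taken from any switch's CLI.

```python
def negotiated_duplex(local_setting, peer_setting):
    """Return the duplex each side ends up using as (local, peer).

    Each setting is "auto", "full", or "half". An "auto" side facing a
    statically configured peer is assumed to fall back to half duplex.
    """
    valid = {"auto", "full", "half"}
    if local_setting not in valid or peer_setting not in valid:
        raise ValueError("settings must be 'auto', 'full', or 'half'")
    if local_setting == "auto" and peer_setting == "auto":
        return ("full", "full")  # both sides negotiate the best common mode
    local = "half" if local_setting == "auto" else local_setting
    peer = "half" if peer_setting == "auto" else peer_setting
    return (local, peer)

def has_duplex_mismatch(local_setting, peer_setting):
    """Return True when the two ends of the link run different duplex modes."""
    local, peer = negotiated_duplex(local_setting, peer_setting)
    return local != peer

# The article's scenario: switch port forced to full, phone set to auto.
print(negotiated_duplex("full", "auto"), has_duplex_mismatch("full", "auto"))
```

Setting both ends to auto-negotiate, or both ends to the same static value, removes the mismatch in this model.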
When examining historical calls, VQM displays the number of Real-time Transport Protocol (RTP) flows per interface and provides statistics such as MOS, jitter, delay and loss. It allows the network administrator to search by a number of different criteria, including extension, user and time of day. Each call is reflected as a data point. Both inbound and outbound calls can be viewed. Once any data point is selected, a wealth of data is displayed in an easy-to-read graphical manner. Much of this data is the same as what you would derive from the Command Line Interface (CLI), but it is easily accessible and presented in a much more user-friendly manner.
Active calls are polled every seven seconds providing near real-time data when actively troubleshooting calls. The administrator can actually watch calls as they occur. The same type of data is available on active calls as past calls.
Voice Quality Monitoring provides the ability to look at each source IP and view a range of MOS scores, RTP flows, jitter, loss and out-of-order packets. Thus, VQM allows the network administrator to have a full view of traffic as it traverses the network. This is a great benefit, especially when there are multiple devices in the network that could be the origin of the problem. VQM can greatly reduce the time needed to troubleshoot and rectify the problem.
For our second scenario, consider a company that has two offices, connected via a point-to-point T1. There is an IP PBX at the larger of the two offices (Office A), which facilitates call setup and tear-down for all VoIP calls, but audio between IP phones goes directly between phones. The IP PBX also has trunks for all calls to and from the PSTN, so it terminates the audio for external calls. Users at the smaller office (Office B) complain about intermittent voice quality problems for external calls. Despite the fact that inbound audio sounds bad, users say that the people at the other end of their calls never complain about the quality of the audio they hear. Users at Office A do not experience poor audio on any calls. Furthermore, calls between the two offices never have poor audio.
Using VQM, the network administrator begins investigating audio streams from the IP PBX at Office A to phones at Office B. The network administrator finds no problems on the router at Office A, but notices that there is considerable jitter and a small level of packet loss on the router at Office B. Because these symptoms suggest a possible problem with QoS, the network administrator confirms the QoS configuration on the router at Office A. It is (correctly) configured to give priority to traffic with a DSCP value of 46, which is the same value used by phones in the company. However, additional VQM details show the administrator that call streams coming from the PBX are tagged with IP Precedence 5 (DSCP 40) instead of DSCP 46. So, this traffic is not given the appropriate bandwidth guarantees. By updating the QoS configuration at Office A to match either DSCP 46 or IP Precedence 5, the network administrator is able to resolve the external call quality problems.
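The relationship between the two markings is easy to verify: IP Precedence is the top three bits of the six-bit DSCP field, so DSCP 46 and DSCP 40 both carry Precedence 5, yet a rule that matches only DSCP 46 will miss DSCP 40. The sketch below illustrates this; the classifier function and its parameters are hypothetical and do not represent any router's configuration syntax.

```python
def ip_precedence(dscp):
    """Return the IP Precedence (top three bits) of a six-bit DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp >> 3

def gets_priority(dscp, match_dscp=(46,), match_precedence=()):
    """Return True when a packet's marking matches the priority rule."""
    return dscp in match_dscp or ip_precedence(dscp) in match_precedence

PHONE_DSCP = 46  # marking used by the phones in the scenario
PBX_DSCP = 40    # IP Precedence 5 with the lower three bits set to zero

# Original rule: match DSCP 46 only. PBX audio misses the priority queue.
print(gets_priority(PHONE_DSCP), gets_priority(PBX_DSCP))

# Updated rule: also match IP Precedence 5. Both streams now qualify.
print(gets_priority(PHONE_DSCP, match_precedence=(5,)),
      gets_priority(PBX_DSCP, match_precedence=(5,)))
```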
From these examples, you can see the complexity in troubleshooting VoIP issues and how an easy-to-use graphical implementation of VQM reduces downtime and supercharges QoS. A graphical VQM not only reduces both the time and pain associated with troubleshooting network issues, but also provides a wealth of information allowing network administrators to fine-tune their networks to create an even higher-quality end-user experience.
Todd Lattanzi is a Senior Product Manager for the Enterprise Networks Division of ADTRAN, Inc. For more information, visit the company online at http://www.adtran.com.