Although data traffic volumes worldwide have exceeded
voice traffic volumes, voice services continue to bring
in several times more revenue than data. Clearly the
market value of voice, combined with the low cost of
transporting data, makes a compelling business case for
converged networks.
The convergence of voice and data on the same packet
network can enable network operators to realize
significant operational and capital cost savings. But
enabling voice and data to co-exist on the same network,
and to realize the related cost savings, requires a
careful balance between network bandwidth-efficiency and
voice service quality. Today, over $600 billion a year
is paid for the predictable quality of the PSTN and
circuit-switched networks. VoIP service providers that
cannot deliver PSTN performance will have a difficult
time winning market share.
Delivering PSTN performance on a VoIP network is not
a simple task, as many network operators have come to
discover. Testing the performance of VoIP networks is a
continuous activity, from initial lab trials to an
in-service production network. If designed and
implemented properly, a testing program can be used to
drive network designs and configurations, and can result
in a network that is optimized for both voice service
quality and network bandwidth efficiency.
WHAT TO TEST FOR
Testing network performance begins with identifying
parameters to be measured and performance objectives.
One approach is to identify which parameters impact a
customer, and then identify the underlying factors that
contribute to those parameters.
Performance testing on the PSTN has traditionally
focused on call completion metrics (such as
answer/seizure ratios) and call duration. This is
understandable: these metrics represent how much network
usage can be billed, and on the PSTN call quality was
rarely an issue. For IP telephony, however, call quality
is an important issue, and it can directly impact call
duration. This drives the need to test call quality.
The ITU, in recommendations P.800 and P.830, defines
subjective testing for "listening quality" and "conversational
quality." Listening quality is a one-way phenomenon and
is affected by the clearness, or clarity, and the
loudness of the speaker's voice as it is perceived by a
listener.
Conversational quality is a two-way phenomenon and is
affected by voice delay and echo, in addition to clarity
and loudness. It thus becomes apparent that one must
first test these customer-impacting parameters.
To understand how a network's performance contributes
to these parameters, one must also test the underlying
factors. Underlying factors that impact voice clarity
include encoding and compression, time clipping from a
voice activity detector, concealed and unconcealed
packet loss, and excessive packet jitter (resulting in
dropped packets). PSTN impairments, such as PCM
quantization distortion, are already inherent in the
de-facto standard of "PSTN quality."
Factors that contribute to delay include all
processing delay inherent with packet capture, routing
and queuing; transmission delay; voice
encoding/decoding; and jitter buffering. Factors that
contribute to the effect of echo are an echo signal's
loudness and delay.
While it is important to determine the value of each
of these underlying factors, it is difficult to
determine the customer impact with just a measurement of
an underlying factor. This underscores the need to test
customer performance parameters, such as clarity, in
addition to just packet loss. For example, a two percent
packet loss may or may not be perceived by a customer.
THE BASICS
A basic testing program can begin with five parts:
benchmarking, baselining, detecting impairments,
troubleshooting, and optimizing.
Benchmarking
According to Webster a benchmark is "a standard or point
of reference in judging or measuring quality."
Benchmarking in our industry involves determining
performance objectives, which are needed to avoid costly
overprovisioning or underprovisioning. A network should
be designed to meet its objectives.
Benchmarks may be set by the limits of customer
acceptance as determined by subjective testing, by
comparison with the PSTN or other standard-bearer
network, or by established industry standards.
Subjective testing is very difficult and expensive, so
one should turn to previous work in setting benchmarks.
For one-way delay in networks with echo adequately
controlled, ITU G.114 recommends a maximum one-way
transmission time of 150 milliseconds for most voice
applications. Of these 150 milliseconds, a maximum of 50
is allocated to processing, since the remainder may be
consumed by transmission time on international
connections.
G.114 provides a good objective for a VoIP network.
However, an alternative may be to measure delay on the
PSTN for a call with the same endpoints as a VoIP call,
and to meet that objective with an additional time
allocated for VoIP processing (e.g., PSTN delay plus 50
milliseconds).
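The two objectives described above can be sketched as a simple check. The 150 ms and 50 ms figures come from ITU-T G.114 as quoted in the text; the function name and structure are illustrative, not a real test tool's API.

```python
G114_ONE_WAY_MS = 150.0    # maximum one-way transmission time (ITU-T G.114)
G114_PROCESSING_MS = 50.0  # portion of the budget allocated to processing

def meets_delay_benchmark(one_way_ms, processing_ms, pstn_reference_ms=None):
    """Check a measured one-way delay against the G.114 budget, or
    against a measured PSTN reference delay plus a 50 ms VoIP
    processing allowance, whichever objective is in use."""
    if pstn_reference_ms is not None:
        # Alternative objective: PSTN delay plus 50 ms of VoIP processing.
        return one_way_ms <= pstn_reference_ms + G114_PROCESSING_MS
    return one_way_ms <= G114_ONE_WAY_MS and processing_ms <= G114_PROCESSING_MS

print(meets_delay_benchmark(120.0, 40.0))   # True: within the G.114 budget
print(meets_delay_benchmark(170.0, 40.0))   # False: exceeds 150 ms one-way
print(meets_delay_benchmark(130.0, 45.0, pstn_reference_ms=90.0))  # True: 130 <= 90 + 50
```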
For echo, ITU G.131 provides valuable information on
the effects of echo's loudness and delay on user
acceptance. Included in G.131 is a graph showing the
range of user acceptability as a function of echo
loudness and one-way transmission time. This can provide
benchmarks for echo loudness and delay.
While there are standards for how to measure voice
clarity (or "speech quality"), there is no established
standard for what values need to be met. The most widely
recognized scale for speech quality is the Mean Opinion
Score (MOS) listening quality scale (1 = bad to 5 =
excellent). PSQM+ scores are also used, on a scale where
0 = perfect and 6 and higher = bad. One method
for setting clarity benchmarks is to measure the speech
quality of the PSTN. PSTN calls will typically fall
above 3.5 on a MOS scale and below 2.5 on a PSQM+ scale.
(For a more in-depth look, make sure to check out the
sidebar article Methods For Measuring
Speech Quality.)
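A clarity benchmark built from the PSTN figures quoted above (MOS above 3.5, PSQM+ below 2.5) might look like the following sketch. The helper function is hypothetical; note that the two scales run in opposite directions.

```python
PSTN_MOS_FLOOR = 3.5     # typical PSTN calls score above this (1 = bad .. 5 = excellent)
PSTN_PSQM_CEILING = 2.5  # ...and below this on the PSQM+ scale (0 = perfect, 6+ = bad)

def clarity_meets_pstn(mos=None, psqm=None):
    """Return True if the available clarity score(s) are at least
    PSTN-comparable."""
    ok = True
    if mos is not None:
        ok = ok and mos >= PSTN_MOS_FLOOR
    if psqm is not None:
        ok = ok and psqm <= PSTN_PSQM_CEILING
    return ok

print(clarity_meets_pstn(mos=4.0))            # True
print(clarity_meets_pstn(mos=4.1, psqm=3.2))  # False: PSQM+ score too high
```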
Baselining
Baselining is the process of determining the nominal
performance of a network. This is different from
benchmarking. Benchmarking determines the performance
objectives of a network; baselining determines how a
network actually performs under operating conditions.
For example, a benchmark for delay may be 150
milliseconds, but a network's baseline performance may
consistently deliver 100 milliseconds of delay. This is
important for detecting impairments. Delay measured at
145 milliseconds may fall within the benchmark bounds,
but because it is more than 100 milliseconds, it may
indicate an impairment (e.g., network congestion or a
jitter buffer that is set too high).
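The benchmark-versus-baseline distinction in the example above can be sketched as follows. The 20 ms tolerance is an invented assumption; in practice it would be derived from the spread of baseline measurements.

```python
def classify_delay(measured_ms, benchmark_ms=150.0, baseline_ms=100.0,
                   tolerance_ms=20.0):
    """Classify a delay sample. A value can meet the benchmark yet
    still indicate an impairment if it drifts well above baseline."""
    if measured_ms > benchmark_ms:
        return "benchmark exceeded"
    if measured_ms > baseline_ms + tolerance_ms:
        # e.g. network congestion, or a jitter buffer set too high
        return "possible impairment"
    return "nominal"

print(classify_delay(95.0))    # nominal
print(classify_delay(145.0))   # possible impairment (still within benchmark)
print(classify_delay(160.0))   # benchmark exceeded
```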
Baselining is useful for setting thresholds to be
used to detect impairments, for optimizing a network,
for declaring performance standards for customers, and
for establishing service level agreements (SLAs).
Baselining is also useful in performing network VoIP
readiness assessments to determine if an IP network is
properly configured and provisioned to carry voice
traffic.
Detecting Impairments
Once a network is operational, then it must be monitored
to detect impairments. This requires thresholds, which
can be determined from baselining and/or benchmarking.
Thresholds may be dependent on network segments. For
example, delay thresholds for impairment detection may
be different depending on the call endpoints. However,
it is important to remember that to a customer, a long
call path does not provide justification for
unacceptable delay.
Detecting impairments can be done with active testing
and with passive testing, and using both is recommended.
Passive testing is an inexpensive way to monitor certain
underlying factors of network performance, such as
packet loss and jitter. However, some parameters, like
voice delay, can only be measured with active testing.
Clarity can only be determined with active testing, but
it can be predicted or estimated with passive testing
using sophisticated software. Techniques for predicting
MOS or other clarity scores from passively obtained
metrics should be the next area of focus for industry
measurement standards.
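One established basis for such prediction is a transmission-rating model like the ITU-T G.107 E-model, which derives a rating R from measured impairments (loss, delay, codec distortion) and maps it to an estimated MOS. A sketch of the standard R-to-MOS conversion (the derivation of R itself from passive metrics is omitted here):

```python
def r_to_mos(r):
    """Convert an E-model transmission rating R to an estimated MOS,
    per the mapping defined in ITU-T G.107."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

# G.107's default rating of R = 93.2 corresponds to a MOS of about 4.41:
print(round(r_to_mos(93.2), 2))  # 4.41
```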
Detecting impairments should begin with measuring
customer-impacting parameters, including clarity, delay,
and echo. If a measurement of an underlying factor, such
as packet loss or jitter, exceeds a threshold, then
customer-impacting parameters should be measured to
determine the severity of the impairment.
Troubleshooting
Troubleshooting is perhaps the most important part of
network testing. It requires the most sophisticated
tools and expertise, and efficient troubleshooting is
essential to resolve customer-impacting problems
quickly.
Troubleshooting VoIP network performance requires
isolating an impairment and determining its cause. Voice
quality analyzers are available that can expose many
impairments that impact clarity, delay, and echo. For
example, a clarity test may indicate packet loss as a
suspect. One can then utilize a protocol analyzer with
RTP analysis to isolate the network segment on which
loss is occurring. Knowledge of the network is also
critical. For example, packet loss may be a result of
late packets dropped from a jitter buffer; measuring
packet loss on network segments will not expose this,
but measuring packet jitter and knowing the
configuration of the jitter buffers will help.
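The jitter-buffer effect described above can be illustrated with a toy calculation: packets arriving later than the buffer can absorb are discarded, so effective loss at the decoder can exceed network loss. The per-packet jitter samples and buffer depth below are invented for illustration.

```python
def effective_loss(arrival_jitter_ms, buffer_depth_ms):
    """Fraction of packets discarded because their arrival jitter
    exceeds the jitter buffer's depth."""
    late = sum(1 for j in arrival_jitter_ms if j > buffer_depth_ms)
    return late / len(arrival_jitter_ms)

# Hypothetical per-packet jitter samples (ms):
jitter = [5, 12, 40, 8, 75, 3, 22, 90, 6, 15]
print(effective_loss(jitter, buffer_depth_ms=60))  # 0.2: two packets arrive too late
```

Measuring loss on the network segments alone would report 0 percent here; only the jitter measurements plus knowledge of the buffer depth reveal the 2 percent effective loss.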
By testing a network in segments, one can isolate the
source of an impairment to one or a few systems. Then
equipped with the right testing tools and knowledge of
the network, one can quickly determine which underlying
factors are contributing to the impairment.
OPTIMIZING
Optimizing a network for performance is an ongoing
endeavor. There are many configurations and designs in a
VoIP network that can be changed or tweaked to affect
performance. Optimizing requires a carefully controlled
test environment and process. It should not be done on
an in-service network, but it can be done on a
production network that is either partitioned for a test
environment, or that has call resources taken out of
service for this purpose.
FIND THE BALANCE IN YOUR NETWORK
The subject of network optimization is vast and complex.
But in short, it involves careful testing of the impact
of individual resources and configuration changes. For
example, the impact of using a VAD (e.g., G.729b) in
conjunction with a codec (e.g., G.729a) on voice clarity
should be determined with all other network conditions
kept the same.
Optimization also includes determining how much
performance degradation can be accepted. For example,
designing and provisioning a network for 0 percent VoIP
packet loss may prove too costly, when in fact a 0.5
percent random loss can be accepted. In this respect,
optimization means striking the right balance between
quality performance and network utilization.
John Anderson is the IP telephony manager at
Agilent Technologies, Network Systems Test Division.
Agilent is a global technology leader in communications,
electronics, life sciences and healthcare. Agilent's
Network Systems Test Division provides telecom equipment
manufacturers, service providers and enterprises with a
suite of products for testing WAN, ATM, IP, and 3G
networks. For more information about Agilent's IP
telephony testing solutions, visit www.agilent.com/comms/voicequality/.
[ Return To The October 2001 Table Of Contents ]