People Don't Speak In Tones
BY ARTHUR E. MICHAUD
There is no question that transmitting voice over IP-based data networks will be an
enormously popular application: the cost savings are too compelling and compression
technology is getting too good. But there remain significant questions about the viability
of carrying voice over data networks, where it must contend with data and even video
traffic. The protocols that define a data network were designed for non-real-time data
traffic: when networks get congested, routers drop packets; when packets are
dropped, retransmissions may be requested. These behaviors are not acceptable for voice
traffic. If voice packets are dropped or delayed, the callers experience disorienting
repetitions or gaps. End users expect the same voice quality as the public switched
telephone network (PSTN), and will not tolerate this type of poor performance from a
voice-over-IP (VoIP) network.
For carriers and corporations that are interested in deploying VoIP technology, the
success or failure of the venture will depend, in large measure, on the performance of the
network elements that carry and route the voice packets. Gateways are required to perform
the conversion of voice to IP packets for applications that cross between the PSTN and
VoIP networks. In addition to the concern about network element performance, these
gateways must process voice reliably under extreme loads.
A METHODOLOGY IS NEEDED
Having a reliable testing methodology is particularly important with a technology
as new and formative as VoIP. The H.323 umbrella standard, which covers Internet
telephony, describes only the protocols, or handshaking, between voice and data, and the
data packet formats (RTP/RTCP), not the quality of the transmission.
Some test equipment manufacturers have proposed using tones for stress or load testing
VoIP networks. Tones are certainly a component of stress testing, but they primarily test
the continuity of connections. Tones are also useful for testing latency. But a viable
VoIP testing methodology should have three dimensions:
- It must exercise, or stress, all the components of the VoIP gateway, including the codec
and other system components.
- It must provide the basis for audio quality testing as well as stress testing.
- It must conform to the quality standards, such as Mean Opinion Scores (MOS), used for
voice carried by the PSTN.
Item 1 is required to rigorously test the gateway under full load, to see if
voice connections are degraded or lost when traffic over the IP network increases. Another
consideration involves fully stressing the base signal converter algorithms of the codec
and DSP, plus other critical applications, such as voice activity detection (VAD).
Item 2 is required because, ultimately, quality testing is critical to the
success of a VoIP application. However, you can't do reliable quality testing unless
you can accurately load and stress the system; i.e., accurate quality testing is
dependent on accurate stress testing.
Item 3 is required because today, VoIP, except in pure intranet scenarios,
does not replace the PSTN. Rather, VoIP is really a convergence technology that
represents the integration of two networks: the PSTN and the data network. Because the
voice traffic carried over the IP network may originate or terminate over a conventional
PSTN local loop, the voice testing methodology should conform to PSTN telephony standards.
Tones, such as those for ringing, busy, dial tone, etc., are used in a telephony
network as information signals for both the network equipment and the user. Cellular and
satellite network testing tends to concentrate on the quality of reproducing tones for two
reasons: First, tones, although they do not fully replicate the characteristics of
human speech, are quantifiable and well-defined, which makes tone testing attractive
as the first tier in a testing methodology.
Second, the wireless carriers were the first to deal with the problems of high
compression over digital networks, so they needed to develop a standard for verifying the
detection and regeneration of tones. To a certain extent, VoIP technology is adopting some
of the methodologies of the wireless world because VoIP has a high likelihood of also
becoming a high-compression application.
To appreciate what it takes to replicate and test a voice processing system, we
can look at a set of ITU-T specifications, P.50 and P.59, which cover the use of
artificial voice. Recommendation P.50 defines artificial voice as a signal that is
mathematically defined and that reproduces the time and spectral characteristics of speech
which significantly affect the performances of telecommunication systems. The
recommendation goes on to state that the spectral characteristics of both female and male
speech are required.
ITU-T Recommendation P.59 describes the signal characteristics required to reproduce
the on-off temporal characteristics of human conversation. Simulating the parameters of
human conversation, such as pauses, double talk, and mutual silence, is
required to test speech processing systems which have speech detectors, such as
loudspeakers, telephones, echo control devices, digital circuit multiplication
equipment (DCME), packet systems, and asynchronous transfer mode (ATM) systems.
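The on-off behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the model defined in P.59: it assumes two independent talkers, 20 ms frames, and exponentially distributed talk-spurt and pause lengths, and the mean durations are invented for the example. The function names `on_off_pattern` and `conversation_stats` are hypothetical.

```python
import random


def on_off_pattern(n_frames, mean_talk, mean_pause, rng):
    """Return n_frames booleans (True = talking), alternating spurts and pauses.

    Spurt and pause lengths are drawn from exponential distributions. The means
    are illustrative assumptions, not the values given in P.59.
    """
    pattern = []
    talking = rng.random() < 0.5  # random starting state
    while len(pattern) < n_frames:
        mean = mean_talk if talking else mean_pause
        length = max(1, round(rng.expovariate(1.0 / mean)))
        pattern.extend([talking] * length)
        talking = not talking
    return pattern[:n_frames]


def conversation_stats(a, b):
    """Fraction of frames spent in each of the four conversational states."""
    counts = {"a_only": 0, "b_only": 0, "double_talk": 0, "mutual_silence": 0}
    for talk_a, talk_b in zip(a, b):
        if talk_a and talk_b:
            counts["double_talk"] += 1
        elif talk_a:
            counts["a_only"] += 1
        elif talk_b:
            counts["b_only"] += 1
        else:
            counts["mutual_silence"] += 1
    return {state: n / len(a) for state, n in counts.items()}


rng = random.Random(0)  # fixed seed so the run is repeatable
# 10,000 frames of 20 ms each; means of 50 and 75 frames are assumed values.
a = on_off_pattern(10_000, mean_talk=50, mean_pause=75, rng=rng)
b = on_off_pattern(10_000, mean_talk=50, mean_pause=75, rng=rng)
stats = conversation_stats(a, b)
for state, fraction in stats.items():
    print(f"{state:>15}: {fraction:.3f}")
```

A test signal gated by patterns like these forces a speech detector through all four states (one talker, the other talker, double talk, mutual silence), which a continuous tone never does.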
The artificial speech specification, while it is quite complex, does provide a
consistent source signal for objective quality measurement. ITU-T Recommendation P.861
says that artificial voice is useful, but specific to the goals of the testing, or to what
the algorithms of the codec are designed to do. But this specification goes on to
recommend that real voice be used for broader applications.
Voice is a periodic or variable signal that includes short inter-syllabic voices.
A normal telephony call includes non-voice elements, such as conversational pauses, and
non-periodic (static) signals, such as noise or tones. Stress testing a gateway requires
stressing both the voice and non-voice aspects.
To deal with the variable characteristics of voice, traditional speech coders, such as
CELP, RELP, VSELP, and LPC-10, use linear prediction algorithms. These algorithms estimate
each speech sample from a weighted combination of the samples that precede it, so that the
result is a smooth transition between the sound clips (samples). They also have to be able
to handle a wide variety of voice types (men, women, and children) as well as variables
like pitch, pauses, intensity, loudness, accents, etc.
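A short Python sketch may help show why a tone places little demand on a linear predictor. It uses the textbook autocorrelation method with the Levinson-Durbin recursion, which is not necessarily how any particular coder named above is implemented. The 400 Hz tone, the 8 kHz sample rate, the predictor order of 2, and the white noise used as a crude stand-in for a less predictable signal are all assumptions made for the example.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    return [sum(x[i] * x[i - lag] for i in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (coeffs, residual_energy) for the predictor
    x[n] ~ coeffs[0]*x[n-1] + coeffs[1]*x[n-2] + ...
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # signal already predicted exactly
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def prediction_gain(x, order):
    """Ratio of signal energy to prediction residual energy."""
    r = autocorrelation(x, order)
    _, err = levinson_durbin(r, order)
    return r[0] / err if err > 0 else float("inf")


RATE = 8000   # samples per second (assumed telephony rate)
N = 800       # 100 ms of signal
tone = [math.sin(2 * math.pi * 400 * i / RATE) for i in range(N)]
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]  # stand-in, not a speech model

tone_coeffs, _ = levinson_durbin(autocorrelation(tone, 2), 2)
tone_gain = prediction_gain(tone, order=2)
noise_gain = prediction_gain(noise, order=2)
print("tone coefficients:", [round(c, 3) for c in tone_coeffs])
print("tone prediction gain: ", round(tone_gain, 1))
print("noise prediction gain:", round(noise_gain, 3))
```

The tone is predicted almost perfectly from its two previous samples, so the residual is tiny and the prediction gain is very large; the noise-like signal yields a gain close to 1. A test signal that the predictor can track this easily tells you little about how the coder behaves on varied material.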
Testing with tones only, or even with a series of tones (sometimes referred to as
pseudo-voice tones), exercises only the non-voiced aspects of the speech coder
architecture, not the response to voice and other variable sounds that is the function of
the whole gateway. In order to exercise the complete gateway, including the VAD
applications, you need to use real speech as well as tones. For example, some echo
cancellation schemes are designed to set parameters from a baseline measurement of the
initial signal
received in a call. The quality of the call will suffer if that initial signal, assumed as
voice, is actually a test tone. Also, when tones are put through the system to test the
codecs, the echo cancellation algorithms can mistake the tones for non-voiced components
and ignore them, thereby compromising test results.
This is why ITU-T Recommendation P.59, referenced earlier in this paper, defined the
signal characteristics required to reproduce the on-off temporal characteristics of
human conversation in order to reliably test speech processing systems which
have speech detectors such as [echo control devices].
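The effect on a speech detector can be illustrated with a deliberately simple energy-threshold VAD, sketched below in Python. Production VADs are considerably more elaborate, so treat this as a toy model. The 20 ms frame size, the threshold, the 1000 Hz tone, and the noise bursts used as a stand-in for talk spurts are all assumptions made for the example.

```python
import math
import random

RATE = 8000    # samples per second (assumed)
FRAME = 160    # 20 ms frames at 8 kHz


def frame_energies(samples, frame=FRAME):
    """Mean squared amplitude of each complete frame."""
    return [sum(s * s for s in samples[i:i + frame]) / frame
            for i in range(0, len(samples) - frame + 1, frame)]


def energy_vad(samples, threshold, frame=FRAME):
    """One decision per frame: True means 'active', False means 'silent'."""
    return [energy > threshold for energy in frame_energies(samples, frame)]


def transitions(decisions):
    """Number of times the detector switches between active and silent."""
    return sum(1 for prev, cur in zip(decisions, decisions[1:]) if prev != cur)


# Two seconds of a steady 1000 Hz test tone.
tone = [0.5 * math.sin(2 * math.pi * 1000 * i / RATE) for i in range(2 * RATE)]

# Two seconds of an on-off signal: 200 ms noise bursts, then 300 ms of
# near-silence. The bursts are a crude stand-in for talk spurts.
rng = random.Random(2)
bursty = []
while len(bursty) < 2 * RATE:
    bursty.extend(rng.gauss(0.0, 0.3) for _ in range(1600))
    bursty.extend(rng.gauss(0.0, 0.001) for _ in range(2400))
bursty = bursty[:2 * RATE]

tone_vad = energy_vad(tone, threshold=0.01)
bursty_vad = energy_vad(bursty, threshold=0.01)
print("tone:   active frames", sum(tone_vad), "transitions", transitions(tone_vad))
print("bursty: active frames", sum(bursty_vad), "transitions", transitions(bursty_vad))
```

With the steady tone the detector sits in the active state for the whole run and never switches, so its onset and hangover handling go untested. The on-off signal drives it back and forth between states, which is the behavior a load test needs to provoke.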
THE RIGHT METHODOLOGY
As stated before, a viable VoIP testing methodology must exercise, or stress, all the
components of the VoIP gateway. It must provide the basis for audio quality testing as
well as stress testing, and it must conform to the testing standards used for voice
carried by the PSTN. Therefore, a viable VoIP testing methodology would be a hierarchy of
standards-based (to conform to PSTN telephony standards) testing tools, as follows:
- Tone testing to fully load and stress the codec for continuity and latency of voice
- Artificial or real speech to fully load and stress the VAD aspects of the system under
test, so as to assess the effect on speech by echo cancellation, latency, background
This two-tier testing methodology will ensure that carriers and corporations can deploy
their VoIP systems with confidence.
Arthur E. Michaud is director of product marketing for Hammer Technologies. Hammer
provides a complete solution for load, feature, regression, and in-service testing of
integrated telecommunications systems and services. Hammer is in wide use today by
developers of computer telephony, advanced switching, call centers, and enhanced services
systems. For more information on Hammer and the Hammer product family, visit www.hammer.com