TMCnet - World's Largest Communications and Technology Community




August 1999

People Don't Speak In Tones


There is no question that transmitting voice over IP-based data networks will be an enormously popular application: the cost savings are too compelling and compression technology is getting too good. But significant questions remain about the viability of carrying voice over data networks, where it must contend with data and even video traffic. The protocols that define a data network were designed for non-real-time data traffic: when networks get congested, routers drop packets; when packets are dropped, retransmissions may be requested. These behaviors are not acceptable for voice traffic. If voice packets are dropped or delayed, the callers experience disorienting repetitions or gaps. End users expect the same voice quality as the public switched telephone network (PSTN), and will not tolerate this type of poor performance from a voice-over-IP (VoIP) network.

For carriers and corporations that are interested in deploying VoIP technology, the success or failure of the venture will depend, in large measure, on the performance of the network elements that carry and route the voice packets. Gateways are required to perform the conversion of voice to IP packets for applications that cross between the PSTN and VoIP networks. In addition to the concern about network element performance, these gateways must process voice reliably under extreme loads.

Having a reliable testing methodology is particularly important with a technology as new and formative as VoIP. The H.323 umbrella standard, which covers Internet telephony, only describes the protocols, or handshaking, between voice and data, and the data packet formats (RTP/RTCP) — but not the quality of the transmission.

Some test equipment manufacturers have proposed using tones for stress or load testing VoIP networks. Tones are certainly a component of stress testing, but they primarily test the continuity of connections. Tones are also useful for testing latency. But a viable VoIP testing methodology should have three dimensions:

  1. It must exercise, or stress, all the components of the VoIP gateway, including the codec and other system components.
  2. It must provide the basis for audio quality testing as well as stress testing.
  3. It must conform to the quality standards, such as Mean Opinion Scores (MOS), used for voice carried by the PSTN.

Item 1 is required to rigorously test the gateway under full load, to see if voice connections are degraded or lost when traffic over the IP network increases. Another consideration involves fully stressing the base signal converter algorithms of the codec and DSP, plus other critical applications, such as voice activity detection (VAD).

Item 2 is required because, ultimately, quality testing is critical to the success of a VoIP application. However, you can’t do reliable quality testing unless you can accurately load and stress the system — i.e., accurate quality testing is dependent on accurate stress testing.

Item 3 is required because today, VoIP, except in pure intranet scenarios, does not replace the PSTN. Rather, VoIP is a convergence technology that represents the integration of two networks: the PSTN and the data network. Because the voice traffic carried over the IP network may originate or terminate over a conventional PSTN local loop, the voice testing methodology should conform to PSTN telephony standards.

Tones, such as those for ringing, busy, dial tone, etc., are used in a telephony network as information signals for both the network equipment and the user. Cellular and satellite network testing tends to concentrate on the quality of reproducing tones for two reasons: First, tones — although they do not fully replicate the characteristics of human speech — are quantifiable and well-defined, which makes tone testing attractive as the first tier in a testing methodology.

Second, the wireless carriers were the first to deal with the problems of high compression over digital networks. So they needed to develop a standard for verifying the detection and regeneration of tones. To a certain extent, VoIP technology is adopting some of the methodologies of the wireless world because VoIP has a high likelihood of also becoming a high-compression application.

Artificial Speech
To appreciate what it takes to replicate and test a voice processing system, we can look at a set of ITU-T specifications, P.50 and P.59, which cover the use of artificial voice. Recommendation P.50 defines artificial voice as “a signal that is mathematically defined and that reproduces the time and spectral characteristics of speech which significantly affect the performances of telecommunication systems.” The recommendation goes on to state that the spectral characteristics of both female and male speech are required.

ITU-T Recommendation P.59 describes the signal characteristics required to reproduce the on-off temporal characteristics of human conversation. Simulating the parameters of human conversation — such as pause, double talk, and mutual silence — is required to test “speech processing systems which have speech detectors, such as loudspeakers, telephones, echo control devices, digital circuit multiplication (DCME), packet systems, and asynchronous transfer mode (ATM) systems.”
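The on-off behavior described above can be illustrated with a minimal Python sketch. It is not the P.59 model and does not use the recommendation's parameters: the exponentially distributed durations, the mean talk and pause times, the 20 ms frame size, and all function names are assumptions chosen only to show how two independent on-off speakers give rise to single talk, double talk, and mutual silence.

```python
import random

STATES = ("a_talks", "b_talks", "double_talk", "mutual_silence")


def on_off_track(total_s, mean_talk_s, mean_pause_s, frame_s, rng):
    """Return one boolean per frame (True = talking) for a single speaker.

    Talk and pause durations are drawn from exponential distributions,
    which is an illustrative assumption, not the P.59 definition.
    """
    n_frames = int(round(total_s / frame_s))
    frames = []
    talking = False  # each speaker starts in a pause
    while len(frames) < n_frames:
        mean = mean_talk_s if talking else mean_pause_s
        duration = rng.expovariate(1.0 / mean)
        count = max(1, int(round(duration / frame_s)))
        frames.extend([talking] * count)
        talking = not talking
    return frames[:n_frames]


def conversation_states(a, b):
    """Label every frame with the joint state of the two speakers."""
    labels = {
        (True, False): "a_talks",
        (False, True): "b_talks",
        (True, True): "double_talk",
        (False, False): "mutual_silence",
    }
    return [labels[(x, y)] for x, y in zip(a, b)]


def state_shares(states):
    """Fraction of frames spent in each joint state."""
    return {name: states.count(name) / len(states) for name in STATES}


# Hypothetical settings: 60 s conversation, 20 ms frames,
# 1.0 s mean talk spurt, 1.5 s mean pause.
rng = random.Random(1999)
speaker_a = on_off_track(60.0, 1.0, 1.5, 0.02, rng)
speaker_b = on_off_track(60.0, 1.0, 1.5, 0.02, rng)
states = conversation_states(speaker_a, speaker_b)
shares = state_shares(states)
for name in STATES:
    print(f"{name}: {shares[name]:.2f}")
```

A test signal built on such a pattern would switch a speech source on and off frame by frame, which is what exercises the speech detectors that a continuous tone never triggers.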

The artificial speech specification, while it is quite complex, does provide a consistent source signal for objective quality measurement. ITU-T Recommendation P.861 says that artificial voice is useful, but specific to the goals of the testing, or to what the algorithms of the codec are designed to do. But this specification goes on to recommend that real voice be used for broader applications.

Real Speech
Voice is a periodic or variable signal that includes short inter-syllabic pauses. A normal telephony call includes non-voice elements, such as conversational pauses, and non-periodic (static) signals, such as noise or tones. Stress testing a gateway requires stressing both the voice and non-voice aspects.

To deal with the variable characteristics of voice, traditional speech coders, such as CELP, RELP, VSELP, and LPC-10, use linear prediction algorithms. These algorithms predict each sample of the signal from the samples that precede it, so that the result is a smooth transition between the sound clips (samples). They also have to be able to handle a wide variety of voice types (men, women, and children) as well as variables like pitch, pauses, intensity, loudness, and accents.
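As a rough illustration of linear prediction, the following Python sketch estimates predictor coefficients with the textbook autocorrelation method and Levinson-Durbin recursion. It is not the algorithm of any particular coder named above; the 8 kHz sample rate, the 400 Hz tone, the predictor order of 2, and the use of uniform random noise as a "hard to predict" signal are all assumptions made for the example.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sample sequence."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for a[1..order] so that x[n] is approximated by
    sum(a[k] * x[n - k]); also return the residual prediction error."""
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


SAMPLE_RATE = 8000  # assumed telephony sample rate
N = 800             # 100 ms of signal
omega = 2 * math.pi * 400 / SAMPLE_RATE

tone = [math.sin(omega * n) for n in range(N)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]

tone_r = autocorrelation(tone, 2)
tone_coeffs, tone_err = levinson_durbin(tone_r, 2)
noise_r = autocorrelation(noise, 2)
noise_coeffs, noise_err = levinson_durbin(noise_r, 2)

# Share of the signal energy the predictor could not explain.
tone_ratio = tone_err / tone_r[0]
noise_ratio = noise_err / noise_r[0]
print("tone coefficients:", tone_coeffs, "residual share:", tone_ratio)
print("noise coefficients:", noise_coeffs, "residual share:", noise_ratio)
```

In this sketch a steady tone is captured almost completely by two coefficients, while the noise signal leaves nearly all of its energy in the residual. Real speech falls between these extremes and changes from frame to frame, which is why it places demands on a predictive coder that a fixed tone does not.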

Testing with tones only, or even with a series of tones (sometimes referred to as pseudo-voice tones), exercises only the non-voiced aspects of the speech coder architecture, not the response to voice and other variable sounds that is the function of the whole gateway. In order to exercise the complete gateway, including the VAD applications, you need to use real speech as well as tones. For example, some echo cancellation algorithms are designed to set parameters from a baseline measurement of the initial signal received in a call. The quality of the call will suffer if that initial signal, assumed to be voice, is actually a test tone. Also, when tones are put through the system to test the codecs, the echo cancellation algorithms can mistake the tones for non-voiced components and ignore them, thereby compromising test results.
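The difference can be shown with a toy voice activity detector. The Python sketch below is a simple frame-energy threshold, not the VAD of any real gateway; the threshold, the 20 ms frame length, the 1 kHz tone, and the noise bursts used as a crude stand-in for talk spurts are all assumptions for illustration.

```python
import math
import random

SAMPLE_RATE = 8000  # assumed telephony sample rate
FRAME_LEN = 160     # 20 ms frames at 8 kHz


def frame_rms(samples, frame_len=FRAME_LEN):
    """RMS level of each complete frame."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def vad_decisions(samples, threshold=0.05):
    """Toy VAD: a frame is 'active' when its RMS exceeds the threshold."""
    return [rms > threshold for rms in frame_rms(samples)]


def count_transitions(decisions):
    """Number of active/inactive switches the detector had to make."""
    return sum(1 for prev, cur in zip(decisions, decisions[1:]) if prev != cur)


def steady_tone(duration_s, freq_hz=1000.0, amplitude=0.5):
    n = int(round(duration_s * SAMPLE_RATE))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def on_off_bursts(duration_s, burst_s=0.4, gap_s=0.6, amplitude=0.5, seed=0):
    """Noise bursts separated by silence, a crude stand-in for talk spurts."""
    rng = random.Random(seed)
    burst_len = int(round(burst_s * SAMPLE_RATE))
    cycle_len = burst_len + int(round(gap_s * SAMPLE_RATE))
    n = int(round(duration_s * SAMPLE_RATE))
    return [
        rng.uniform(-amplitude, amplitude) if (i % cycle_len) < burst_len else 0.0
        for i in range(n)
    ]


tone_vad = vad_decisions(steady_tone(2.0))
burst_vad = vad_decisions(on_off_bursts(2.0))
print("tone transitions:", count_transitions(tone_vad))
print("burst transitions:", count_transitions(burst_vad))
```

Under these assumptions the continuous tone keeps the detector permanently active, so its switching logic is never exercised, whereas the on-off signal forces it to change state at every burst boundary.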

This is why ITU-T Recommendation P.59, referenced earlier in this paper, defined the signal characteristics required to reproduce the “on-off temporal characteristics of human conversation” in order to reliably test “speech processing systems which have speech detectors such as [echo control devices].”

As stated before, a viable VoIP testing methodology must exercise, or stress, all the components of the VoIP gateway. It must provide the basis for audio quality testing as well as stress testing, and it must conform to the testing standards used for voice carried by the PSTN. Therefore, a viable VoIP testing methodology would be a hierarchy of standards-based (to conform to PSTN telephony standards) testing tools, as follows:

  1. Tone testing to fully load and stress the codec for continuity and latency of voice connections.
  2. Artificial or real speech to fully load and stress the VAD aspects of the system under test, so as to assess the effect on speech by echo cancellation, latency, background noise, etc.

This two-tier testing methodology will ensure that carriers and corporations can deploy their VoIP systems with confidence.

Arthur E. Michaud is director of product marketing for Hammer Technologies. Hammer provides a complete solution for load, feature, regression, and in-service testing of integrated telecommunications systems and services. Hammer is in wide use today by developers of computer telephony, advanced switching, call centers, and enhanced services systems. For more information on Hammer and the Hammer product family, visit www.hammer.com.
