Echo in Voice over IP Systems
TMCnet VoIP Performance Management Columnist
What is Echo?
Echo is an obvious and very annoying problem in telephony systems, and can occur in Voice over IP, Cellular and long distance connections. There are several types of echo:
- Talker Echo occurs when a proportion of the talker’s (i.e. person speaking) voice is reflected back to them. The talker hears a delayed copy of their own voice.
- Listener Echo occurs when a talker’s voice is reflected back to them and then re-reflected towards the listener. The listener hears two or more copies of the talker’s speech. Listener Echo is less common than Talker Echo.
Echo is a common problem in Voice over IP services - not because VoIP introduces echo but because VoIP increases delay and makes echo more obvious and annoying.
Techniques such as echo cancellation and echo suppression can reduce echo problems; however, they are not always effective.
This application note describes why echo occurs, what effects it has on voice quality, how echo cancellation and suppression work and how to troubleshoot echo problems.
Sources of Echo
Some “echo” is deliberately introduced in telephone systems. In a typical telephone handset a proportion of the speech energy from the microphone is fed back to the speaker or earpiece. This provides a natural way to control the loudness of the talker - if someone speaks very loudly then a loud signal is fed back to their ear. This deliberate feedback is called “sidetone”. Because the signal is fed back instantaneously it does not sound like echo, which by definition is delayed with respect to the original speech.
There are two common causes of echo:
(i) Reflections in 2-4 wire interfaces.
Both Voice over IP and traditional digital telephone systems (PCM, ISDN) are “4-wire” in the sense that the signal in one direction is carried over a separate “pair” to the signal in the other direction. This means that the two signals are independent of each other.
Analog local loops, typically used to connect to individual telephones, are “2-wire” as the signals going in both directions are carried over the same pair of wires. Where a 4-wire digital system connects to a 2-wire analog system it is necessary to perform a 2-4 wire conversion using either a transformer hybrid or active hybrid. This conversion function is typically built into Central Office or PBX line cards or into channel banks.
The 2-4 wire conversion process typically relies on the hybrid being “balanced”, which means that the loading presented by the 2-wire line matches that expected by the hybrid. If there is some mismatch then the transmit and receive signals on the 2-wire line cannot be properly separated and hence an echo occurs.
Echo is a very common problem on PCM-analog loop interconnections; however, with conventional telephone systems the delay is so short that the echo does not sound like echo (i.e. it sounds like sidetone).
(ii) Acoustic Echo
Acoustic echo occurs when some proportion of the sound coming out of the “speaker” part of a telephone handset or headset can be heard by the microphone part of the handset or headset. This can be due to poor design, or even to the user holding the handset away from their ear.
Echo is typically reported in terms of Echo Return Loss (ERL). This is the ratio between the original signal and the echo level expressed in decibels (dB). A higher ratio corresponds to a smaller echo, hence 55dB would be a low echo level and 15dB quite a high echo level.
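As a rough illustration of the ERL definition above, the following Python sketch computes the ratio in decibels from mean signal and echo power. The function name and the use of power (rather than amplitude) as the input are assumptions made for this example.

```python
import math


def echo_return_loss_db(signal_power: float, echo_power: float) -> float:
    """Return the echo return loss in dB.

    Both arguments are mean powers in the same linear units.
    A larger result means a quieter echo.
    """
    if signal_power <= 0 or echo_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / echo_power)
```

For instance, an echo carrying one thousandth of the original signal power corresponds to an ERL of about 30dB, which sits between the 15dB and 55dB examples given above.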
Impact of Echo on VoIP Call Quality
Talker echo is the most common type of echo and results in a proportion of the talker’s (person speaking) voice being reflected back to them.
The chart below shows the relationship between delay and conversational quality for two conditions - firstly with a low level of echo (55dB echo return loss) and secondly with a moderate level of echo (35dB echo return loss).
If round trip delay is very short, say less than 30 ms, then the talker cannot distinguish between the echo and the deliberately introduced sidetone.
If the delay is a little longer, say 50 ms, then the talker cannot hear the delayed copy of their speech as a distinct copy; however, it does impact speech quality, resulting in a sound generally described as “hollow”, “cave-like”, “tunnel-like” or similar.
As the delay increases further, the echo becomes more obviously echo - and the combined effect of the loudness of the echo and its delay causes considerable annoyance.
Echo Suppression and Cancellation
Echo Suppression (or NLP)
Low to moderate levels of talker echo cannot be easily heard while the talker is actually speaking but are much more obvious during the gaps in speech (silence periods). An early approach to masking echo problems was to detect when these silence periods occur and to replace the silence with artificial background noise. Echo suppression is often called non-linear processing (NLP).
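The silence-replacement idea described above can be sketched in Python as follows. This is a simplified illustration, not a production NLP: the frame-based RMS test, the threshold values and the function name are all assumptions chosen for the example.

```python
import math
import random


def suppress_echo(frames, silence_rms=200.0, noise_rms=30.0, seed=0):
    """Replace low-level frames with artificial background noise.

    frames      -- list of non-empty frames, each a list of PCM samples
    silence_rms -- hypothetical threshold below which a frame is treated
                   as a silence period (it may contain residual echo)
    noise_rms   -- hypothetical level of the artificial background noise
    """
    rng = random.Random(seed)
    output = []
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms < silence_rms:
            # Quiet frame: discard its contents and insert noise instead.
            frame = [int(rng.gauss(0.0, noise_rms)) for _ in frame]
        output.append(list(frame))
    return output
```

A frame containing loud speech passes through unchanged, while a quiet frame, which is where low-level echo would be most audible, is replaced entirely.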
Echo Cancellation
Echo cancellation is a sophisticated approach to removing the echo that may be present in speech signals. An adaptive signal processing algorithm monitors the speech signals going in each direction and attempts to learn the characteristics of the echo - i.e. if an echo is present then what is the associated delay and amplitude. Once the characteristics of the echo have been learnt then the echo cancellation algorithm can attempt to remove the echo. The adaptation process is temporarily suspended during doubletalk, i.e. when both users are speaking simultaneously.
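One widely used adaptive algorithm for this purpose is the normalized least mean squares (NLMS) filter. The article does not say which algorithm any particular product uses, so the Python sketch below is only an illustration of the general idea. It omits the doubletalk detector mentioned above, and the function names, step size and tap count are assumptions.

```python
def make_echo(far_end, delay, gain):
    """Simulate a simple echo path: a delayed, attenuated copy."""
    return [0.0] * delay + [gain * s for s in far_end[:len(far_end) - delay]]


def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)


def nlms_cancel(far_end, near_end, taps=8, mu=0.5, eps=1e-6):
    """Return near_end with the estimated echo of far_end removed.

    far_end  -- samples sent towards the echo source
    near_end -- samples coming back (containing the echo)
    taps     -- filter length, i.e. the tail length in samples
    mu, eps  -- NLMS step size and regularization (illustrative values)
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what remains after cancellation
        norm = eps + sum(h * h for h in history)
        # Adapt the echo path estimate in proportion to the error.
        weights = [w + mu * error * h / norm
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual
```

Feeding the function a noise-like far-end signal and an echo produced by make_echo with a delay shorter than the tap count shows the residual shrinking as the filter adapts. A real canceller would also freeze the weight update while doubletalk is detected.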
In order for the echo canceller to be able to operate, it has to keep some history of the sampled speech signal. This history (echo canceller tail length) uses significant memory, usually a scarce resource in the digital signal processing (DSP) chips used in VoIP systems. If the echo delay is greater than the history kept by the echo canceller then the echo canceller will be unable to cancel the echo.
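To give a feel for the memory involved, the Python sketch below converts a tail length in milliseconds into a number of stored samples and checks whether a given echo delay falls within the tail. The 8 kHz default is the usual narrowband telephony sampling rate; the function names are hypothetical, and a real canceller also needs headroom for the spread of the echo, not just its initial delay.

```python
def tail_taps(tail_ms: float, sample_rate_hz: int = 8000) -> int:
    """Number of speech samples the canceller must keep for this tail."""
    return int(round(tail_ms * sample_rate_hz / 1000))


def can_cancel(echo_path_delay_ms: float, tail_ms: float) -> bool:
    """True if the echo delay falls within the canceller's history."""
    return echo_path_delay_ms <= tail_ms
```

With these assumptions a 64 ms tail needs 512 samples of history per channel, and an echo arriving after 100 ms would fall outside it.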
The time taken for the echo canceller to learn the characteristics of the echo is called the convergence time. Sometimes a severe echo can be heard for a few seconds at the start of a call - this is often due to the time taken for the echo canceller to converge and cancel the echo and is therefore called convergence echo.
Implementation of Echo Cancellation and Suppression
Echo cancellation and echo suppression are usually implemented together, and are able to reduce quite significant levels of echo.
The Echo Return Loss Enhancement (ERLE) represents the improvement in echo level introduced by the echo canceller. For example:
Echo Return Loss (ERL): 25dB
Echo Return Loss Enhancement (ERLE): 30dB
Residual Echo Return Loss (ERL + ERLE): 55dB
The Echo Canceller may improve echo levels by 25-35dB and the addition of Echo Suppression (NLP) can further improve this.
Note that the uncancelled echo return loss can potentially be as low as 0dB, in which case an echo canceller providing around 30dB of improvement leaves a residual echo return loss of only about 30dB, which may still be audible.
Echo cancellers are commonly implemented in VoIP gateways and typically are configured to cancel echoes from the “trunk” side of the gateway (i.e. the non-VoIP side). Echo cancellers may also be used in IP phones to control acoustic echo from the handset; this is common in full-duplex speakerphones but is not always implemented in IP handsets or softphones.
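The arithmetic in this section can be written out as a small Python sketch. The function names are hypothetical, and the optional NLP contribution is an assumption added so that the combined case can be expressed.

```python
def residual_erl_db(erl_db: float, erle_db: float, nlp_db: float = 0.0) -> float:
    """Residual echo return loss: decibel values simply add."""
    return erl_db + erle_db + nlp_db


def echo_amplitude_fraction(return_loss_db: float) -> float:
    """Echo amplitude as a fraction of the original signal amplitude."""
    return 10.0 ** (-return_loss_db / 20.0)
```

Using the figures above, residual_erl_db(25, 30) gives 55dB, while residual_erl_db(0, 30) gives only 30dB, i.e. an echo at roughly 3% of the original amplitude.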
Echo Measurement and the VoIP Performance Management Framework
Echo may be measured using specialized test tools that analyze audio signals, or may be estimated by the echo cancellers typically integrated into VoIP gateways. Specialized test tools can make more accurate measurements but are typically only practical for troubleshooting once a problem has been identified. Because echo problems can occur intermittently, it is desirable to detect them on live calls and collect data for later analysis.
The emerging protocols that fit within the VoIP Performance Management Framework are able to support the detection and reporting of echo problems affecting live calls. A key element in this process is RTCP XR.
The RTCP XR VoIP metrics report is exchanged between VoIP endpoints during a call. This report contains the estimated Residual Echo Return Loss (RERL) after the effects of echo cancellation, and the network round trip delay. The RERL value is an estimate made by the echo canceller and may not be as accurate as one made using specialized test tools; however, this approach allows the estimated echo level on every call to be reported.
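As an illustration of how an endpoint might read these two values, the Python sketch below unpacks a VoIP Metrics report block. The field layout reflects my reading of RFC 3611 section 4.7 (block type 7, block length 8, RERL as one byte with 127 meaning unavailable) and should be checked against the RFC before being relied upon. The function names and the example builder are hypothetical.

```python
import struct

VOIP_METRICS_BLOCK_TYPE = 7
# Assumed layout: BT, reserved, length, SSRC, 4 rate/density bytes,
# burst/gap duration, round trip delay, end system delay, signal level,
# noise level, RERL, Gmin, R factor, ext. R factor, MOS-LQ, MOS-CQ,
# RX config, reserved, three jitter buffer values.
_BLOCK = struct.Struct("!BBHI4B4H2b8B3H")
UNAVAILABLE = 127


def parse_voip_metrics(block: bytes) -> dict:
    """Extract echo-related fields from a VoIP Metrics report block."""
    if len(block) < _BLOCK.size:
        raise ValueError("block too short")
    f = _BLOCK.unpack_from(block)
    if f[0] != VOIP_METRICS_BLOCK_TYPE or f[2] != 8:
        raise ValueError("not a VoIP Metrics report block")
    rerl = f[14]
    return {
        "ssrc": f[3],
        "round_trip_delay_ms": f[10],
        "end_system_delay_ms": f[11],
        "rerl_db": None if rerl == UNAVAILABLE else rerl,
    }


def build_example_block(ssrc: int, round_trip_ms: int, rerl_db: int) -> bytes:
    """Build a test block; all other fields are placeholder values."""
    return _BLOCK.pack(VOIP_METRICS_BLOCK_TYPE, 0, 8, ssrc,
                       0, 0, 0, 0,                 # loss, discard, densities
                       0, 0, round_trip_ms, 0,     # durations and delays
                       UNAVAILABLE, UNAVAILABLE,   # signal and noise level
                       rerl_db, 16,                # RERL, Gmin
                       UNAVAILABLE, UNAVAILABLE,   # R factor, ext. R factor
                       UNAVAILABLE, UNAVAILABLE,   # MOS-LQ, MOS-CQ
                       0, 0, 0, 0, 0)              # RX config, reserved, JB
```

Parsing a block built with build_example_block(0x1234, 120, 55) yields a round trip delay of 120 ms and an RERL of 55dB, which is the pair of values a quality calculation would combine.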
For example, an IP phone is connected to a remote trunking gateway, and the gateway connects to the traditional telephone network. Say that some echo is occurring on the telephone network and the IP phone user is experiencing talker echo. The echo canceller in the trunking gateway will attempt to cancel the echo (or at least reduce the level of the echo) from the telephone network.
RTCP XR VoIP metrics reports from the trunking gateway to the IP phone will report the estimated residual echo level. This allows the IP phone to incorporate the estimated echo level into its calculations of call quality. The IP phone may report call quality using SIP, and hence the conversational call quality metric reported by the phone using SIP would incorporate the estimated echo level reported by the trunking gateway.
Diagnosing Echo Problems
The key to diagnosing echo problems is to realize that the echo is originating at the remote end of the telephone connection to the person complaining about echo.
Unfortunately in moving to VoIP the round trip delay is often significantly increased - this makes any existing echo problems much more annoying and obvious. VoIP did not introduce the echo, it simply made it much more annoying!!
Echo Source Identification - circuit problems
When VoIP calls are connected to analog local loops attached to a PBX or CO, or to analog phones attached to ports on a VoIP gateway, it may be possible to identify specific ports that are causing echo problems. The simplest approach is to analyze reports of echo problems to see if there is some commonality in the remote line - i.e. if users A, B, C and D are reporting hearing echo problems then was this occurring when they were making calls to user E? The VoIP performance management framework can be a very useful aid in this type of diagnosis.
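The commonality analysis described above can be sketched in Python as a simple grouping of echo complaints by remote party. The input format, the function name and the minimum-reporter threshold are assumptions made for this example.

```python
def suspect_remote_parties(reports, min_reporters=2):
    """Rank remote parties by how many distinct users heard echo.

    reports -- iterable of (reporting_user, remote_party) pairs, one per
               call on which the reporting user heard echo
    Returns a list of (remote_party, sorted_reporters) for parties
    named by at least min_reporters distinct users, most-reported first.
    """
    reporters = {}
    for user, remote in reports:
        reporters.setdefault(remote, set()).add(user)
    ranked = sorted(reporters.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    return [(remote, sorted(users))
            for remote, users in ranked
            if len(users) >= min_reporters]
```

If users A, B, C and D all report echo on calls to E, and A also reports it once on a call to F, the function returns E alone as the suspect line, since the echo originates at the remote end.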
Echo Source Identification - acoustic echo
Acoustic echo is often due to cheap handsets or headsets, or to users who habitually hold their phone handset away from their ears. If the problem can be localized to a particular phone, headset or user then the best solution may be to replace the phone and potentially use acoustic echo cancellation software in the phone (available in some phones/software builds). If echo problems occur with IP phone to IP phone calls then immediately suspect acoustic feedback.
Uncancelled echo level may be very high
An echo canceller typically improves echo by 30dB; however, if the echo level is severe, say 0dB, then there may still be some audible echo. This may also result in convergence echo, i.e. a short period of very loud echo at the start of the call whilst the echo canceller is converging.
No echo canceller
Some VoIP systems do not contain echo cancellers (e.g. an IP phone) and those that do may have the echo canceller oriented in a different direction to the echo. For example, say that the user of an IP phone has a tendency to loosely rest the handset on their shoulder rather than pressing it to their ear; this may allow some acoustic feedback from the handset speaker to the microphone, which may result in echo being heard by the remote user. The call may be going through a VoIP gateway that contains an echo canceller; however, the echo canceller would typically be oriented to cancel echoes from the PCM/non-VoIP side and not from the IP side, and hence would have no effect.
Echo longer than tail
Echo cancellers are configured with some maximum echo path delay or echo tail length. A long tail length is costly because it consumes scarce DSP memory, and hence the tail length is often set to some compromise value. If the actual echo path delay exceeds the tail length then the echo canceller will be unable to cancel the echo.
This problem may occur when a VoIP call is connected to a PCM trunk that is in turn connected to another VoIP network, a cellular network or some other network with delay (e.g. satellite).
Echo path distortion
Sometimes the path taken by the echo may introduce significant non-linearities which greatly reduce the effectiveness of the echo canceller (which generally assumes that the echoed signal has a linear relationship with the original signal).
Signal level too high or low
If the signal level is too high or too low then the echo canceller may not operate correctly. Low signal levels can also cause echo suppressors (NLPs) to suppress quieter parts of speech, leading to gaps in speech.
Some echo cancellers can have some difficulty dealing with calls that have excessive doubletalk - either due to the nature of the discussion (e.g. an argument) or high round trip delay.
Echo is a common problem in Voice over IP services - not because VoIP introduces echo but because it increases round trip delay, which makes echo more obvious. It is possible to troubleshoot echo problems in many cases, and deploying the VoIP performance management framework (i.e. RTCP XR VoIP Metrics, and SIP QoS reports) can help to identify in-service problems.
IETF RFC 3611, RTP Control Protocol Extended Reports (RTCP XR) - VoIP Metrics, T. Friedman, R. Caceres, A. Clark
ITU-T Recommendation G.168, Digital Network Echo Cancellers
Alan Clark Ph.D., TMCnet VoIP Performance Management Columnist, founded Telchemy Incorporated in August 1999. Prior to Telchemy, Clark was the Chief Technical Officer at Hayes Corporation and played a key role in establishing industry-wide voice/data integration standards.
Dr. Clark is the inventor of the V.42bis data compression algorithm and the architect and editor of the V.58 network management standard. Published widely, he has nine granted patents and five pending, and is recognized as a major authority in QoS and Packet Voice research and development.
Telchemy, Incorporated is the global leader in real-time VoIP Fault and Performance Management with its VQmon and SQmon families of call quality monitoring and analysis software. Telchemy is the world's first technology company to provide voice quality management tools that consider the effects of time-varying network impairments and the perceptual effects of time-varying call quality.