January 2007
Volume 10 / Number 1

Can Your Customers Hear You? Optimizing VoIP Audio Quality

By Steve Graham


Audio quality and, in particular, voice intelligibility need to be a top consideration when evaluating VoIP features, because they directly impact daily business and influence how customers, partners, and other important constituents perceive your company. If your employees are unintelligible, what message does that send about your company?

Superior audio quality allows more natural and productive business conversations both outside and within your organization. With travel budgets on hold and geographically dispersed teams, productive teleconferencing is key to getting the job done.

VoIP provides opportunities, including video conferencing, use of wideband technologies, and potentially lower costs depending on your market and network. Choosing the right components allows you to take full advantage of those opportunities.

One of these solution components, the headset, is an important device that interfaces directly with a user. A properly chosen headset can improve the perception of audio quality for both desk phones and softphones. It can compensate for environmental and network factors and allow more natural and productive business conversations.


What constitutes audio quality?

Audio quality is a balance of audio performance and sufficient sound quality for good interpersonal communications. It includes the transmittal, reception, intelligibility, and performance of the audio signal, with special emphasis on the human speech range.

“Perfectly clear sound” doesn’t always provide speech intelligibility. Clarity addresses how the sound is manipulated, but other factors, such as delay (latency) or echo, can make “perfectly clear sound” unintelligible by destroying conversational dynamics or disrupting speech. If people find themselves talking over each other, speaking louder or slower, or otherwise altering their behavior to communicate effectively, the experience is not optimal. The network may transmit human speech with intelligibility and clarity, but if there is significant latency, conversational dynamics are disturbed.


So what goes into creating good sound quality?

Fricatives. Human speech falls within the frequency range of 120-8000Hz, so we are considering the quality of sound within that range as it gets transmitted and received using VoIP.

Speech fundamentals (or “vowel sounds”) for male voices are within the range of 120-200 Hz, and fundamentals for female voices fall within the 250-350 Hz range. There are other sounds, such as fricative sounds, that fall within the 2500-8000 Hz range. The term fricative describes a sound that is articulated with almost a complete closure, but with just enough of an opening to create turbulence in the airflow. Examples of fricative sounds are the f in fat, v in vat, s in sip, and the z in zip. Fricative sounds also include noise sounds. In English, fricatives are used to distinguish between singular and plural nouns by providing the s sound at the end of the word.

A properly chosen headset will capture and transmit all of this information without clipping or condensing the signal. For example, a headset that has a microphone located close to the mouth keeps this information separate from any background noise. A headset can also adjust incoming audio levels to maintain a consistent volume throughout the call.

Telephony Standards. Two telephony standards managed by the Telecommunications Industry Association are relevant to VoIP audio performance: one deals with narrowband digital telephones and the other with wideband telephones. The narrowband standard is ANSI TIA/EIA-810-B. The wideband standard, which is still evolving, is TIA-920. You can look up these standards at by entering the standard name, or searching the number of the standard.

Narrowband telephony actually covers less than the natural speech range: 150 Hz to 3.5 kHz (the lower limit is sometimes 300 Hz, because noise filtering is applied to avoid hum and harmonics). The exact range varies depending on the headset or handset and on some of the network infrastructure.

In comparison, wideband at 120 Hz to 7 kHz covers nearly the full range of natural speech, including those areas of the spectrum contributing directly to greater intelligibility. However, different vendors have implemented wideband differently, so if you are using equipment from vendors whose implementations aren’t compatible, the system may revert to narrowband as the common denominator.
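To make the comparison concrete, the sketch below estimates how much of each speech component falls inside each telephony band. The frequency figures are the ones quoted in this article; the function name `coverage` and the simple linear-in-Hz overlap measure are assumptions chosen for illustration, not a perceptual model of intelligibility.

```python
# Frequency ranges quoted in the article, in Hz (low, high).
SPEECH_COMPONENTS = {
    "male fundamentals": (120, 200),
    "female fundamentals": (250, 350),
    "fricatives": (2500, 8000),
}

TELEPHONY_BANDS = {
    "narrowband": (150, 3500),
    "wideband": (120, 7000),
}


def coverage(component, band):
    """Return the fraction (0.0 to 1.0) of a component's range inside a band."""
    low = max(component[0], band[0])
    high = min(component[1], band[1])
    overlap = max(0, high - low)  # zero when the ranges do not intersect
    return overlap / (component[1] - component[0])


for band_name, band in TELEPHONY_BANDS.items():
    for name, component in SPEECH_COMPONENTS.items():
        print(f"{band_name:10s} {name:20s} {coverage(component, band):.0%}")
```

Under this crude measure, narrowband passes only about 18 percent of the fricative range, while wideband passes about 82 percent, which is one way to see why wideband helps with sounds such as s and f.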

A properly chosen headset can filter background noise and adjust audio levels to produce a signal that works well with whichever standard your organization is using. It does this by using both acoustic and electronic design techniques that focus on the important aspects of speech needed for optimum intelligibility and on the reduction of background noise elements that can mask or blur speech signals.

Latency. In VoIP terminology, latency refers to a delay in packet delivery. To the people involved in conversation, latency means how long they need to wait for sound to travel between them.

A properly chosen headset adds very little latency to the system. However, if the network already has high latency, consider using a corded headset.

Echo Control. Echo is the sound of the speaker’s voice returning to their ear via the handset or headset speaker. Some echo is desirable. However, that sound can also be picked up by the microphone and sent to the other listeners in a call. The result can degrade the conversational experience, cause users to trip over words, and slow down speech. A good headset can provide echo prevention and management through the use of echo suppression and cancellation or, ideally, by using digital signal processing (DSP).
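As a rough illustration of DSP-based echo cancellation, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, one common textbook approach and not necessarily what any particular headset implements. Everything here is an assumption made for the example: the echo path coefficients, the filter length, the step size, the random test signal, and the simplification that nobody is speaking at the near end.

```python
import random


def simulate_echo(speaker_signal, echo_path):
    """Model the microphone picking up a filtered copy of the speaker output."""
    mic = []
    for n in range(len(speaker_signal)):
        mic.append(sum(gain * speaker_signal[n - k]
                       for k, gain in enumerate(echo_path) if n - k >= 0))
    return mic


def nlms_echo_cancel(speaker_signal, mic_signal, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo from the microphone signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent speaker samples, newest first
    residual = []
    for x, d in zip(speaker_signal, mic_signal):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what remains after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def mean_power(samples):
    return sum(s * s for s in samples) / len(samples)


random.seed(0)
speaker = [random.uniform(-1.0, 1.0) for _ in range(4000)]
hypothetical_echo_path = [0.0, 0.5, 0.3, -0.1]  # made-up acoustic coupling
mic = simulate_echo(speaker, hypothetical_echo_path)
residual = nlms_echo_cancel(speaker, mic)

echo_power = mean_power(mic[-500:])
residual_power = mean_power(residual[-500:])
print(f"echo power {echo_power:.4f}, residual power {residual_power:.2e}")
```

In this idealized setup the filter learns the echo path and the residual becomes negligible. A real product also has to handle simultaneous talkers, background noise, and a changing echo path.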

Standards Compliance. Standards compliance means a greater likelihood of “good sound,” compatibility with both hard phones and softphones, correct audio levels, and use of the right bit rates and vocoders so that calls are fully compatible between networks. Make sure the solution you are considering is standards compliant.


How does a headset help improve sound quality?

A well-chosen headset adds little latency within a network environment, so conversational dynamics are not disturbed, and it provides echo mitigation and cancellation. Additionally, it provides correct audio levels, bit rates, and vocoders, and is compatible with both hard phones and softphones.

It also compensates for background noise by enhancing the speech signal-to-noise ratio, and it can compensate for a narrowband/wideband mismatch. Mismatch has to do with operating a wideband device (such as a headset or handset) in a narrowband environment. The truncated content creates pops and clicks, similar to screen artifacts in video. The headset must be designed so that it does not add any artifacts in the bandwidth outside of narrowband. Sound artifacts distract the listener and degrade the sound experience.
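Signal-to-noise ratio is usually expressed in decibels as ten times the base-10 logarithm of the ratio of speech power to noise power. The short sketch below computes it from sample values; the helper names and the sample amplitudes are hypothetical and serve only to show the arithmetic.

```python
import math


def mean_power(samples):
    """Average power of a block of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """Speech-to-noise ratio in decibels."""
    return 10 * math.log10(mean_power(speech_samples) / mean_power(noise_samples))


# Hypothetical amplitudes: speech at the microphone is ten times the noise.
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.05, -0.05, 0.05, -0.05]
print(f"SNR: {snr_db(speech, noise):.1f} dB")
```

A tenfold amplitude advantage for speech works out to roughly 20 dB, so anything that raises the speech level relative to the noise at the microphone shows up directly in this figure.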


Questions you should ask a headset vendor

  • How much latency does your device add to the system? For power users, you want the total latency (your system’s latency plus the device’s) to stay below 150 ms one way, or 300 ms round trip. Other users may accept more, but 250 ms one way, or 500 ms round trip, is probably approaching unacceptability.
  • What do your company’s products do for echo control? Possible answers: voice switching (lower cost, but reduced conversational performance), echo cancellation (a better solution that uses a DSP and costs more), or a headset with a longer boom microphone, which needs less echo mitigation.
  • What does your company know about compatibility with telecommunications network standards?
  • If your system supports wideband, which standard does it support? The applicable standard is ITU-T Recommendation G.722, with G.722.1 as an alternative. (ITU-T is the International Telecommunication Union’s Telecommunication Standardization Sector.)
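The latency question above amounts to a simple budget check, sketched here. The thresholds are the ones given in the list; the function name, the verdict wording, and the assumption that round-trip delay is twice the one-way delay are illustrative choices.

```python
GOOD_ONE_WAY_MS = 150   # target for power users, from the list above
LIMIT_ONE_WAY_MS = 250  # roughly where latency approaches unacceptability


def assess_latency(system_one_way_ms, headset_one_way_ms):
    """Add the headset's delay to the system's and rate the total."""
    total = system_one_way_ms + headset_one_way_ms
    if total < GOOD_ONE_WAY_MS:
        verdict = "suitable for power users"
    elif total < LIMIT_ONE_WAY_MS:
        verdict = "may be acceptable to some users"
    else:
        verdict = "approaching unacceptable"
    return {
        "one_way_ms": total,
        "round_trip_ms": 2 * total,  # assumes a symmetric path
        "verdict": verdict,
    }


# Hypothetical figures: a 120 ms network path and a headset that adds 20 ms.
print(assess_latency(120, 20))
```

Running the same check with the vendor's stated figure and your own measured network delay shows how much headroom a given headset leaves.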



Audio quality needs to be a prime consideration when you are evaluating VoIP solutions. Headsets dramatically impact VoIP user perception of audio quality and should be a part of your purchasing decision.

Ask questions to get the best possible audio quality in your VoIP system (and to ensure your customers will hear you). You want to reduce latency, provide echo control, and choose a solution that is standards compliant.

Steve Graham is a Principal Engineer at Plantronics. For more information, visit the company online at

