When people think of IP telephony, thoughts about latency
and poor voice quality are often not far behind. Fortunately, VoIP has
evolved considerably since its early days. Companies such as Octiv (www.octiv.com)
and DiamondWare (www.diamondware.com)
are working hard to reduce latency, eliminate echo, and improve sound
quality, which improves the VoIP experience.
Octiv sells a developer's toolkit called OctiVox that includes APIs
(including DLLs) for performing audio processing to provide intelligibility
enhancement (clarity) and echo mitigation. Its intelligibility enhancement
function is aimed at conference room situations where
several voices reach the microphone from various distances.
In addition, Octiv has partnered with DiamondWare to add DiamondWare's
latency reduction algorithms into OctiVox for a comprehensive "enhanced
quality of experience" VoIP solution. Without getting into too many
technical details, OctiVox takes shortcuts and eliminates unnecessary
processing by optimizing the time it takes to access the underlying audio
hardware/driver subsystem. According to Octiv, Windows XP has a roundtrip
latency of 120ms, which OctiVox is able to reduce.
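To see why access to the audio driver matters so much, it helps to remember that every buffer of audio queued between the application and the sound card adds delay equal to its length in time. The following is a minimal sketch of that relationship only. The function name and the buffer sizes are our own illustrative assumptions and do not describe how OctiVox or Windows XP actually configure their buffers.

```python
def buffer_latency_ms(frames_per_buffer: int, num_buffers: int, sample_rate_hz: int) -> float:
    """Delay contributed by queued audio buffers, in milliseconds.

    Each frame lasts 1 / sample_rate_hz seconds, so a queue of
    num_buffers buffers holds frames_per_buffer * num_buffers frames
    of audio that must be played (or captured) before new data is heard.
    """
    return 1000.0 * frames_per_buffer * num_buffers / sample_rate_hz


# Hypothetical configurations at an 8 kHz telephony sample rate.
large_buffers = buffer_latency_ms(frames_per_buffer=400, num_buffers=2, sample_rate_hz=8000)
small_buffers = buffer_latency_ms(frames_per_buffer=80, num_buffers=2, sample_rate_hz=8000)

print(f"large buffers: {large_buffers} ms")
print(f"small buffers: {small_buffers} ms")
```

Smaller buffers cut the delay but leave less slack before an underrun causes an audible glitch, which is presumably why aggressive buffering needs careful tuning.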
We tested OctiVox, which comes with DLLs, API headers and libraries, and
sample programs. We installed their main sample program on two PCs: a
Pentium 450MHz running Windows 2000, and a Pentium 1GHz running Windows XP.
To eliminate all other causes of latency (such as network congestion) we put
both machines on their own private network. Running OctiVox SoftPhone Demo
on both machines, we initiated a call using the NetBIOS name, which resolved
to the appropriate IP address. After accepting the call, we held a
two-minute conversation to gauge the speech quality and the latency.
Watching the other person's lips from across the room and listening for
the spoken word in the headset receiver, we found that the latency was almost
imperceptible to the human ear. It was that good! Next, we clicked on a
button to turn off OctiVox's enhancement techniques. We immediately
noticed an increase in latency, and the voice quality was not as
three-dimensional or as warm as with OctiVox turned on. The voice quality
seemed a bit hollow with OctiVox turned off.
So how does it work? OctiVox employs an aggressive buffer management
system that reduces the roundtrip delay to as little as 50 milliseconds. In
addition, OctiVox has a multi-band processor that performs several
functions. First, it makes volume levels consistent using amplitude
normalization that equalizes output from different sources to compensate for
different microphone settings, user speech habits, and user distance from
the microphone. Second, OctiVox uses noise gating to reduce apparent noise
level, increasing speaker clarity in noisy environments. Third, OctiVox
applies spectral balancing to give each speaker's voice a more uniform
shape and optimizes it to the most comfortable response characteristics for
the human ear. The multi-band processor also utilizes source separation to
discriminate between multiple user voices and applies individual enhancement
processing to increase the ability to hear each user's voice.
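Two of the functions described above, amplitude normalization and noise gating, are easy to illustrate in a few lines. The sketch below is a deliberately simple single-band version written for this review. It is not Octiv's algorithm, and the function names, target level, threshold, and frame length are all assumptions chosen for the example.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target_rms=0.1):
    """Scale a block so its RMS level matches target_rms (assumed target)."""
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_rms / level
    return [s * gain for s in samples]


def noise_gate(samples, threshold=0.02, frame_len=80):
    """Mute every frame whose RMS level falls below threshold."""
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        out.extend(frame if rms(frame) >= threshold else [0.0] * len(frame))
    return out


# A quiet talker and a loud talker (200 Hz tones at 8 kHz) end up at the same level.
quiet = [0.01 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
print(rms(normalize_rms(quiet)), rms(normalize_rms(loud)))

# One frame of low-level hiss followed by one frame of speech-level signal.
hiss = [0.005 * (-1) ** n for n in range(80)]
speech = [0.3 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
gated = noise_gate(hiss + speech)
```

A multi-band processor such as the one Octiv describes would apply this kind of processing separately in several frequency bands, and a production gate would fade in and out rather than switching frames on and off abruptly.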
OctiVox also synchronizes input and output by detecting and correcting
sample rate drift within the sound card. OctiVox eliminates near-end
acoustic echoes caused by speaker-to-microphone coupling. In conference
calls when multiple users talk simultaneously, OctiVox's acoustic echo
suppression techniques can also reduce the acoustic echo experienced at the
remote end by users who do not have OctiVox. OctiVox also improves the
peak-to-average ratio and the signal-to-noise ratio and performs
dynamic peak limiting. Interestingly enough, OctiVox can also perform
consonant and vowel enhancement by using advanced time/frequency models of
the speech process. Also, since not all languages are the same, OctiVox can
be customized for different language characteristics.
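For readers unfamiliar with the peak-to-average ratio mentioned above, the sketch below computes it in decibels and shows how limiting a stray peak lowers it. This is only an illustration under our own assumptions: the hard clip used here is the crudest form of peak limiting, whereas a dynamic limiter smooths its gain over time, and none of the names or values come from Octiv.

```python
import math


def peak_to_average_db(samples):
    """Peak-to-average (crest) ratio in dB: peak level over RMS level."""
    peak = max(abs(s) for s in samples)
    level = math.sqrt(sum(s * s for s in samples) / len(samples))
    if level == 0.0:
        raise ValueError("signal is silent; ratio is undefined")
    return 20.0 * math.log10(peak / level)


def hard_limit(samples, ceiling=0.5):
    """Clamp every sample to the range [-ceiling, +ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


# A steady 200 Hz tone at 8 kHz with a single loud click in the middle.
signal = [0.2 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
signal[100] = 1.0

limited = hard_limit(signal, ceiling=0.5)
before_db = peak_to_average_db(signal)
after_db = peak_to_average_db(limited)
print(f"before: {before_db:.1f} dB, after: {after_db:.1f} dB")
```

A lower ratio means the average level can be raised without the occasional peak overloading the channel, which is one way speech can be made to sound louder and clearer.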
Using CoolEdit Pro, we measured the time it took for the voice to
travel from one PC to another PC, out the second PC's speakers and back
into the microphone on the first PC. With OctiVox turned "off" we
calculated an average one-way latency of 360ms and with OctiVox turned "on"
we calculated 110ms. This was an improvement of 250ms, or about 69 percent. We
then ran MSN Messenger to see how well this program performs compared to
OctiVox and we calculated an average latency of 150ms, or 40ms (about 36
percent) slower than OctiVox.
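The comparisons can be recomputed directly from the three average latencies reported above (360ms, 110ms, and 150ms). The short sketch below does only that arithmetic. The helper names are ours, and the percentages are taken relative to the "before" figure for the improvement and relative to OctiVox for the MSN Messenger comparison.

```python
def improvement(before_ms, after_ms):
    """Return (milliseconds saved, percent saved relative to before_ms)."""
    saved = before_ms - after_ms
    return saved, 100.0 * saved / before_ms


def slower_by(candidate_ms, baseline_ms):
    """Return (extra milliseconds, percent slower relative to baseline_ms)."""
    extra = candidate_ms - baseline_ms
    return extra, 100.0 * extra / baseline_ms


saved_ms, saved_pct = improvement(before_ms=360, after_ms=110)
extra_ms, extra_pct = slower_by(candidate_ms=150, baseline_ms=110)

print(f"OctiVox on vs. off: {saved_ms} ms saved ({saved_pct:.1f} percent)")
print(f"MSN Messenger vs. OctiVox: {extra_ms} ms slower ({extra_pct:.1f} percent)")
```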
We had some issues with a Plantronics USB headset we also tested. We
encountered about a one second delay (latency) when receiving voice on the
USB headset, but for some reason it was perfectly fine when transmitting.
The remote end could hear the USB headset user with almost no latency. We
asked Octiv about this, and they said that the version we had didn't have
USB support yet but that they are supporting USB microphones/headsets.
Two of the key ingredients to VoIP's success are voice quality and minimal
latency. OctiVox has done a superb job addressing both of these needs: TMC
Labs was impressed with the improved voice quality and virtually
imperceptible latency of the OctiVox product. Developers will certainly find
OctiVox a perfect solution for developing VoIP soft phones that can rival
traditional phones in quality.
To The August 2002 Table Of Contents ]