Standard Issue

No doubt you've heard the old adage: "The great thing about standards is there are so many to choose from!" While often recited with a chuckle, the proverb is a sad commentary on the reality of interoperability. Even though standards get produced, there are often multiple standards covering the same problem domain. As a consequence, products cannot interoperate because they have chosen different standards.

Telecommunications has a long history of multiple standards. Looking for a standard for gateway control? How about MGCP (the Media Gateway Control Protocol) or H.248? Looking for call signaling? Choose among SIP, H.323, or possibly even Jingle. Presence? Lots of choices there - XMPP (the Extensible Messaging and Presence Protocol), SIMPLE (SIP for Instant Messaging and Presence Leveraging Extensions), and several vendor-defined protocols.

Despite the multiplicity of standards, the industry has survived. Oftentimes, it has done so through protocol converters, which map from one to another to connect products together. In other cases, products implement more than one protocol to maximize interoperability with other products.

However, there is one area where the lack of a single, widely accepted standard has hurt the most - codecs. Codecs - and speech codecs in particular - are responsible for taking your voice, sampled from the microphone on your computer or phone, and converting it to a format that is suitable for transmission over the network. If two endpoints cannot agree on a common codec, a conversion is required. The conversion process - called transcoding - is expensive (usually requiring dedicated hardware) and reduces the quality of the conversation.
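To make the "agree on a common codec" step concrete, here is a minimal sketch in Python. The function name `pick_common_codec` and the two endpoint codec lists are hypothetical, chosen only for illustration; real signaling protocols negotiate codecs with more detail than a simple list comparison.

```python
def pick_common_codec(offered, supported):
    """Return the first codec in the caller's preference order that the
    callee also supports, or None if the two lists share nothing."""
    supported_set = set(supported)
    for codec in offered:
        if codec in supported_set:
            return codec
    return None


# Hypothetical endpoints: each list is ordered from most to least preferred.
caller = ["SILK", "G.722.1", "G.711"]
callee = ["G.729", "G.711"]

print(pick_common_codec(caller, callee))        # G.711
print(pick_common_codec(["SILK"], ["G.729"]))   # None
```

When the result is `None`, the two endpoints share no codec, and that is the case in which a transcoder has to sit in the middle of the call.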

The most widely used speech codec today is G.711, which is effectively uncompressed narrowband digital speech running at 64kbps. G.711 speech is narrowband because it only encodes the low frequency harmonics of your voice. These are the most important parts for conveying intelligibility of speech, but the loss of the higher frequency harmonics makes people sound as if they're far away or in a box. G.711 is the standard for the public switched telephone network, and is also widely used in VoIP systems. However, it is an inefficient way to transmit voice, and there are numerous standards that operate at lower bit rates. G.729 runs at 8kbps, and G.723.1 at 5.3 or 6.3kbps. Both are narrowband and have been in existence for many years.
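The 64kbps figure follows from simple arithmetic. The sketch below assumes the usual G.711 parameters of 8,000 samples per second and 8 bits per sample, which the column itself does not state, and compares the result with the lower-rate codecs named above. The names in the code are illustrative, and the figures cover the codec payload only, not packet headers.

```python
# Assumed G.711 parameters (not stated in the column).
SAMPLE_RATE_HZ = 8000     # samples per second
BITS_PER_SAMPLE = 8       # bits per encoded sample

g711_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000   # 64.0 kbps

# Bit rates quoted in the text for the lower-rate narrowband codecs.
codec_kbps = {
    "G.729": 8.0,
    "G.723.1 (low rate)": 5.3,
    "G.723.1 (high rate)": 6.3,
}


def reduction_factor(kbps, reference_kbps=g711_kbps):
    """How many times fewer bits per second a codec needs than G.711."""
    return reference_kbps / kbps


print(f"G.711: {g711_kbps} kbps")
for name, kbps in codec_kbps.items():
    print(f"{name}: {kbps} kbps, {reduction_factor(kbps):.1f}x fewer bits than G.711")
```

G.729, for example, needs one eighth of the bits that G.711 does for the same stretch of speech.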

Wideband codecs - which can encode the higher frequency harmonics in your voice - are a recent innovation that dramatically improves the quality of speech. Because they are incompatible with the PSTN, they are primarily for voice over IP. There are fewer standards in this space.

G.722.1 supports wideband and runs at moderate bit rates (24-32kbps), but has only moderate quality and large delays. AMR-WB, also known as G.722.2, has been defined primarily for use in next-generation mobile phones. It has good quality and low bit rates, but is riddled with patents and comes with expensive licensing terms. Because of this, it has not seen widespread adoption. Indeed, royalties have been an impediment to the adoption of codecs throughout the entire history of voice over IP. To become truly pervasive - to be implemented in hardphones, softclients, and ultimately browsers and mobile phones - a codec needs to be royalty free. Vendors of such products are reluctant to pay the fees, and royalties make it extremely hard to incorporate the technology into open source projects.

What options are there for an industry-standard, wideband, robust, royalty-free voice codec for the Internet?

Today - none.

Fortunately, there is active work to remedy this problem. The IETF recently approved the formation of a new working group that will standardize a wideband codec for the Internet. It is an explicit goal of this activity to produce a codec that is royalty free and of the highest quality. The group has attracted the attention of some of the best and brightest codec designers in the world. I'm pleased to say that Skype is an active participant in this. Furthermore, Skype recently contributed its super wideband speech codec, called SILK, as an input to the process (super wideband provides even higher fidelity than wideband). Skype has even published the source code for all to see, so that its merits can be evaluated as part of the process. SILK powers almost all Skype-to-Skype calls today, and it is the technology that makes those calls feel like you are in the same room.

Producing a royalty-free, high-quality speech codec is no small task. However, it is a critical one. By making super wideband a built-in part of the voice-over-IP experience, the industry can step beyond the quality of the PSTN. To get there, we need a codec that can become pervasive. Once it becomes pervasive, we will truly have a single standard for speech coding.

Jonathan Rosenberg is chief technology strategist at Skype (www.skype.com).
