The Voice of IP

Bringing Real-Time Communications to the Web

By TMCnet Special Guest
Jonathan Rosenberg
  |  April 01, 2011

This article originally appeared in the April 2011 issue of Unified Communications

Here is a challenge: Can you name the one (and only) category of desktop applications that cannot run in the web browser without the assistance of a plugin?

Office productivity software can. Music applications can. Even 3D games can run using the latest HTML5 capabilities. What is the one application that cannot?

You guessed it: Real-time voice and video communications cannot run in a browser today without using a plugin.

This limitation has not gone unnoticed. Over the last six months the industry has been organizing to fix this problem. A workshop among industry experts was held at the end of 2010 and, based on that, standards activities were kicked off at both the W3C and the IETF. The IETF was expected to hold a birds-of-a-feather session, also known as a pre-working group, at its meeting last month in Prague. A charter for a new W3C activity is circulating as well, with approval expected shortly.

What work needs to be done, exactly? Why is it that you cannot do real-time communications in the browser? And what needs to be enabled to make it happen?

Right now, there are several key limitations that need to be addressed:

· Browsers today lack the capabilities needed for real-time transport of audio and video. They can handle non-real-time audio and video, but not real-time.

· Browsers today lack the ability to use the camera and microphones attached to the computer in a way that is suitable for real-time communications.

· Browsers today lack the ability to establish a peer-to-peer session with another browser or endpoint. That is necessary for transmitting media with low delay.

To fix all of this, standards will need to be defined in several areas.

The first order of business is to specify a real-time transport stack that will go into the browser. This specification includes the protocols needed to transport media on the Internet, such as the Real-time Transport Protocol (RTP) and the Secure Real-time Transport Protocol (SRTP). The specification will also need to define internal software components that need to be added to the browser: jitter buffers for handling variation in network delay, recovery components for handling packet loss, and noise suppression components for removing background audio noise, among others.
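To make the jitter buffer idea concrete, here is a minimal sketch in Python. It assumes packets carry a plain integer sequence number and ignores wraparound, timestamps and adaptive sizing, all of which a real browser media stack would have to handle. The names `Packet` and `JitterBuffer` are invented for illustration and are not taken from any specification.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Packet:
    seq: int                               # sequence number (wraparound ignored here)
    payload: bytes = field(compare=False)  # media data, not used for ordering


class JitterBuffer:
    """Holds a few packets so that out-of-order arrivals can be re-sorted."""

    def __init__(self, depth: int = 3):
        self.depth = depth       # number of packets kept back before playout
        self._heap = []          # min-heap ordered by sequence number
        self._next_seq = None    # lowest sequence number still eligible for playout

    def push(self, packet: Packet) -> None:
        # A packet whose playout slot has already passed is useless; drop it.
        if self._next_seq is not None and packet.seq < self._next_seq:
            return
        heapq.heappush(self._heap, packet)

    def pop_ready(self) -> list:
        """Release packets in sequence order, keeping `depth` packets buffered."""
        ready = []
        while len(self._heap) > self.depth:
            packet = heapq.heappop(self._heap)
            self._next_seq = packet.seq + 1
            ready.append(packet)
        return ready
```

The trade-off is visible in the `depth` parameter: a deeper buffer tolerates more reordering and delay variation, but every buffered packet adds latency to the conversation.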

The specifications will also need to define the protocols needed for establishing a peer-to-peer connection to carry that media. Issues such as firewall traversal and browser security need to be solved. The most likely candidate is Interactive Connectivity Establishment (ICE), standardized by the IETF in RFC 5245, which handily addresses both the firewall traversal and the browser security issues. However, details need to be specified on how it is used in the browser.
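One small, concrete piece of ICE is how an endpoint ranks the addresses it might be reached on. The sketch below computes the candidate priority using the formula and recommended type preferences from RFC 5245, section 4.1.2. The function name and dictionary are illustrative only, and the default local preference of 65535 assumes a host with a single network interface.

```python
# Recommended type preferences from RFC 5245, section 4.1.2.2.
TYPE_PREFERENCE = {
    "host": 126,    # address on a local interface
    "prflx": 110,   # peer reflexive, learned during connectivity checks
    "srflx": 100,   # server reflexive, the public side of a NAT
    "relay": 0,     # relayed through a TURN server
}


def candidate_priority(candidate_type: str,
                       component_id: int = 1,
                       local_preference: int = 65535) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be between 1 and 256")
    if not 0 <= local_preference <= 65535:
        raise ValueError("local preference must be between 0 and 65535")
    type_pref = TYPE_PREFERENCE[candidate_type]
    return (type_pref << 24) + (local_preference << 8) + (256 - component_id)


if __name__ == "__main__":
    for kind in ("host", "srflx", "relay"):
        print(kind, candidate_priority(kind))
```

Direct paths rank highest and relayed paths lowest, which matches the goal of carrying media peer-to-peer whenever the firewalls in between allow it.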

The most complex piece of standards work is around the voice and video codecs. Current browsers do not support an audio codec suitable for high-quality real-time speech conversations. The new Opus codec emerging from the IETF is a good candidate for that role.

The more complex situation is the video codec. HTML5 has defined a video tag for delivering streaming video to the browser. Despite much effort, the industry did not agree on a mandatory codec that everyone would implement. This disagreement is rooted in complex intellectual property issues. The industry is now splintering on this issue, with Google and Mozilla embracing VP8, and others embracing the long-established industry codec H.264. This disagreement affects real-time communications in a serious way. Without agreement on a common video codec, browser-to-browser calling may require extremely expensive real-time video transcoding. The need for transcoding may make the service impossibly expensive to offer in many cases. The situation is different for streaming video services, where the lack of a common codec among browsers adds cost, but only incrementally so. For real-time communications, it requires an entirely new set of expensive infrastructure that also degrades the quality of the experience.
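The interoperability problem can be sketched as a simple negotiation: each side lists the codecs it implements, and the call can flow directly only if the lists overlap. The function below is a hypothetical illustration of that check, not a real negotiation protocol, and the codec lists in the usage note are examples rather than statements about any particular browser.

```python
from typing import Optional, Sequence


def pick_common_codec(offered: Sequence[str],
                      supported: Sequence[str]) -> Optional[str]:
    """Return the caller's most preferred codec that the callee also implements.

    `offered` is in the caller's preference order. Returns None when the two
    sides share no codec, in which case media would need to be transcoded.
    """
    supported_names = {name.lower() for name in supported}
    for codec in offered:
        if codec.lower() in supported_names:
            return codec
    return None


def needs_transcoding(offered: Sequence[str], supported: Sequence[str]) -> bool:
    """True when no codec is shared and a transcoder must sit in the media path."""
    return pick_common_codec(offered, supported) is None
```

With `offered=["VP8"]` and `supported=["H.264"]`, `needs_transcoding` returns `True`. A mandatory-to-implement codec would guarantee that both lists always share at least one entry.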

The IETF will tackle specification of these various protocols in the browser, and the W3C will then define an API that exposes these services in JavaScript, the programming language of the web. The APIs will allow applications to create peer-to-peer voice and video connections, select codecs, adjust the behavior of the media stacks, collect statistics on their operation during and after a call, and request access to the camera and microphone. The API design will also need to consider security issues, ensuring that a rogue website cannot capture the content of your camera and mic without your permission, or direct your computer to send video to the target of an attack.
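The camera and microphone concern amounts to a consent check that the browser performs before any capture starts. The following sketch models that check under the assumption that permission is granted per website origin and per device. The class and method names are hypothetical and do not correspond to the API the W3C will define.

```python
class PermissionDenied(Exception):
    """Raised when a site asks for a device the user has not approved."""


class MediaPermissions:
    """Hypothetical per-origin record of the devices a user has approved."""

    def __init__(self):
        self._granted = set()   # (origin, device) pairs approved by the user

    def grant(self, origin: str, device: str) -> None:
        # In a browser this would follow an explicit user prompt.
        self._granted.add((origin, device))

    def revoke(self, origin: str, device: str) -> None:
        self._granted.discard((origin, device))

    def request_capture(self, origin: str, device: str) -> str:
        # Capture is refused unless this exact origin was approved for this device.
        if (origin, device) not in self._granted:
            raise PermissionDenied(f"{origin} may not use the {device}")
        return f"{device} stream for {origin}"
```

Keying the decision on the origin means that approving one calling site gives no access to any other page, including a rogue one loaded in another tab.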

Once this standards work is complete, the industry will be able to cross off the last item on the list of things you cannot do natively in the browser.

Jonathan Rosenberg is chief technology strategist at Skype.


Edited by Stefania Viscusi