
Feature Article
November 2003

Crossing The Final Frontier
Providing Voice Services Through Firewalls


When VoIP protocols such as SIP, MGCP, and H.323 were created, firewall issues were well known. Why then do these protocols need help to cope with the Network Address Translation (NAT) that firewalls almost universally do today?

The answer is that, with good justification, NAT is considered an aberration by standards bodies such as the IETF. As a result, protocol features that exist solely to deal with NAT or firewall problems are generally disallowed. Instead, protocol users are expected to jump aboard more modern underlying IP transport mechanisms such as IPv6, in which the IP address space is large enough to render NAT irrelevant. Unfortunately, the road to IPv6 is many years old, and its adoption by the mainstream is very low. The Catch-22 is that people who don't absolutely have to move from IPv4 to IPv6 won't do so until everyone else has already moved. So in the meantime, the rest of us have to shoehorn VoIP protocols into an IPv4 world, where NAT and firewalls can trip us up.

What causes VoIP protocols to fare so poorly through NAT-enabled firewalls? There are two root causes: Network Address Translation and firewall packet admission policies. To understand the issues, first realize that a VoIP call is a fairly complex exercise in communication, involving bi-directional streams of call set-up information, bi-directional streams of call audio information to carry the sound, and bi-directional streams of audio quality reporting information. Additionally, if the call includes video as well as audio, four extra streams are needed -- two to carry the bi-directional video, and another two to report on the quality of that video. In total, between six and 10 highly inter-related streams of information are involved in each VoIP call. If this sounds like a lot, it is. Compare it to a typical e-mail transaction, where a request is made over a single channel to a server, and a response is made back over the same channel. Similarly, a Web browser might make hundreds of requests to display a page, but each of them is a simple query/response over a single channel. In fact, prior to VoIP's emergence as a new killer application on the network scene, the most complex protocol interactions involved FTP, which uses two channels rather than just one.
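The stream count above can be checked with a few lines of Python. This is only an illustrative tally; the function name `streams_per_call` and the labels are made up for this sketch and do not come from any VoIP library.

```python
def streams_per_call(with_video: bool = False) -> int:
    """Count the one-way streams in a single call, as tallied in the text."""
    # Every kind of information flows in both directions, so each kind
    # contributes two one-way streams.
    kinds = ["call set-up", "audio", "audio quality reports"]
    if with_video:
        kinds += ["video", "video quality reports"]
    return 2 * len(kinds)


audio_only = streams_per_call()                  # 3 kinds x 2 directions
audio_and_video = streams_per_call(with_video=True)  # 5 kinds x 2 directions
```

An audio-only call comes out at six streams and a call with video at 10, which matches the range given above.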

Why so complex? VoIP is different because it needs to move a lot of time-sensitive data around the network -- data that becomes useless if it arrives even a fraction of a second too late. If your e-mail takes an extra two or three seconds to download, you may not like it, but you'll still read it once it arrives. But if someone's voice arrives in a VoIP call a couple of seconds late, it's not only frustrating, but also difficult to fix. If a few seconds of voice arrives late, you'll notice silence on the call, but if the network then "catches up" and delivers those missing seconds at the same time it's delivering the next few seconds of voice, what should the VoIP phone do? It can't play the late voice because the newer voice packets have arrived and it's their turn to be played. So the late packets are simply discarded as waste. If this happens too often, your call will be indecipherable, and your impression will be one of poor quality. Several years ago, this situation was normal. But the protocols in use today, coupled with error-correcting codecs for encoding voice traffic, now mean that virtually all calls over the Internet have very good quality -- somewhere between cell phone and land-line quality, and often better than land-line quality. While all of the complexity needed to achieve these remarkable capabilities is largely invisible to the end-user, it is very visible to the network, and accounts for most of VoIP's protocol complexities.
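The discard rule described above can be sketched as a simple playout deadline check. This is a minimal illustration, not how any particular phone works: the 20 ms frame size, the 60 ms buffer, and the names `VoicePacket` and `split_by_deadline` are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class VoicePacket:
    seq: int         # position of this chunk of voice in the stream
    arrival_ms: int  # when the packet reached the phone


def split_by_deadline(packets, frame_ms=20, buffer_ms=60):
    """Return (played, discarded) sequence numbers.

    Packet n is due to be played at n * frame_ms + buffer_ms
    (assumed values). Anything arriving after its own deadline has
    missed its turn and is thrown away.
    """
    played, discarded = [], []
    for p in packets:
        deadline_ms = p.seq * frame_ms + buffer_ms
        if p.arrival_ms <= deadline_ms:
            played.append(p.seq)
        else:
            discarded.append(p.seq)
    return played, discarded


# Packet 2 is held up in the network and arrives long after packet 3.
arrivals = [
    VoicePacket(seq=0, arrival_ms=10),
    VoicePacket(seq=1, arrival_ms=30),
    VoicePacket(seq=3, arrival_ms=110),
    VoicePacket(seq=2, arrival_ms=500),
]
played, discarded = split_by_deadline(arrivals)
```

Here packets 0, 1, and 3 are played and packet 2 is discarded, which the listener hears as a short gap in the voice.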

One thing that causes problems for firewalls is the fact that VoIP call set-up information usually travels through many different Internet servers en route from caller to called party, while the call audio (and video) information usually goes directly from caller to called party without passing through any additional servers. This means that the fast-paced stream of incoming audio packets arriving at a firewall from the "other end" of a call typically comes from an IP address that nothing behind the firewall has recently sent to. When phone A calls phone B, the call set-up information travels from A out over the Internet to A's call controller, then over to B's call controller, where it goes inbound through B's firewall to phone B. But the audio for the call will usually go from phone A outbound through A's firewall to B's firewall, then inbound to phone B -- a very different path from the call set-up information. Both A's and B's firewalls usually see the arriving audio packets as "attack packets," and deny them access, because they come from a source that neither phone A nor phone B has directly sent packets to before. Without call knowledge, the firewalls cannot by themselves account for this difference in source, and so the VoIP call will fail.
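The admission policy described above can be modeled in a few lines. This is a deliberately simplified sketch of B's firewall: the class `StatefulFirewall`, its method names, and all addresses (taken from the ranges reserved for documentation) are assumptions for illustration, and real firewalls track far more state than this.

```python
class StatefulFirewall:
    """Admit an inbound packet only if the inside host has already
    sent something to that exact remote address."""

    def __init__(self):
        # Pairs of (inside address, remote address) seen going outbound.
        self._contacted = set()

    def outbound(self, inside, remote):
        self._contacted.add((inside, remote))

    def admits_inbound(self, remote, inside):
        return (inside, remote) in self._contacted


PHONE_B = ("10.0.0.5", 5060)               # private address behind B's firewall
B_CONTROLLER = ("198.51.100.10", 5060)     # B's call controller (hypothetical)
PHONE_A_PUBLIC = ("203.0.113.7", 40000)    # phone A as seen from the Internet

fw = StatefulFirewall()
fw.outbound(PHONE_B, B_CONTROLLER)         # phone B has talked to its controller

setup_admitted = fw.admits_inbound(B_CONTROLLER, PHONE_B)
audio_admitted = fw.admits_inbound(PHONE_A_PUBLIC, PHONE_B)
```

Call set-up arriving from the controller is admitted, but the audio from phone A is refused, because phone B has never sent anything to that address.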

Another thing that causes problems for voice calls is the Network Address Translation that virtually all firewalls perform in order to share a small number of public IP addresses among a larger number of computers on a private local-area network. NAT allows a network of, for example, 10,000 computers to use only a handful of IP addresses, by piggybacking many concurrent communications on each address. Naturally this involves the firewall changing the IP headers of each communication packet that passes through it. But some packets have IP addresses riddled throughout their contents, not just in the headers. Without detailed knowledge of the protocols involved in a communication, the firewall cannot by itself translate all of these addresses correctly. While most firewalls have such knowledge for older, smaller protocols such as FTP, they do not do a good job of translating newer protocols such as those used in VoIP. Small wonder, given the complexity of these protocols -- the current SIP specification (IETF RFC 3261) runs to about 270 pages, and that's just to deal with call set-up information!
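To make the header-versus-contents problem concrete, here is a minimal sketch of a NAT that knows nothing about the protocol it is carrying. The dictionary layout, the function `header_only_nat`, and the addresses are assumptions for illustration; the two payload lines follow the form of the session description carried in a SIP call set-up message.

```python
PRIVATE_IP = "10.0.0.5"      # phone's address on the private LAN (assumed)
PUBLIC_IP = "203.0.113.7"    # firewall's public address (assumed)


def header_only_nat(packet: dict) -> dict:
    """Rewrite the source address in the header and nothing else,
    as a NAT without protocol knowledge would."""
    translated = dict(packet)
    if translated["src"] == PRIVATE_IP:
        translated["src"] = PUBLIC_IP
    return translated


outgoing = {
    "src": PRIVATE_IP,
    "dst": "198.51.100.10",
    # The phone tells the far end where to send audio.
    "payload": "c=IN IP4 10.0.0.5\r\nm=audio 49170 RTP/AVP 0\r\n",
}

on_the_wire = header_only_nat(outgoing)
header_fixed = on_the_wire["src"] == PUBLIC_IP
payload_still_private = PRIVATE_IP in on_the_wire["payload"]
```

The header now carries the public address, but the contents still tell the far end to send audio to 10.0.0.5, an address that means nothing outside the private network.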

Finally, a particularly vexing problem for VoIP is that its inbound communications through a firewall can happen at any time, whereas other protocols have inbound communications only when there has been a recent outbound communication. Firewalls are good at leaving open a path for the return packet when an outbound request is made. But for VoIP, it is desirable to have the ability to cause a phone to ring even when it has not been involved in any phone calls for the last several hours, days, or even weeks. Firewalls have to be very careful when leaving "long-term" pinholes open through their security shielding, or else an attacker could take advantage of the pinholes to find an unauthorized path of entry into the protected network. Yet there must somehow be the ability to cause a phone to ring, or else VoIP becomes useless.
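The timing problem can be sketched with a pinhole that closes after a period of outbound silence. The class `Pinhole` and the 30-second idle timeout are assumptions chosen for illustration; actual timeouts vary from firewall to firewall.

```python
class Pinhole:
    """A return path that stays open only for a short time after the
    most recent outbound packet."""

    def __init__(self, idle_timeout_s: float):
        self.idle_timeout_s = idle_timeout_s
        self.last_outbound_s = None  # no outbound traffic seen yet

    def outbound(self, now_s: float) -> None:
        self.last_outbound_s = now_s

    def is_open(self, now_s: float) -> bool:
        if self.last_outbound_s is None:
            return False
        return now_s - self.last_outbound_s <= self.idle_timeout_s


pinhole = Pinhole(idle_timeout_s=30)   # assumed timeout
pinhole.outbound(now_s=0)              # the phone's last outbound packet

reply_gets_in = pinhole.is_open(now_s=2)         # prompt reply
call_gets_in = pinhole.is_open(now_s=3 * 3600)   # incoming call hours later
```

A reply two seconds later finds the pinhole open, but a call arriving three hours later finds it closed, so the phone never rings.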

Fortunately, it's not all gloom and doom. Many companies are working on a variety of approaches to improve the life of the VoIP user behind a firewall. Some of these approaches require firewall upgrades, and so, unfortunately, will be subject to a long adoption cycle. Other approaches involve changing the behavior of the VoIP phones themselves so that they become more aware of the firewall topology behind which they operate. The most promising approaches push the call-level knowledge needed to deal with firewall issues either up into the VoIP service provider's network, or at least into a corporation's DMZ, where a server can handle inbound and outbound requests on behalf of firewall-protected phones. As a community, our experience with these approaches is a little over two years old, and the solutions do seem to work well.

So, if even VoIP can be made to work well over IPv4, what will it take to compel people to move to IPv6, in which NAT may not be required and firewalls can be far more capable? The U.S. Department of Defense has set an internal goal of 2008 by which to have its networks running IPv6. The government of Japan is set to mandate adoption of IPv6 by regulated industries and government institutions. Perhaps within five to 10 years there will be enough IPv6 deployed to render the decision to switch an easy one. In the meantime, the IPv4 world will continue to shuffle along with new protocols, new problems for old equipment, and new approaches to tying it all together. Looking on the bright side, it is fair to say that we now have the opportunity to carry all kinds of data, including real-time traffic such as voice and video calls, over a single underlying network infrastructure that is largely already in place. In large part, this should bring a smile to your face, despite all the behind-the-scenes ugliness needed to make it work during this transition period.

Dan Freedman is CEO of Jasomi Networks, Inc. Jasomi brings products to market designed to enable or improve IP telephony business models. The company's focus is on creating innovative, practical solutions that effectively solve specific IP telephony problems. For more information, please visit www.jasomi.com.
