November 2003
Crossing The Final Frontier
Providing Voice Services Through Firewalls
BY DAN FREEDMAN
When VoIP protocols such as SIP, MGCP, and H.323 were created, firewall
issues were well known. Why then do these protocols need help to cope with
the Network Address Translation (NAT) that firewalls almost universally perform
today?
The answer is that, with good justification, NAT is considered an
aberration by standards bodies such as the IETF. As a result, protocol
features that exist solely to deal with NAT or firewall problems are
generally disallowed. Instead, protocol users are expected to jump aboard
more modern underlying IP transport mechanisms such as IPv6, in which the IP
address space is large enough to render NAT irrelevant. Unfortunately, the
push toward IPv6 is already many years old, and its mainstream adoption is
still very low. The Catch-22 is that people who don't absolutely have to
won't move from IPv4 to IPv6 until everyone else has already moved.
So in the meantime, the rest of us have to shoehorn VoIP protocols into an
IPv4 world, where NAT and firewalls can trip us up.
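To put "large enough" in numbers: an IPv4 address is 32 bits long and an IPv6 address is 128 bits long. The short snippet below simply does that arithmetic. These are raw totals; reserved ranges (private, multicast and so on) make the usable IPv4 pool smaller still.

```python
# Raw size of each address space, given the address length in bits.
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_addresses = 2 ** IPV4_BITS
ipv6_addresses = 2 ** IPV6_BITS

print(f"IPv4: {ipv4_addresses:,} addresses")   # IPv4: 4,294,967,296 addresses
print(f"IPv6: {ipv6_addresses:.2e} addresses")  # IPv6: 3.40e+38 addresses
print(f"Ratio: 2**{IPV6_BITS - IPV4_BITS}")     # Ratio: 2**96
```

With roughly four billion IPv4 addresses for a far larger number of devices, sharing addresses through NAT became the norm; at 2**128, every device can have its own.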
What causes VoIP protocols to fare so poorly through NAT-enabled firewalls?
There are two root causes of these problems: Network Address Translation and
firewall packet admission policies. To understand the issues, first realize
that a VoIP call is a fairly complex exercise in communication, involving
bi-directional streams of call set-up information, bi-directional streams of
call audio information to carry the sound, and bi-directional streams of
audio quality reporting information. Additionally, if the call includes
video as well as audio, four extra streams are needed -- two to carry the
bi-directional video, and another two to report on the quality of that
video. In total, between six and ten highly interrelated streams of
information are involved in each VoIP call. If this sounds like a lot, it
is. Compare it to a typical e-mail transaction, where a request is made over
a single channel to a server, and a response is made back over the same
channel. Similarly, a Web browser might make hundreds of requests to display
a page, but each of them is a simple query/response over a single channel.
In fact, prior to VoIP's emergence as a new killer application on the
network scene, the most complex protocol interactions involved FTP, which
uses two channels rather than just one.
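The stream count above can be tallied mechanically. The toy model below (the `Stream` type and `call_streams` function are hypothetical names, for illustration only) treats each bi-directional flow as two one-way streams, one per direction, and reproduces the six-stream and ten-stream figures. In SIP-based systems the audio and video typically travel as RTP and the quality reports as RTCP, but the model only counts streams and does not depend on any particular protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    purpose: str    # what the stream carries
    direction: str  # "A->B" or "B->A"


def call_streams(with_video: bool = False) -> list[Stream]:
    """List the one-way streams in one call, following the breakdown in the text."""
    purposes = ["call set-up", "audio", "audio quality reports"]
    if with_video:
        purposes += ["video", "video quality reports"]
    # Every purpose needs a stream in each direction.
    return [Stream(p, d) for p in purposes for d in ("A->B", "B->A")]


print(len(call_streams()))                 # 6
print(len(call_streams(with_video=True)))  # 10
```

By contrast, the e-mail and Web exchanges described above each fit in a single request/response channel.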
Why so complex? VoIP is different because it needs to move a lot of
time-sensitive data around the network -- data that becomes useless if it
arrives even a fraction of a second too late. If your e-mail takes an extra
two or three seconds to download, you may not like it, but you'll still read
it once it arrives. But if someone's voice arrives in a VoIP call a couple
of seconds late, it's not only frustrating, but also difficult to fix. If a
few seconds of voice arrives late, you'll notice silence on the call, but if
the network then "catches up" and delivers those missing seconds at the same
time it's delivering the next few seconds of voice, what should the VoIP
phone do? It can't play the late voice, because the newer voice packets have
arrived and it's their turn to be played. So the late packets are simply
discarded as waste. If this happens too often, your call will be
indecipherable, and your impression will be one of poor quality. Several
years ago, this situation was normal. But protocols in use today, coupled
with error-correcting CODECs for encoding voice traffic, now mean that
virtually all calls over the Internet have very good quality -- somewhere
between cell phone and land-line quality, and often better than land-line
quality. While all of the complexity needed to achieve these remarkable
capabilities is largely invisible to the end user, it is very visible to
the network, and it accounts for most of VoIP's protocol complexity.
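The "play it or drop it" decision described above is what a receiving phone's playout (or jitter) buffer makes. The sketch below is a minimal model that assumes each packet carries 20 ms of audio and that the phone uses a fixed 60 ms playout delay; both values are illustrative, not figures from this article. Each packet gets a scheduled play time, and any packet arriving after that time has missed its turn and is discarded.

```python
from dataclasses import dataclass

FRAME_MS = 20   # assumed: each packet carries 20 ms of audio
BUFFER_MS = 60  # assumed: fixed playout delay at the receiver


@dataclass
class Packet:
    seq: int         # sender's sequence number; packet n was sent at n * FRAME_MS
    arrival_ms: int  # when the packet reached the receiving phone


def playout(packets: list[Packet]) -> tuple[list[int], list[int]]:
    """Split packets into those played on time and those discarded as late."""
    played, discarded = [], []
    for pkt in sorted(packets, key=lambda p: p.seq):
        # Packet n is scheduled to play BUFFER_MS after it was sent.
        deadline = pkt.seq * FRAME_MS + BUFFER_MS
        if pkt.arrival_ms <= deadline:
            played.append(pkt.seq)
        else:
            discarded.append(pkt.seq)  # its turn has passed, so it is useless now
    return played, discarded


# Packets 2 and 3 are held up in the network, then arrive in a burst with packet 5.
arrivals = [Packet(0, 30), Packet(1, 50), Packet(4, 130),
            Packet(2, 145), Packet(3, 145), Packet(5, 145)]
print(playout(arrivals))  # ([0, 1, 4, 5], [2, 3])
```

The network did eventually deliver packets 2 and 3, but by then the newer audio had taken their place, so the listener hears a 40 ms gap rather than delayed speech.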
One thing that causes problems for firewalls is the fact that VoIP call
set-up information usually travels through many different Internet servers
en route from caller to called party, while the call audio (and video)
information usually goes directly from caller to called party without
passing through any additional servers. This means that the fast-paced
stream of incoming audio packets arriving at a firewall from the "other end"
of a call typically comes from an IP address that the firewall has not
recently sent to. When phone A calls phone B, the call set-up information
travels from A out over the Internet to A's call controller, then over to
B's call controller, where it goes inbound through B's firewall to phone B.
But the audio for the call will usually go from phone A outbound through A's
firewall to B's firewall, then inbound to phone B -- a very different path
from the call set-up information. Both A's and B's firewalls usually see the
arriving audio packets as "attack packets" and deny them access, because
they come from a source that neither phone A nor phone B has directly sent
packets to before. Without call knowledge, the firewalls cannot by
themselves account for this difference in source, and so the VoIP call will
fail.
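The admission rule that trips up the audio can be modeled in a few lines. This is a toy stateful firewall keyed on remote address and port, not any vendor's actual logic. The addresses come from the documentation-only ranges, port 5060 is simply SIP's default port, and the assumption that phone B earlier registered with its call controller (and so has "sent to" it) is added for illustration.

```python
class StatefulFirewall:
    """Toy model: admit inbound packets only from peers we have already sent to."""

    def __init__(self) -> None:
        self.contacted: set[tuple[str, int]] = set()  # (remote IP, remote port)

    def outbound(self, dst_ip: str, dst_port: int) -> None:
        # Remember the destination so that its replies are let back in.
        self.contacted.add((dst_ip, dst_port))

    def admit_inbound(self, src_ip: str, src_port: int) -> bool:
        return (src_ip, src_port) in self.contacted


fw_b = StatefulFirewall()
fw_b.outbound("203.0.113.5", 5060)                # assumed: phone B registered with B's call controller
print(fw_b.admit_inbound("203.0.113.5", 5060))    # True: call set-up arrives via the controller
print(fw_b.admit_inbound("198.51.100.7", 40000))  # False: audio straight from A's firewall is dropped
```

The set-up message gets through because it arrives from a server phone B has talked to; the audio does not, because it comes from A's firewall, which phone B has never contacted.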
Another thing that causes problems for voice calls is the Network Address
Translation that virtually all firewalls perform in order to share a small
number of public IP addresses among a larger number of computers on a
private local-area network. NAT allows a network of, for example, 10,000
computers to use only a handful of IP addresses by multiplexing many
concurrent communications onto each address, telling them apart by port
number. Naturally, this involves the firewall changing the headers of each
communication packet that passes through it. But some packets have IP
addresses riddled throughout their contents, not just in the headers.
Without detailed knowledge of the protocols involved in a communication, the
firewall cannot by itself translate all of these addresses correctly. While
most firewalls have such knowledge for older, smaller protocols such as FTP,
they do not do a good job of translating newer protocols such as those used
in VoIP. Small wonder, given the complexity of these protocols -- the
current SIP specification (IETF RFC 3261) is over 280 pages long, and
that's just to deal with call set-up information!
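The header-versus-contents problem is easy to see in a toy model. The sketch below implements port-based NAT (often called NAPT), in which many private hosts share one public address and are told apart by port. The public address, the starting port 50000, and the class and field names are all illustrative assumptions. The message body is an abbreviated SDP description (a real one has more lines), and a real SIP message also carries the private address in headers such as Via and Contact.

```python
from dataclasses import dataclass, replace

PUBLIC_IP = "203.0.113.1"  # assumed: the firewall's shared public address


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    body: str  # application payload, e.g. a SIP message carrying SDP


class Napt:
    """Toy NAPT: private hosts share PUBLIC_IP and are told apart by port."""

    def __init__(self) -> None:
        self.next_port = 50000
        self.table: dict[tuple[str, int], int] = {}

    def translate(self, pkt: Packet) -> Packet:
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.table:
            self.table[key] = self.next_port
            self.next_port += 1
        # Only the header fields are rewritten; the body passes through untouched.
        return replace(pkt, src_ip=PUBLIC_IP, src_port=self.table[key])


sdp = "v=0\r\nc=IN IP4 192.168.1.10\r\nm=audio 49170 RTP/AVP 0\r\n"
out = Napt().translate(Packet("192.168.1.10", 5060, sdp))
print(out.src_ip, out.src_port)    # 203.0.113.1 50000
print("192.168.1.10" in out.body)  # True: the far end is told to send audio to a private address
```

The translated packet reaches the far end with a valid public source address, yet its contents still instruct the other phone to send audio to 192.168.1.10, an address that is unreachable from the Internet. Fixing that requires a component that understands SIP and SDP well enough to rewrite the body too.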
Finally, a particularly vexing problem for VoIP is that its inbound
communications through a firewall can happen at any time, whereas other
protocols have inbound communications only when there has been a recent
outbound communication. Firewalls are good at leaving open a path for the
return packet when an outbound request is made. But VoIP needs to be able
to make a phone ring even when it has not been involved in any phone calls
for the last several hours, days, or even weeks. Firewalls have to be very
careful when leaving "long-term" pinholes open through their security
shielding, or else an attacker could take advantage of the pinholes to find
an unauthorized path of entry into the protected network. Yet there must be
some way to make a phone ring, or else VoIP becomes useless.
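The timing side of the problem can be sketched the same way. In this toy model, a pinhole opens when the phone sends a packet out and closes after a period of silence. The 60-second idle timeout is an assumed value for illustration (real timeouts vary by firewall), and the class name is hypothetical.

```python
class PinholeTable:
    """Toy model of firewall pinholes that close after a period of silence."""

    def __init__(self, idle_timeout_s: float) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.last_outbound: dict[tuple[str, int], float] = {}

    def outbound(self, peer: tuple[str, int], now: float) -> None:
        # Any outbound packet to the peer opens (or refreshes) its pinhole.
        self.last_outbound[peer] = now

    def admit_inbound(self, peer: tuple[str, int], now: float) -> bool:
        opened = self.last_outbound.get(peer)
        return opened is not None and now - opened <= self.idle_timeout_s


controller = ("203.0.113.5", 5060)
fw = PinholeTable(idle_timeout_s=60)                # assumed 60-second idle timeout
fw.outbound(controller, now=0)                      # the phone contacts its call controller
print(fw.admit_inbound(controller, now=30))         # True: a call 30 s later gets through
print(fw.admit_inbound(controller, now=3 * 3600))   # False: three hours later, the phone cannot ring
```

Raising `idle_timeout_s` to days would let the phone ring after a long idle spell, but it would also keep the pinhole open to an attacker for just as long; that is exactly the trade-off described above.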
Fortunately, it's not all gloom and doom. Many companies are working on a
variety of approaches to improve the life of the VoIP user behind a
firewall. Some of these approaches require firewall upgrades, and so,
unfortunately, will be subject to a long adoption cycle. Other approaches
involve changing the behavior of the VoIP phones themselves so that they
become more aware of the firewall topology behind which they operate. The
most promising approaches push the call-level knowledge needed to deal with
firewall issues either up into the VoIP service providers' network, or at
least into a corporation's DMZ, where a server can handle inbound and
outbound requests on behalf of firewall-protected phones. As a community, we
have a little over two years of experience with these approaches, and the
solutions do seem to work well.
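One way such a server can help is by relaying the audio itself. The sketch below is a hypothetical illustration of that idea, not a description of any particular product. It assumes that during call set-up each phone is told to send its audio to the relay, so each phone's own outbound packets open the pinhole in its firewall. It then uses a technique often called "latching": the relay sends each phone's traffic back to whatever public address and port that phone's packets actually arrived from after NAT, rather than to any address written inside the call set-up messages.

```python
from typing import Optional

Addr = tuple[str, int]


class MediaRelay:
    """Toy relay running in a provider network or DMZ on behalf of two phones.

    Each phone sends audio to the relay first. The relay latches onto the
    public address each packet came from and forwards the other phone's
    audio there, so both firewalls see only replies to traffic they sent.
    """

    def __init__(self) -> None:
        self.latched: dict[str, Addr] = {}  # call leg ("A" or "B") -> public address

    def receive(self, leg: str, src: Addr) -> Optional[tuple[str, Addr]]:
        self.latched[leg] = src              # learn (or update) where this leg really is
        other = "B" if leg == "A" else "A"
        if other not in self.latched:
            return None                      # nowhere to forward yet
        return (other, self.latched[other])  # forward this packet to the other leg


relay = MediaRelay()
print(relay.receive("A", ("198.51.100.7", 40000)))  # None: B's address not known yet
print(relay.receive("B", ("203.0.113.9", 51000)))   # ('A', ('198.51.100.7', 40000))
print(relay.receive("A", ("198.51.100.7", 40000)))  # ('B', ('203.0.113.9', 51000))
```

Because the relay sits outside both firewalls and knows which two call legs belong together, it supplies the call-level knowledge the firewalls lack, at the cost of carrying every audio packet through one more server.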
So, if even VoIP can be made to work well over IPv4, what will it take to
compel people to move to IPv6, in which NAT may not be required and
firewalls can be far more capable? The U.S. Department of Defense has set an
internal goal of 2008 by which to have its networks running IPv6. The
government of Japan is set to mandate adoption of IPv6 by regulated
industries and government institutions. Perhaps within five to ten years
there will be enough IPv6 deployed to make the decision to switch an easy
one. In the meantime, the IPv4 world will continue to shuffle
along with new protocols, new problems for old equipment, and new approaches
to tying it all together. Looking on the bright side, it is fair to say that
we now have the opportunity to carry all kinds of data, including real-time
traffic such as voice and video calls, over a single underlying network
infrastructure that is largely already in place. In large part, this should
bring a smile to your face, despite all the behind-the-scenes ugliness
needed to make it work during this transition period.
Dan Freedman is CEO of Jasomi Networks, Inc. Jasomi brings products to
market designed to enable or improve IP telephony business models. The
company's focus is on creating innovative, practical solutions that
effectively solve specific IP telephony problems. For more information,
please visit www.jasomi.com.