By Todd Simpson and Alan Hawrylyshen
GEMAYA is a term coined by David Kirkpatrick in a recent Fortune article as an acronym for Google, eBay, MSN, AOL, Yahoo, and Amazon: the Internet heavyweights. Like IBM and the BUNCH from the heyday of large computers, the distinguished GEMAYA group is likely to pioneer a new generation of interactive services layered atop their current core offerings. The integration of voice, video, banking, gaming, and other real-time applications with instant messaging, chat rooms, and e-mail is already well underway. One example is eBay’s recent acquisition of VoIP provider Skype. Ponder for a moment the potential of combining a global marketplace with robust financial services (PayPal), both backed by voice, video, multimedia messaging, and more.
Will one-stop-shopping with such sophisticated capabilities enable GEMAYA to leapfrog traditional voice carriers with fully-bundled and low-cost (or free) services? There is no way of knowing today, but two things are fairly certain at this stage. First, driven by IP cost models and service potential, these are (finally!) exceptionally interesting times to be in the Internet telephony business. And second, regardless of the direction the market takes, the Session Initiation Protocol (SIP) is destined to play a critical role. This article analyzes both the solid core and ragged edges of SIP (and some related protocols), and investigates those areas where evolution is most intense and current implementations are most challenged. These areas include interoperability, security, quality (as a superset of QoS), middleware and ‘middleboxes,’ and support for rich application deployment.
The key to creating winning solutions in this rapidly-evolving space will be the ubiquitous user experience: all applications, anytime, anywhere, on any device. Such ubiquity is most likely to be derived from formal standards, as opposed to a single de-facto standard, largely because no single player has enough critical mass to force-feed a proprietary solution that satisfies the absolute need for interoperability. As in many leading-edge areas, the standards are evolving and competing. Still, SIP is the new standard for real-time services. The reasons are straightforward: SIP is a flexible, extensible, rich, and highly-leveraged specification. SIP puts few bounds on potential applications, allows for extensions and enhancements, and reuses some of the best Internet technologies to date (for example, encryption and authentication mechanisms, and MIME data types). Of course, with flexibility and richness comes the potential for complexity and confusion. Today SIP is in its adolescence; much has been learned since its infancy, and much more will be learned as it matures.
Interoperability, or even operability, continues to be an issue in today’s growing SIP-based infrastructures. While there are still significant discussions around interpretations of the SIP specification, many of these issues occur in areas the specification simply does not address. Still fundamental in this area is the issue of NAT and firewall traversal: SIP carries IP addresses inside its messages in order to set up sessions (How else would it be done?), and these addresses are invalidated by Network/Port Address Translation and by firewall behavior. Many non-standard approaches are in use to solve this problem, but none can claim to be 100 percent effective, owing to the overwhelming variety of NAT and firewall devices (for example, port-rotating firewalls can still be problematic). Emerging standards and processes such as STUN, TURN, and ICE should soon allow this particular area to become more cohesive.
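To make the NAT problem concrete, here is a minimal Python sketch that scans a session description for connection addresses that would be unreachable from the public Internet. The function name and the sample session description are illustrative assumptions, and the check covers only literal addresses on SDP connection (c=) lines, not every place an address can appear in a SIP message.

```python
import ipaddress


def private_addresses_in_sdp(sdp: str) -> list[str]:
    """Return literal addresses on SDP c= lines that are not publicly routable."""
    found = []
    for line in sdp.splitlines():
        line = line.strip()
        # Connection line layout: c=<nettype> <addrtype> <connection-address>
        if not line.startswith("c="):
            continue
        parts = line[2:].split()
        if len(parts) != 3 or parts[0] != "IN":
            continue
        addr_text = parts[2].split("/")[0]  # drop an optional multicast TTL suffix
        try:
            addr = ipaddress.ip_address(addr_text)
        except ValueError:
            continue  # a host name rather than a literal address
        if addr.is_private:
            found.append(addr_text)
    return found


# Hypothetical offer from a phone sitting behind a home NAT.
SAMPLE_SDP = """v=0
o=alice 2890844526 2890844526 IN IP4 192.168.1.20
s=-
c=IN IP4 192.168.1.20
t=0 0
m=audio 49170 RTP/AVP 0
"""

print(private_addresses_in_sdp(SAMPLE_SDP))
```

A far-end device that receives this offer unchanged would try to send media to 192.168.1.20, an address that means nothing outside the caller's own network. Techniques such as STUN exist precisely so the endpoint can learn and advertise an address that does.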
Other interoperability issues can occur based on the implementation. For example, SIP does not specify an upper bound on header sizes, but many implementations have hard-coded bounds; messages beyond these bounds are rejected. Improper implementations of registrars and proxies, whether through bugs, incomplete designs, or misinterpretation of the specification, can also lead to interoperability problems. Missing support for full DNS resolution (as documented in RFC 3263, including NAPTR and SRV support) or for basic security mechanisms, as well as flawed codec negotiation, might also lead to interoperability issues. Fortunately, the industry recognizes the need for solid interoperability, and efforts such as the SIP Forum’s SIPit events have enabled much progress. And yet, the industry is likely to end up with pseudo-interoperability, much like with HTTP, where content works better in some browsers than in others. Of course, unlike the World Wide Web where only a handful of browsers are used, the world of VoIP currently involves myriad different systems and devices from a plethora of vendors.
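One piece of full DNS support that is easy to get wrong is the ordering of SRV results. The sketch below follows the selection procedure described in RFC 2782 (lowest priority value first, then a weighted random choice within each priority); the record type, function name, and host names are assumptions made for illustration, and no actual DNS lookup is performed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records, rng=None):
    """Return records in the order a client should try them."""
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        # Zero-weight records go first so they are rarely chosen early.
        group = sorted(
            (r for r in records if r.priority == priority),
            key=lambda r: r.weight,
        )
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for index, record in enumerate(group):
                running += record.weight
                if running >= pick:
                    ordered.append(group.pop(index))
                    break
    return ordered


# Hypothetical answer for _sip._udp.example.com
records = [
    SrvRecord(20, 0, 5060, "backup.example.com"),
    SrvRecord(10, 60, 5060, "sip1.example.com"),
    SrvRecord(10, 40, 5060, "sip2.example.com"),
]
for record in order_srv(records):
    print(record.priority, record.target)
```

An implementation that skips this step and simply takes the first record returned will still place calls most of the time, which is exactly why the omission tends to surface only when a primary server fails.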
Now that several large VoIP access networks are operational on the Internet, the issue of security is becoming increasingly important. And like all security concerns, there is a mix of real issues and fear-mongering. The SIP specification leverages the best of mature Internet security models, which, when fully implemented, reduce the true areas of concern to border cases. Unfortunately, deployments today tend not to include even basic security provisions, leaving networks open to many sorts of breaches.
SIP contains excellent support for ensuring point-to-point confidentiality and encryption, including the adoption of digest authentication, S/MIME, and TLS encryption, and the ability to share keys for media encryption. Many of these mechanisms, however, still have subtle technical or management drawbacks; for example, sharing keys for digest, or managing the tradeoffs between end-to-end and point-to-point architectures. For these reasons, implementing fully cryptographically secure end-to-end authentication remains a challenge, especially given the realities of disparate domains of trust and the existence of the middleboxes needed to overcome other interoperability issues. Finding and deploying a full solution that addresses this problem will also be essential to combating Spam over Internet Telephony, or SPIT, as well as other nefarious attacks.
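The key-sharing drawback of digest authentication is easiest to see in the computation itself. The following Python sketch follows the MD5 digest scheme of RFC 2617, which SIP reuses; the function names, user name, password, and nonce are made-up values for illustration only.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    nc=None, cnonce=None, qop=None):
    """Compute the 'response' value for a Digest Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # depends on the shared secret
    ha2 = md5_hex(f"{method}:{uri}")                 # depends on the request
    if qop == "auth":
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # form used when no qop is offered


# Hypothetical REGISTER challenge from a registrar.
response = digest_response(
    username="alice",
    realm="atlanta.example.com",
    password="not-a-real-password",
    method="REGISTER",
    uri="sip:atlanta.example.com",
    nonce="84a4cc6f3082121f32b42a2187831a9e",
)
print(response)
```

The server can verify this value only if it holds the same password, or the same intermediate hash, as the client. That requirement is manageable inside a single provider but becomes a provisioning and trust problem once calls cross domains.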
Another example where security could be compromised is with forked requests, where a single invitation is sent to multiple contact points (a home and cell phone, for example). Because the means to authenticate and authorize the response from each fork is not well specified (a proxy may only return the response from one endpoint, somewhat arbitrarily chosen), many different behaviors are possible. This problem area is known as the Heterogeneous Error Response Forking Problem, or HERFP, and remains under discussion at the IETF.
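A rough Python sketch of how a forking proxy might pick the single error response it forwards upstream shows where information gets lost. It is modeled loosely on the "best response" guidance in RFC 3261 (prefer any 6xx, otherwise the lowest response class); taking the first response received within the chosen class is an assumption of this sketch, since the choice within a class is largely left to the proxy.

```python
def choose_best_response(final_codes):
    """Pick one non-2xx final status code from the responses of all forks."""
    if not final_codes:
        raise ValueError("no final responses collected")
    for code in final_codes:
        if not 300 <= code <= 699:
            raise ValueError(f"not a non-2xx final response: {code}")
    global_failures = [c for c in final_codes if c >= 600]
    if global_failures:
        return global_failures[0]
    lowest_class = min(c // 100 for c in final_codes)
    # Assumption: within the class, keep whichever response arrived first.
    return next(c for c in final_codes if c // 100 == lowest_class)


# Cell phone is busy (486); home phone rejects the offered media (415).
print(choose_best_response([486, 415]))
```

In this example the caller is told only that the callee was busy. The 415 from the second device, which the caller could have repaired by retrying with a different offer, never arrives.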
An additional example is the interplay of SIP with other Internet protocols. Routing of SIP requests is often handled via DNS using NAPTR and SRV records. SIP itself does not specify how to validate or authorize DNS results, so tampering with SRV records can be used to misroute messages. The interplay between SIP and other protocols is a fruitful area of research and implementation innovation. Of course, security breaches that compromise underlying (and unprotected) protocols and resources are not unique to VoIP in general or to SIP in particular. This widespread problem is why work is ongoing by the DNSSEC group and others at the IETF to enhance DNS integrity.
Overall user experience, or “quality,” is also still hit and miss in today’s networks. Again, SIP itself is not the culprit, or perhaps even the solution, but the interplay between SIP and other functions needs to evolve to address this issue. Quality includes always having connectivity, the speed of connection (ringing, for example), and the quality and consistency of media delivery. Within one homogeneous environment, adequate quality may be delivered by making logical network and bandwidth decisions, and enforcing these across the network. Across heterogeneous networks, however, the problem is significantly more challenging. Where one network may employ MPLS or VLANs to guarantee QoS, another may simply use DiffServ. Where an IP network meets the PSTN, numerous interface issues (such as echo and security) can also occur. Even between peering partners carrying purely IP backbone traffic, there may be different design choices, especially involving encoding. For this reason, a network that is optimally designed for larger packets at longer intervals may not work well with a network optimized for smaller, more frequent packets. Jitter and latency correction at the endpoints may not be sufficient to compensate for allowing arbitrary media routing.
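The packet-size tradeoff can be put in numbers. The sketch below compares two packetization intervals for a 64 kbit/s codec such as G.711, assuming IPv4, UDP, and RTP headers with no extensions and ignoring link-layer overhead; the function and constant names are illustrative.

```python
# IPv4 (20) + UDP (8) + RTP (12) header bytes per packet; link layer ignored.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def stream_profile(codec_bitrate_bps: int, ptime_ms: int):
    """Return (packets per second, payload bytes per packet, bits per second on the wire)."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_bitrate_bps / 8 * ptime_ms / 1000
    wire_bps = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second
    return packets_per_second, payload_bytes, wire_bps


for ptime in (20, 40):
    pps, payload, wire = stream_profile(64_000, ptime)
    print(f"{ptime} ms: {pps:.0f} packets/s, {payload:.0f}-byte payload, {wire / 1000:.0f} kbit/s")
# 20 ms: 50 packets/s, 160-byte payload, 80 kbit/s
# 40 ms: 25 packets/s, 320-byte payload, 72 kbit/s
```

Doubling the interval trims header overhead and halves the packet rate, but every lost packet now removes twice as much audio and the packetization delay doubles. Two networks tuned for opposite ends of this tradeoff can each be well engineered and still interwork poorly.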
Deploying services beyond VoIP across SIP-based networks has its own set of challenges. Environments like IMS and TISPAN (the 3GPP’s IP Multimedia Subsystem and ETSI’s Telecoms & Internet converged Services & Protocols for Advanced Networks) anticipate video, TV, gaming, access to back-end databases, and many other session-related applications. Administering, controlling, authorizing, and guaranteeing quality of service across these multimedia networks is non-trivial. For example, while a few seconds of delay in ringing another user agent may be acceptable for telephony applications, it may be completely inadequate for a massive online gaming environment. Likewise, while non-repudiation may be manageable within voice-only applications, it must be rock solid for banking applications. Thus, the scope and complexity of the issues mentioned above become more acute in a rich application environment. And overcoming interoperability issues between SIP and existing security mechanisms, like X.509, S/MIME, and the related PKI technologies, will be essential to widespread adoption of these services.
At the other end of the spectrum from IMS, which has a heavy back-end infrastructure, is the work on P2P SIP, which attempts to remove any dependence on a back end. The ability to quickly set up ad-hoc, real-time sessions using a P2P system (along the lines of Skype) has obvious advantages to the end user — and disadvantages to service and equipment providers. There is nothing fundamentally difficult in having P2P endpoints communicate with an IMS infrastructure, however, other than the previously highlighted management and control of such services. Pure P2P SIP applications will occur; planning for their integration and control is only just beginning to be discussed.
As the clash of the titans heats up (GEMAYA versus The Incumbents), SIP is positioned to play a central role. While SIP has certainly proved its flexibility and worth to date, it still has much room for improvement and growth. This makes the SIP battlefield an exceptionally exciting and (potentially) prosperous place to be. SIP is solidly positioned to be the underlying standard for real-time services on the Internet. And given the relentless growth in connectivity and available bandwidth, what application can afford not to be real-time?