
September 1999
Telephony-Grade IP Networking
BY TONY RYBCZYNSKI
"Today's LAN/router-based networks exhibit reliability that is orders of magnitude
below the level required for IP telephony," says a typical IT manager. While some
would no doubt hasten to add that with enough money an IP network could be designed to
meet telephony needs, many more might hesitate, wondering how much money would be
"enough," and whether achieving a telephony-grade IP network might exceed the
bounds of financial reality.
But need the discussion end there, with vague assurances and vague doubts? Not if we
examine a few details. For example, we can study how telephony systems based on
LAN/router-based networks (referred to here as IP networks) differ from classical
telephony systems, that is, systems built around the PBX. Then, we might be better
positioned to understand what needs to be done in the IP network to make it a
telephony-grade infrastructure.
BASIC DIFFERENCES BETWEEN PBX AND IP NETWORK INFRASTRUCTURES
Let's describe the basic differences between a PBX and an IP network
supporting the telephony application. A PBX is a centralized circuit-switching
system that is optimized for telephony traffic. IP telephony is a distributed system
consisting of IP telephones, call servers (also known as connection managers and
gatekeepers), and gateways to the circuit world and to non-IP telephones, all
interconnected by an IP network.
In both cases, twisted pair wiring is used to attach end user devices, starred into the
PBX or into the workgroup LAN switch. The IP network is made up of these workgroup LAN
switches interconnected by campus backbone switches, with routing used between LAN
segments and across the WAN. In the PBX environment, signaling follows the path of the
voice call. In IP telephony, telephony signaling can be between end devices, between the
end device and the call server, or between call servers, and in all cases is handled
independently of the voice traffic.
APPROACHING A TELEPHONY-READY IP INFRASTRUCTURE
Let's now describe what makes a PBX a robust infrastructure for telephony
and contrast it with the state of IP networking. The differences we uncover may suggest
attributes that we could introduce to an IP infrastructure, attributes that would qualify
an IP infrastructure as a telephony system.
We'll look at the basic reliability of the infrastructure. We'll begin with
reliability as it relates to the idle network. (In an idle network, pertinent issues
include network components and power.) Then, we will look at reliability as it relates to
the traffic-bearing network, or even the heavily loaded network. (Pertinent issues include
the signaling, data, and information transfer planes.)
Network Components
PBX reliability is specified in terms of base system and component MTBF (mean
time between failures), with the level of redundancy a function of the PBX model. Unlike
PBXs, however, IP telephony systems are fully distributed. Consequently, for an IP
telephony system, the definition of base system reliability is problematic; it
is as much a function of how telephony call server and gateway functions are distributed
and designed, as it is of the underlying data-driven infrastructure.
In any case, the base system MTBF is gated by the most critical components: the call
server and the network switch to which it is connected. The call server is typically an NT
application, and it will become increasingly robust. For larger sites, call server
redundancy is required. The call server can be singly connected to a single workgroup
switch in a small office, dually connected to a campus backbone switch, or reached over
the WAN if it is located at another site.
Location and design of gateways are also important considerations. Accordingly, the
base system reliability of an IP telephony system is impacted by which network design
option is selected and by the inherent reliability of the critical components.
Component MTBF for a PBX user is relatively easy to understand: if the linecard and
telephone are operational, the user will experience the base system reliability (that is,
dialtone). But what about the user of an IP telephony system, a system that is highly
distributed? Here, the MTBF experienced by a single IP telephony user depends on the
reliability of the set (which could be a PC), of the linecard on the workgroup LAN switch,
and of the workgroup LAN switch itself (and any facilities and switches and routers
between the end user and the call server). This is highly dependent on the design of the
IP network and the location of the call server. Many IP networks deployed today have very
limited redundancy at the workgroup level, with progressively more robustness at the
campus backbone and WAN levels.
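One rough way to see why the figure experienced by a single user depends so heavily on network design: if every component between the user and the call server must be up, and failures are assumed independent, the individual availabilities multiply. The sketch below works through that arithmetic. The MTBF and repair-time figures are purely illustrative assumptions, not vendor or measured data, and the function names are our own.

```python
from math import prod

def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def series_availability(components):
    """Availability of a chain in which every component must be up.

    Assumes independent failures, which is a simplification.
    """
    return prod(availability(mtbf, mttr) for mtbf, mttr in components)

# Hypothetical (MTBF, MTTR) pairs in hours for the path from user to call server.
path = [
    (50_000, 4),   # IP telephone or PC
    (100_000, 4),  # line card on the workgroup LAN switch
    (80_000, 4),   # workgroup LAN switch
    (80_000, 4),   # campus backbone switch
    (20_000, 2),   # call server
]

a = series_availability(path)
downtime_minutes_per_year = (1 - a) * 365 * 24 * 60
print(f"availability {a:.5f}, roughly {downtime_minutes_per_year:.0f} minutes down per year")
```

Each added box in the path can only lower the product, which is why the location of the call server and the redundancy at the workgroup level matter so much to the individual user.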
It is interesting that IP was originally designed as a robust networking technology,
which was quite resilient to network failures through dynamic routing and plug-and-play
LAN functionality. Intelligence built into the data application supported end-to-end
protocols such as the Transmission Control Protocol (TCP), which ensured reliable
transport across what was assumed to be a lossy network with variable bandwidth and delay.
The focus was on making higher level protocols more reliable rather than the underlying
infrastructure.
The reality is that the networking design principles that work very well for TCP/IP
data applications are inadequate for real-time applications such as IP telephony, which
have latency constraints that preclude using mechanisms such as retransmissions to
overcome packet loss. For example, spending seconds to find alternate routes around
failure conditions is acceptable for data, but is totally disruptive to telephony.
Power
For nearly anyone, the periodic lack of power poses a serious concern. And, for
certain industries, such as healthcare, even the occasional lack of power is unacceptable.
In such industries, it is standard practice to provide battery and even generator backup
for PBX systems, which, given that most telephones are powered from the PBX, implies that
the phone system can be kept up indefinitely in the event of power failures. In addition,
PBXs support power fail transfer options that allow analog phones to grab analog
trunks when the PBX fails. This serves to meet some 9-1-1 regulatory requirements.
IP phones and PCs are AC powered, and it is prohibitively expensive to provide backup
power across a distributed IP network. This economic weakness of IP telephony solutions is
being addressed by vendors and standards bodies. For example, Nortel Networks has shown
line-side powering of IP telephones on its Accelar switch.
Signaling And Data Planes
A PBX signaling and control system typically offers instantaneous dialtone
(except in the most extreme cases). A PBX serving an entire building is virtually a
non-blocking device; many users can be talking to each other at one time. PBX and CO
trunks are engineered for a particular grade of service (GoS) expressed as a
low-call-blocking probability. Engineering is based on historical traffic patterns in
terms of communities of interest and call durations. Once a voice connection is set up, it
stays up (with very high probability). And, once either user hangs up, the voice call is
terminated.
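Trunk engineering of this kind is classically done with the Erlang B formula, which relates offered traffic and trunk count to call-blocking probability. The article does not name a model, so treat the choice of Erlang B, and the traffic figures in the usage note, as assumptions made for illustration.

```python
def erlang_b(offered_erlangs, trunks):
    """Blocking probability for the given offered load and number of trunks.

    Uses the standard recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)),
    which avoids the large factorials of the closed-form expression.
    """
    b = 1.0
    for n in range(1, trunks + 1):
        b = offered_erlangs * b / (n + offered_erlangs * b)
    return b

def trunks_needed(offered_erlangs, target_blocking):
    """Smallest trunk count whose blocking probability meets the GoS target."""
    n = 0
    while erlang_b(offered_erlangs, n) > target_blocking:
        n += 1
    return n
```

For example, `trunks_needed(10, 0.01)` sizes a trunk group for a hypothetical 10 erlangs of busy-hour traffic at a 1 percent blocking target.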
In contrast, in a heavily loaded IP network, particularly if the call server is
far away, call setup signals, while protected from loss through TCP, may not
necessarily get across the network fast enough, effectively denying dialtone; hence
the need to accord IP telephony signaling higher priority treatment. Conventional GoS
concepts apply at IP gateways to the PSTN. However, the GoS concept does not apply over
wide area IP trunks until some sort of connection admission control mechanisms are
developed. These could be based on pinging certain destinations and not routing calls over
the IP network if the latency and packet loss performance is not within certain bounds.
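A probe-based admission check of the kind just described might look like the following minimal sketch. It only makes the decision; how the probes are sent is left out. The function name, the sample format (round-trip times in milliseconds, with `None` for a lost probe), and the default thresholds are all assumptions chosen for illustration.

```python
def admit_call(rtt_samples_ms, max_rtt_ms=300.0, max_loss=0.01):
    """Decide whether to route a new call over the IP network.

    rtt_samples_ms: recent probe results toward the destination, one entry
    per probe, holding the round-trip time in ms or None if the probe was lost.
    The thresholds are hypothetical defaults, not recommended values.
    """
    if not rtt_samples_ms:
        return False  # no measurements yet: be conservative
    answered = [rtt for rtt in rtt_samples_ms if rtt is not None]
    loss = 1 - len(answered) / len(rtt_samples_ms)
    if not answered or loss > max_loss:
        return False  # too many probes lost
    return max(answered) <= max_rtt_ms  # worst observed latency within bounds
```

A call server or gateway could run such a check per destination before call setup and route the call over circuit trunks whenever it returns `False`.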
Information Transfer Plane
Another expectation for a telephony-grade infrastructure is that once a voice
connection is set up, voice quality is high and remains high for the duration of the call
(again with very high probability). In addition, transmission loss plans are used to
ensure consistency across a broad range of call types (for example, on-net versus off-net
calls). PBXs offer class of service (CoS) capabilities, but these refer to who can place
what types of calls (who may enjoy long-distance privileges, or what calls may receive
priority status), and which users have precedence over other users (for example, in
military applications).
There are two generic approaches available in IP networks. The first approach
over-engineers the bandwidth, ensuring that even under failure conditions there is more
than enough bandwidth available for all traffic, while ensuring that latency requirements
are met. In practical terms, over-engineering may be feasible only in campus networks.
Moreover, given the growth of traffic, over-engineering is a short-term fix at
best. The second approach introduces IP quality of service (QoS) mechanisms such as those
based on the IETF's Differentiated Services (DiffServ) architecture, which allow
voice traffic (and signaling) to be handled on a priority basis.
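On the end-station side, DiffServ marking amounts to setting the DS field in the IP header of outgoing voice packets. A minimal sketch using the standard sockets interface follows. The choice of the Expedited Forwarding code point (46) is an assumption on our part, since the article does not name a code point, and `open_voice_socket` is a hypothetical helper name.

```python
import socket

DSCP_EF = 46            # Expedited Forwarding code point (assumed choice for voice)
TOS_EF = DSCP_EF << 2   # the DSCP occupies the upper six bits of the former TOS byte

def open_voice_socket():
    """Create a UDP socket whose outgoing packets carry the EF marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # IP_TOS sets the whole byte; the two low-order bits are left at zero.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_EF)
    return sock
```

Marking is only half of the mechanism: the switches and routers along the path must be configured to honor the code point, and some operating systems restrict or ignore this socket option.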
IP QoS deals with priority handling when there is connectivity across the IP network.
However, it doesn't address connectivity losses attributable to excessive convergence
times, that is, to the time routers may consume while learning about the availability of
new routes to compensate for failure conditions.
Convergence times may be substantial. For example, if state-of-the-art routing
protocols such as Open Shortest Path First (OSPF) are used, the convergence times may be
proportional to the square of the number of routers in the network, and can last minutes
in large networks. Many systems use the older Routing Information Protocol (RIP),
which can result in loss of logical connectivity even when physical connectivity exists.
(During failures, the number of hops on the surviving path may exceed 15, the largest
hop count RIP treats as reachable.) Hierarchical network design
techniques, switch redundancy, and multilink trunking can shorten routing convergence
times.
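The RIP case can be illustrated with a toy topology. RIP treats a metric of 16 as "infinity," so a destination whose only surviving path is longer than 15 hops is declared unreachable even though the links are physically intact. The sketch below counts hops with a breadth-first search; the 20-router ring is a contrived assumption chosen to make the effect visible, and counting router-to-router hops is a simplification of how RIP metrics are actually advertised.

```python
from collections import deque

RIP_INFINITY = 16  # RIP treats a metric of 16 as unreachable

def hop_count(links, src, dst):
    """Fewest hops from src to dst over bidirectional links, or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # physically disconnected

def rip_reachable(links, src, dst):
    """True if a path exists and its hop count is below RIP's infinity."""
    hops = hop_count(links, src, dst)
    return hops is not None and hops < RIP_INFINITY

# Hypothetical ring of 20 routers; routers 0 and 1 are directly connected.
ring = [(i, (i + 1) % 20) for i in range(20)]
# The direct link fails, so traffic must go the long way around the ring.
after_failure = [link for link in ring if link != (0, 1)]

print(rip_reachable(ring, 0, 1), rip_reachable(after_failure, 0, 1))
```

Hierarchical design helps precisely because it keeps the worst-case path short even after a failure.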
PRESCRIPTIONS FOR SUCCESS
No single solution suffices for the development of telephony-grade
infrastructures. Solutions will vary, just as user needs and economic and business
environments will vary. Furthermore, IP telephony applications can be implemented in a
number of ways on IP networks in terms of distribution of call server and gateway
functionality. However, the following general guidelines may be useful.
At The Switch Level
Optional power, interface, control, switching fabric redundancy, and hot
swappability are available in many products. With the convergence onto IP, hardware-based
routing switches are demonstrably more reliable than software-based multiprotocol routers,
and should be leveraged at the campus level. At the workgroup and remote office levels,
investments in redundancy should be justified against the business and end user need.
At The Network Level
Delayering campus networks, that is, eliminating the campus
distribution and server aggregation tiers common in many networks (by using common
platforms at the workgroup and campus core tiers), results in fewer boxes and more
affordable redundancy. Likewise, in the WAN, a network architecture should be considered
that leverages bandwidth, virtual circuit, and Internet VPN services to deliver the
required throughput and latency even under failure conditions.
At Layer 1, SONET and DWDM features are available that may provide resilience. At Layer
2, mechanisms such as multilink trunking and multiprotocol label switching provide
automatic restoration without impacting routing systems. At Layer 3, resilience is
provided through dynamic routing protocols such as OSPF, complemented by Equal Cost
MultiPath (ECMP) routing. These networks leverage application awareness under a policy
management framework to provide comprehensive traffic management and QoS support, ensuring
that business-critical traffic has first access to network resources.
At The Application Level
IP telephony and signaling traffic should be tagged with the appropriate QoS bits
(via DiffServ, say) to ensure appropriate handling in the data network. In addition, it
may be possible to take advantage of a voice QoS feature that automatically switches to
the circuit network if IP congestion affects voice quality, and switches back to IP when
voice QoS is re-established on the WAN. (Nortel Networks has experience with such
functionality, which it calls IP VoiceFlex.) Switching between IP and circuit-switched
networks will be transparent to the user. This type of functionality ensures optimal use
of the IP network for high-quality voice communications.
At The Network Management Level
Managing IP networks is complex and difficult, particularly since skilled
operational personnel are hard to attract and retain. Adding telephony to the IP network
can exacerbate matters, multiplying demands in an already strained environment.
The solution lies in comprehensive network, policy, and service management. Performance
and fault management capabilities can significantly enhance network reliability. Remote
access to RMON, port mirroring, and remote traffic monitoring are key for effective remote
diagnostics. Policy and service level management are tools that support not only
telephony, but also business-critical applications such as enterprise resource planning,
supply chain management, and e-commerce.
At The Operations Level
Formal procedures need to be developed and followed in several areas, certainly
in the deployment of new software releases in the network (that is, defining backout and
test plans, and maintenance windows). In addition, some procedures that are already
observed in IP networks may be best abandoned. For example, IP networks
have evolved from LANs and PCs, carrying with them procedures such as pressing the restart
button to restore stability; however, such procedures are unacceptable in telephony
environments. A culture of service level management needs to be developed, tracking
latency and trunk utilization performance and loss rates from an application perspective.
TELEPHONY-GRADE NETWORKS: NOT JUST FOR TELEPHONY
Enterprise users demand the freedom to choose the rate at which they evolve their
telephony communications environments. And yet, while the idea of leveraging the Internet
for more connectivity options and new applications may inspire enthusiasm or caution,
depending on your circumstances, it is widely recognized that IP networks are becoming
increasingly business-critical even for traditional data applications.
With the increased emphasis on building a full spectrum of e-business offerings
(including e-commerce, ERP, e-care), there is a growing requirement for the same level of
infrastructure reliability that is ultimately required for IP telephony. For example, in a
comprehensive supply chain management application environment, a customer query with a
response time requirement of three seconds may necessitate many back-office network
transactions (to the factory, to accounting, to inventory databases). Given the cumulative
nature of delays, individual transactions may have latency requirements below 100 msec (a
figure in the same order of magnitude as voice).
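The arithmetic behind that figure is simple enough to sketch. If the back-office transactions run one after another, the end-to-end response budget is divided among them. The transaction count and the processing allowance below are assumptions chosen for illustration; only the three-second response time comes from the example above.

```python
def per_transaction_budget_ms(response_budget_ms, serial_transactions, processing_ms=0.0):
    """Latency budget left for each of a series of sequential transactions.

    processing_ms is time consumed outside the network (for example,
    rendering the answer), subtracted before the budget is divided.
    """
    if serial_transactions < 1:
        raise ValueError("need at least one transaction")
    return (response_budget_ms - processing_ms) / serial_transactions

# A 3-second query that triggers a hypothetical 30 sequential transactions.
print(per_transaction_budget_ms(3000, 30))
# The same query with a hypothetical 500 ms of front-end processing.
print(per_transaction_budget_ms(3000, 30, processing_ms=500))
```

Under these assumptions each transaction gets 100 msec or less, which is where the comparison with voice latency comes from.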
Enterprises need solid networking and management infrastructures that cost-effectively
support business-critical activities, and that provide for redundancy wherever it may be
needed. Certain applications include built-in end point intelligence to recover
losslessly from network failures (for example, with TCP/IP) or to compensate for variable
network delays (for example, audio and video streaming). Other applications, like
telephony, have expectations established in a circuit-switched world. In fact, with
multiple applications running on a common IP network infrastructure, reliability should
be, in large part, viewed from the perspective of those applications that are business
critical. This is a key change in perspective, one that drives the need for service level
management that proactively tracks the performance of the network from the end user
application perspective.
The IT challenge is to lower per-unit networking costs and to provide the enterprise
with networking capabilities that enhance its competitiveness. Simplicity, reliability,
and price/performance are the key attributes of business-grade networking infrastructures.
Tony Rybczynski is director of strategic marketing and technologies for Nortel
Networks' Enterprise Solutions unit. This business unit offers a full range of
enterprise terminal, workgroup, campus, and wide-area unified networks and applications,
through direct and indirect channels. For more information, visit the company's Web
site at www.nortelnetworks.com. E-mail questions or comments to tonyryb@nortelnetworks.com.