TMCnet - World's Largest Communications and Technology Community

August 1999


Make It Run Forever

BY JEFF LAWRENCE

Products and services undergo three phases of development: they evolve from the "make it run" phase to the "make it run fast" phase, and finally reach the "make it run forever" phase. As telephony, wireless, and Internet services grow in complexity and economic value, the need for products and services to "run forever" in the network will continue to gain importance.

Today's network users expect a service to be available upon request. This expectation places a special burden on service providers to ensure that all of the network elements needed to provide a service are functioning whenever a user requests it. In other words, a service is truly available only when every link in the chain of equipment and transmission facilities, down to the weakest, is available to provide it. Service availability depends on software, hardware, and network design as well as on environmental and operational factors.
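The "weakest link" point can be made concrete. Assuming element failures are independent, a service that needs every element in a chain is available only when all of them are, so its availability is the product of the individual availabilities. The element names and availability figures below are hypothetical, chosen only for illustration.

```python
from math import prod

def series_availability(availabilities):
    """Availability of a chain that needs every element up at the same time.

    Assumes element failures are statistically independent.
    """
    return prod(availabilities)

# Hypothetical chain: access line, local switch, transport facility, service platform.
chain = {
    "access line": 0.9999,
    "local switch": 0.99999,
    "transport facility": 0.9999,
    "service platform": 0.999,
}

end_to_end = series_availability(chain.values())
weakest = min(chain, key=chain.get)  # element with the lowest availability
print(f"end-to-end availability: {end_to_end:.6f}")
print(f"weakest link: {weakest}")
```

The end-to-end figure comes out lower than that of any single element, which is why even a highly available switch cannot make up for a weaker facility elsewhere in the chain.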

Public telephony network service providers understand these factors clearly and have developed several approaches to ensure the availability of services from the public network. In contrast, some enterprise networks operate with lower levels of service availability.

Businesses using these enterprise networks accept lower levels of service availability because they judge that avoiding the perceived and actual expense of ensuring high availability is justified. However, as the industry matures, service providers and carriers will increasingly demand telecommunications products that are designed to minimize downtime and the associated revenue losses. For equipment manufacturers, providing high availability will become an economic necessity.

RELIABILITY AND AVAILABILITY
To understand availability, we first need to understand reliability. The reliability of an element is the conditional probability that the element will operate without failure throughout a specified period of time, given that it was operating at the start of that period. The availability of an element is the probability that the element is in service and available to a user at any given instant.
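Stated a little more formally, and assuming for illustration (this is not from the article) a constant failure rate λ, so that times to failure are exponentially distributed, the two definitions can be written in terms of the mean time between failures (MTBF) and the mean time to repair (MTTR). The steady-state availability expression does not depend on the exponential assumption.

```latex
\begin{aligned}
R(t) &= \Pr\{\text{element operates throughout } [0, t] \mid \text{operating at } t = 0\}
      = e^{-\lambda t}, \qquad \lambda = \frac{1}{\mathrm{MTBF}} \\[4pt]
A    &= \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
\end{aligned}
```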

Systems may go out of service for any number of reasons, such as the occurrence of a fault, repair activities, software loading, hardware upgrading, or periodic maintenance. For a system to achieve high availability, the duration of these interruptions must be as short as possible. A system may be considered highly reliable (that is, it may fail very infrequently), but, if it is out of service for a significant period of time during a failure, it will not be considered highly available.
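A small worked comparison shows why reliability alone does not guarantee availability. It uses the common steady-state formula, availability = MTBF / (MTBF + MTTR). Both elements and their failure and repair times are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored

def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: long-run fraction of time the element is in service."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def downtime_minutes_per_year(a):
    """Expected minutes out of service per year for availability a."""
    return (1 - a) * HOURS_PER_YEAR * 60

# Hypothetical element A: very reliable (fails about once every 10 years),
# but each outage takes a full week (168 hours) to repair off-line.
rare_but_slow = availability(mtbf_hours=10 * HOURS_PER_YEAR, mttr_hours=168)

# Hypothetical element B: fails about once a year, but automatic
# recovery restores service in about one minute.
frequent_but_fast = availability(mtbf_hours=HOURS_PER_YEAR, mttr_hours=1 / 60)

for name, a in [("A (rare failures, slow repair)", rare_but_slow),
                ("B (frequent failures, fast repair)", frequent_but_fast)]:
    print(f"{name}: availability {a:.6f}, "
          f"about {downtime_minutes_per_year(a):.1f} min/year down")
```

Element A fails ten times less often, yet it is down for roughly 1,000 minutes a year, while element B is down for about one minute: the more reliable element is the less available one.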

HIGH AVAILABILITY
The network can be described as a collection of elements, including equipment (such as switches and routers, gateways, and service platforms) and transmission facilities (such as copper, cable, fiber-optic, and wireless technologies). Any of these elements may fail because of incorrect design, environmental factors, physical defects, or incorrect usage (that is, operator error). Incorrect usage and component mortality are typically the most common causes of failure.

Network elements such as telephone switching systems typically operate with a target availability of 99.999 percent, often referred to as “five nines availability.” At this level, a telephone switch may be out of service for only about five minutes per year (0.001 percent of the 525,600 minutes in a year is roughly 5.3 minutes). Those few minutes must cover all of the time needed to repair faults, load software, upgrade hardware, perform periodic maintenance, and carry out any other necessary activities.
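The downtime budget for a given number of nines is simple arithmetic: the allowed unavailability is 10 to the power of minus the number of nines, multiplied by the minutes in a year. The sketch below ignores leap years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored

def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for an availability with `nines` nines
    (for example, nines=5 means 99.999 percent)."""
    unavailability = 10.0 ** (-nines)
    return unavailability * MINUTES_PER_YEAR

for n in range(2, 7):
    availability_pct = 100 * (1 - 10.0 ** (-n))
    decimals = max(n - 2, 0)  # show exactly n nines
    print(f"{availability_pct:.{decimals}f}% -> "
          f"{downtime_budget_minutes(n):8.2f} min/year")
```

Each additional nine cuts the yearly budget by a factor of ten, from about 88 hours at two nines to about half a minute at six.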

Telephone switching systems today are designed so that established calls are not dropped and only an insignificant number of calls still being set up are mishandled during a switching system failure. Five nines availability is the standard toward which the Internet infrastructure and services will need to strive if they are to provide services seamlessly across both the Internet and the PSTN.

Service providers and communications equipment manufacturers face the challenge of deciding what level of availability is sufficient for each network element when it is operating alone and when it is operating in conjunction with other network elements to meet service requirements. Higher availability usually costs more, and it is often difficult to determine whether the potential economic benefit of high availability is worth the cost.

Some network elements may not be designed to support high availability because their failure is assumed to cause minimal service disruption and little economic impact on the user or the service provider (for example, a single mobile handset). On the other hand, some network elements must be specially designed to support very high availability. A switch in the core of the network, through which tens of thousands of connections flow, cannot be allowed to fail; if it did, the real and potential lost revenue could be very significant.

High availability can be achieved using various design approaches that attempt to strike a balance between meeting availability objectives and minimizing complexity and cost.

HIGH AVAILABILITY NETWORK
In the context of a network, these design approaches apply not only to individual network elements but also to how those elements are arranged, interconnected, and communicate with one another, using various organizational principles, routing protocols, and communications protocols.

The transport of voice over a circuit-switched network is resilient to bit errors. If the conversation becomes too “noisy,” the network effectively relies on the listening party to ask the speaking party to repeat themselves.

Data transmission, however, requires more stringent error detection and correction mechanisms, designed to deliver information reliably over imperfect transmission facilities. SS7, IP, ATM, and Frame Relay are all being used for these purposes, and the reliable, error-free transport of data and signaling information over these protocols is critical to proper network operation. The transport portion of the SS7 protocols, for example, is connectionless and has specific features designed to ensure very low latency, very quick error detection, and low message loss. By comparison, IP and UDP are also connectionless (TCP is connection-oriented), but the Internet protocols offer different levels of performance from those offered by the SS7 protocols.
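Link-level error detection of the kind described above is typically done with a cyclic redundancy check appended to each frame. The sketch assumes the CRC-16/X.25 parameters used by HDLC-family links. SS7 signal units also carry a 16-bit check field, but this is an illustration of the technique, not an implementation of any SS7 specification. The message content is hypothetical, and retransmission is not shown.

```python
def crc16_x25(data: bytes) -> int:
    """16-bit CRC with CRC-16/X.25 parameters: polynomial x^16 + x^12 + x^5 + 1,
    bits processed least-significant first, initial value 0xFFFF, final XOR 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x8408 is the bit-reversed form of the polynomial 0x1021.
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF

def frame(payload: bytes) -> bytes:
    """Sender side: append the 16-bit check value (low byte first)."""
    return payload + crc16_x25(payload).to_bytes(2, "little")

def frame_ok(received: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the trailing check value."""
    payload, fcs = received[:-2], int.from_bytes(received[-2:], "little")
    return crc16_x25(payload) == fcs

sent = frame(b"hypothetical signaling message")
corrupted = bytearray(sent)
corrupted[3] ^= 0x04  # a single bit error on the transmission facility

print(frame_ok(sent))              # True
print(frame_ok(bytes(corrupted)))  # False: discard the frame and request a retransmission
```

A CRC of this form detects every single-bit error and every burst error no longer than 16 bits. That is why data links can afford to discard a damaged frame and ask for it again rather than pass corrupted signaling information upward.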

Routing protocols may come into play if a transmission facility has very high bit error rates, or if a transmission facility is not available for some reason, or if a connected node has failed. Within the public telephony network, SS7 protocols have been specifically designed to allow the routing and rerouting of signaling messages across multiple links to the same destination without message loss. Routing between SS7 network elements, such as Service Switching Points, Signaling Transfer Points, and Service Control Points, is based on element identifiers known as point codes. Point codes are not hierarchical in structure, although the SS7 network elements are frequently deployed as “mated pairs.” Mated pairs are redundant nodes that are fully interconnected with each other and the network. If one fails, then the other can easily continue performing the functions of the failed node until it is repaired.
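The mated-pair idea can be illustrated with a toy routing table keyed by destination point code. Traffic is normally shared across both transfer points of the pair. If one is out of service, the survivor carries everything. The point codes, the class names, and the simple modulo rule for choosing a member are hypothetical. Real SS7 load sharing is driven by the signaling link selection field and MTP level 3 route management, which this sketch does not reproduce.

```python
from dataclasses import dataclass

@dataclass
class TransferPoint:
    """A hypothetical signaling transfer point, identified by its point code."""
    point_code: int
    in_service: bool = True

@dataclass
class MatedPairRoute:
    """Route toward one destination through a mated pair of transfer points."""
    members: tuple

    def next_hop(self, selector: int) -> TransferPoint:
        # Normally both members share the load; the selector value picks one.
        # If a member is out of service, the survivor carries all the traffic.
        available = [tp for tp in self.members if tp.in_service]
        if not available:
            raise RuntimeError("destination unreachable: both mates are out of service")
        return available[selector % len(available)]

stp_a = TransferPoint(point_code=1201)
stp_b = TransferPoint(point_code=1202)
routes = {2057: MatedPairRoute(members=(stp_a, stp_b))}  # keyed by destination point code

before = [routes[2057].next_hop(sel).point_code for sel in range(4)]
stp_a.in_service = False  # one member of the mated pair fails
after = [routes[2057].next_hop(sel).point_code for sel in range(4)]
print(before)  # [1201, 1202, 1201, 1202]
print(after)   # [1202, 1202, 1202, 1202]
```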

Routing in the Internet follows a different approach. Internet network elements typically consist of routers and servers that are identified by IP addresses. The concept of mated pairs is not generally used within the Internet, and, in fact, routers are typically not organized in any hierarchy. If a transmission facility or network fails, messages may be lost. It is then up to the end-user application (or a transport protocol such as TCP acting on its behalf) to detect the loss and ensure that the messages are resent and rerouted. The probability of successfully rerouting messages increases as the diversity of the paths connecting the same two endpoints increases.
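Why path diversity helps can be quantified under one strong assumption: the paths fail independently. Messages can then get through unless every path is down at once. The 99 percent per-path figure is hypothetical.

```python
def prob_some_path_available(path_availabilities):
    """Probability that at least one of several independent paths is usable."""
    all_down = 1.0
    for a in path_availabilities:
        all_down *= (1.0 - a)  # every path must be down for the endpoints to be cut off
    return 1.0 - all_down

# Hypothetical: each diverse path is independently usable 99 percent of the time.
for k in range(1, 5):
    p = prob_some_path_available([0.99] * k)
    print(f"{k} diverse path(s): {p:.8f}")
```

Each added path multiplies the chance of total disconnection by 0.01 here. The independence assumption is what "diversity" buys in practice: two paths that share a conduit or a router fail together and add far less than this calculation suggests.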

The future integration of the SS7 signaling protocols with the Internet protocols to provide unified signaling will generate a number of technical challenges, since ways must be found for the Internet protocols to provide the same level of service as the transport portion of the SS7 protocols.

HIGH AVAILABILITY EQUIPMENT
There are several design approaches to ensure high availability for individual network elements. The simplest type of network element is non-redundant and must be repaired off-line if it fails. This type of element will have relatively low design complexity and low cost. (Depending on its reliability, it may also have low availability.) In contrast, high availability elements require both the ability to support on-line repair (usually through the “hot swap” of components while the element is in service) and additional redundancy.

Elements with additional redundancy typically use “retry” and “masking” for recovery. Retry-based elements attempt to ensure that there is a second attempt at the operation if an initial operation fails. If the second attempt succeeds, the fault was probably transient. If the second attempt fails, the fault is probably permanent. Masking-based elements attempt to ensure that only the results from the correctly operating portion of the element are used if a component fails. In either case, if a component has failed, an attempt to diagnose, confine, and compensate for the fault is undertaken.
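The two recovery styles can be sketched in a few lines. The retry helper makes exactly one second attempt and uses the outcome to classify the fault as transient or probably permanent. For masking, the sketch uses majority voting over redundant replicas (the triple-modular-redundancy style), which is one common masking technique among several. All function names are hypothetical.

```python
from collections import Counter

class PermanentFault(Exception):
    """Raised when an operation fails again on retry."""

def with_retry(operation):
    """Retry-based recovery: if the first attempt fails, try once more.

    Success on the second attempt suggests a transient fault; a second
    failure suggests a permanent one, which is raised for diagnosis.
    """
    try:
        return operation(), "ok"
    except Exception:
        pass  # first attempt failed; fall through to the single retry
    try:
        return operation(), "transient fault"
    except Exception as err:
        raise PermanentFault("operation failed twice; fault is probably permanent") from err

def masked_result(replica_outputs):
    """Masking-based recovery: majority vote over redundant replicas, so the
    output of a single faulty replica is outvoted and never used."""
    value, votes = Counter(replica_outputs).most_common(1)[0]
    if votes <= len(replica_outputs) // 2:
        raise RuntimeError("no majority; the fault cannot be masked")
    return value

# A hypothetical operation that fails once, then succeeds.
attempts = {"n": 0}
def flaky_read():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise IOError("parity error")
    return 42

print(with_retry(flaky_read))    # (42, 'transient fault')
print(masked_result([7, 7, 9]))  # 7: the third replica is outvoted
```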

Hardware fault tolerance typically relies on redundant processors, memory, buses, power supplies, and disk storage. Software fault tolerance uses a combination of software redundancy and simple hardware redundancy to provide the necessary availability in the case of failure. Depending on the approach chosen, one or more of the redundant components may be operating simultaneously. Hardware fault-tolerant approaches can typically support higher levels of performance than software fault-tolerant approaches. Software fault tolerance, on the other hand, eliminates the need for complex fault-tolerant circuits, significantly decreasing hardware design complexity and cost.
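One common software fault-tolerance pattern pairs simple, duplicated hardware (two ordinary processor cards) with software that watches heartbeats and promotes the standby when the active copy goes silent. The class names, card names, and timeout are hypothetical. Time is passed in explicitly so the behavior is deterministic. A real system would also need to checkpoint state to the standby so that work in progress survives the switchover, and this sketch omits that.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    """One copy of the software, running on its own ordinary processor card."""
    name: str
    last_heartbeat: float = 0.0

class ActiveStandbyPair:
    """Software-managed redundancy: if the active replica has not sent a
    heartbeat for longer than `timeout` seconds and the standby is still
    healthy, the standby is promoted to active."""

    def __init__(self, active: Replica, standby: Replica, timeout: float):
        self.active = active
        self.standby = standby
        self.timeout = timeout

    @staticmethod
    def heartbeat(replica: Replica, now: float) -> None:
        """Record that `replica` reported itself alive at time `now`."""
        replica.last_heartbeat = now

    def check(self, now: float) -> str:
        """Run periodically; returns the name of the replica that is now active."""
        active_silent = now - self.active.last_heartbeat > self.timeout
        standby_alive = now - self.standby.last_heartbeat <= self.timeout
        if active_silent and standby_alive:
            self.active, self.standby = self.standby, self.active
        return self.active.name

card0, card1 = Replica("card-0"), Replica("card-1")
pair = ActiveStandbyPair(active=card0, standby=card1, timeout=3.0)
pair.heartbeat(card0, now=0.0)
pair.heartbeat(card1, now=0.0)
print(pair.check(now=2.0))      # card-0: still within the timeout
pair.heartbeat(card1, now=4.0)  # only the standby keeps reporting
print(pair.check(now=4.0))      # card-1: the standby has taken over
```

The fault detection and switchover logic lives entirely in software here, which is the trade the article describes: cheaper, simpler hardware in exchange for some switchover delay and processing overhead.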

CONCLUSION
High-availability products and services will play an increasingly important role in the network of the future. The availability of services is a complicated function of both equipment design and network design. Different design choices result in different levels of complexity, cost, performance, potential information loss, and (of course) availability. The reliability and availability of the public telephone network have set the standard against which future services from an integrated Internet and PSTN infrastructure will be measured.

Jeff Lawrence is president and CEO of Trillium Digital Systems, a leading provider of communications software solutions for computer and communications equipment manufacturers. Trillium develops, licenses, and supports standards-based communications software solutions for SS7, ATM, ISDN, frame relay, V5, IP, and X.25/X.75 technologies. For more information, visit the company’s Web site at www.trillium.com.







Technology Marketing Corporation

35 Nutmeg Drive Suite 340, Trumbull, Connecticut 06611 USA
Ph: 800-243-6002, 203-852-6800
Fx: 203-866-3326


© 2017 Technology Marketing Corporation. All rights reserved | Privacy Policy