TMCnet - World's Largest Communications and Technology Community

June 1999


High Availability - Open Systems Rise To The Challenge

BY BROUGH TURNER

Highly available, even fault-tolerant, computer systems have become critical in many industries. But requirements differ markedly between industries. In the world of finance, transactions must reach predictable results once they are started, even if the power, communication line, or central computer fails. If you request money from an automated teller machine (ATM), you want to be sure your bank account is debited only if the money actually comes out of the machine. It is not desirable, but it is acceptable, for the transaction to take 1 minute instead of 20 seconds, or for an ATM at a particular location to be temporarily unavailable. But debiting your account when no money is received is unacceptable.

In telecommunications services, the first priority is availability of the service, not maintaining the consistency of transactions in progress. When you pick up the phone, you expect to hear a dial tone within a second, and to be able to place a call. If you are in the middle of a call, it is not desirable for the call to be dropped. But as long as you can pick up the phone and get a dial tone again, a dropped call is an unfortunate glitch, not a failure to deliver telecommunications service. Unlike financial transactions, where the overriding consideration is to get the transaction correct, in the telecom world the overriding consideration is to provide the service.

These realities lead to some subtle differences in system architecture. But the underlying hardware and software components are substantially the same. In each market there is a history of special purpose hardware, whether the hardware involves computer systems from Tandem (now Compaq Computer) or Stratus, or computers embedded within central office (CO) switches from Lucent or Nortel.

The heart of today's CO switch is a special purpose, highly available computing system, designed to localize failures, bring standby components on line quickly, and support 99.999 percent service availability. But now, just as the CO switch is being threatened by IP telephony, the basic approach to the computers that control telecommunications equipment is also under fire. Today, open telecommunications technology is providing a way to create highly available services using off-the-shelf components.
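To put the 99.999 percent figure in perspective, the short calculation below converts an availability target into permitted downtime per year. It is only an illustrative sketch: the function name is hypothetical, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) allowed by an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(99.999)
print(f"{five_nines:.2f} minutes per year")  # prints "5.26 minutes per year"
```

In other words, "five nines" leaves a little over five minutes of total outage per year.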

SHARING THE LOAD
With the advent of PC-based computer telephony, it became possible to create highly reliable services - in telecommunications terms - using distributed solutions. For example, if the telecommunications traffic for a specific service is being carried by 50 separate PCs and one of them fails, 2 percent of the calls in progress at that instant will be dropped. But users will be able to immediately re-establish their calls because the remaining 98 percent of the system is still functioning. This scenario can be viewed as providing a "highly available" service as defined in the telecom industry. Indeed, even before the advent of CompactPCI, industrial PCs have made some inroads providing enhanced services in the public network.
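The 50-PC arithmetic above generalizes easily. The sketch below assumes, as the example does, that calls are spread evenly across identical machines; the function name is hypothetical.

```python
def dropped_call_percent(total_nodes: int, failed_nodes: int) -> float:
    """Percentage of in-progress calls lost when some nodes fail.

    Assumes traffic is spread evenly across identical nodes.
    """
    if total_nodes <= 0 or not 0 <= failed_nodes <= total_nodes:
        raise ValueError("need total_nodes > 0 and 0 <= failed_nodes <= total_nodes")
    return 100.0 * failed_nodes / total_nodes


one_of_fifty = dropped_call_percent(total_nodes=50, failed_nodes=1)
print(one_of_fifty)  # prints 2.0
```

The more machines the traffic is spread across, the smaller the share of calls any single failure can affect.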

This architecture can also be seen in action at many points in the Internet, for example, in the equipment used by a company like Yahoo. Yahoo uses many, many mass-market computing devices (PCs) to provide a very high capacity service that is constantly available. Telecom services are more challenging than Web hosting in part because telecom has to combine new-generation technology with legacy equipment, but the extra challenges of telecom are being addressed.

KEEPING USERS ON LINE
Let's begin by looking at the individual subscriber connections. In a traditional CO switch, there are wires from each subscriber that terminate at a line card. It is not economically feasible, or even rational, to provide duplicate (that is, redundant) line cards since they fail very infrequently. In traditional CO switches, service outages are minimized by limiting the number of users per line card, detecting failures and impending failures, and hot-swapping failed cards. This localizes and minimizes any service outage and ensures that users connected to the failed card are back on line very quickly.

With CompactPCI we can provide the same functionality in an open telecommunications platform. Subscriber lines can be terminated on line cards with built-in test facilities, and these line cards can be hot-swapped in the event of a failure. In a CompactPCI chassis, the line cards are connected together for telephony purposes by the CT Bus (H.110). The CT Bus has 32 separate serial paths and totally redundant clocking, providing highly available interconnection at the chassis level. In traditional telecommunications systems, the line card chassis (also known as a peripherals shelf) may or may not come with redundant "shelf controllers." This is a commercial trade-off between possible outages of a larger group of subscribers versus added cost. Similarly, with CompactPCI we can choose chassis with single or redundant CPUs.

DISTRIBUTING THE SWITCHING FABRIC
To create a next-generation CO capable of supporting hundreds of thousands of subscribers, you would need many CompactPCI chassis. Obviously it is necessary to interconnect these chassis in a way that is either redundant or highly distributed. One option is to use Asynchronous Transfer Mode (ATM) inter-chassis links, also known as MC4, a technology available today from vendors such as InnoMediaLogic (IML). IP-based approaches, running on gigabit Ethernet links, are also possible; however, for this application, IP is still an emerging technology. Whether ATM or IP, we can easily craft both redundant and distributed inter-chassis interconnections.

THE SOFTWARE TO CONTROL IT ALL
So far we've concentrated on the hardware. The biggest issue is to make the software in this distributed system as robust as the hardware. One approach would be to use open computing versions of traditional methods - for example, use a redundant UNIX server running an Oracle NonStop database to keep track of all the configuration, billing, and resource management issues. This may be the most direct approach, but lower cost, distributed solutions can achieve the same result.

Managing The Configuration Information
A major issue for large telecommunications systems is keeping track of configuration information - the line cards, the subscribers, the services each subscriber is entitled to, where the trunks are connected, etc. This is a "read-mostly" database; it is written to only when performing what is called "operations, administration, and maintenance" (OA&M) - functions such as adding new subscribers, changing trunks, and reconfiguring the system.

You don't really need a fault-tolerant transaction processing database system to track this information, as long as there are several replicas and relevant subsets are cached in each of the distributed processors in the system, with cache updates at user-defined intervals. Database replication is appearing in a variety of commercial software. And as long as each processor in the distributed switching system has a local copy of the necessary portion of the configuration information, the system will be able to function, even if the central database goes down.
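The caching idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular product: the class name, the `Fetcher` callables standing in for database replicas, and the sample data are all hypothetical. The point it shows is that a processor keeps serving its last good copy when every replica is unreachable.

```python
import time
from typing import Callable, Dict, List, Optional

# Hypothetical: a callable that returns this processor's subset of the
# configuration, or raises ConnectionError if the replica is unreachable.
Fetcher = Callable[[], Dict[str, str]]


class LocalConfigCache:
    """Read-mostly cache refreshed from replicas at a fixed interval."""

    def __init__(self, replicas: List[Fetcher], refresh_seconds: float) -> None:
        self._replicas = replicas
        self._refresh_seconds = refresh_seconds
        self._data: Dict[str, str] = {}
        self._last_attempt: Optional[float] = None

    def _refresh(self) -> None:
        # Try each replica in turn; the first one that answers wins.
        for fetch in self._replicas:
            try:
                self._data = dict(fetch())
                return
            except ConnectionError:
                continue
        # Every replica failed: keep the stale copy and carry on.

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        due = (self._last_attempt is None
               or now - self._last_attempt >= self._refresh_seconds)
        if due:
            self._last_attempt = now
            self._refresh()
        return self._data.get(key)


central = {"up": True}  # toggled below to simulate an outage


def central_replica() -> Dict[str, str]:
    if not central["up"]:
        raise ConnectionError("central database unreachable")
    return {"line-17": "call-waiting"}


cache = LocalConfigCache([central_replica], refresh_seconds=60.0)
before_outage = cache.get("line-17", now=0.0)
central["up"] = False
during_outage = cache.get("line-17", now=120.0)  # refresh fails, stale copy served
```

The trade-off is staleness: an OA&M change made during the outage is not seen until a replica is reachable again, which is acceptable for read-mostly data.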

Managing Billing Information
Another centralized service maintained by the CO is billing. Typically, a centralized database accumulates the actual records of all calls or other transactions that are billable. In a distributed system, this information is generated on the individual processors setting up the calls.

It is true that each processor is a single point of failure, but as long as it is functioning, it can be generating Call Detail Records (CDRs). And if individual processors broadcast their CDRs over an IP network - a redundant LAN if desired - then two or more machines can be configured to accumulate the billing information redundantly. IP broadcast is well understood, as are redundant Ethernets to interconnect with the machines recording the call detail information.

The actual processing of the billing information then becomes a batch process. You may want to run this batch process frequently to provide hour-by-hour billing. But, if there is a delay of 15 minutes because one of the systems needs to be rebooted, raw information may be collected on another machine in real time, and subscribers may continue receiving service.
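The redundant collection and batch billing described above can be simulated in memory. The sketch below is an assumption-laden illustration: the record fields, the class and function names, and the use of a (processor, sequence number) pair to discard duplicates are all hypothetical, and a plain function call stands in for an actual IP broadcast.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CallDetailRecord:
    processor_id: int  # which call processing box generated the record
    sequence: int      # per-processor counter, used to spot duplicates
    subscriber: str
    seconds: int


class Collector:
    """One of several machines accumulating raw CDRs."""

    def __init__(self) -> None:
        self.online = True
        self.records: Dict[Tuple[int, int], CallDetailRecord] = {}

    def receive(self, cdr: CallDetailRecord) -> None:
        if self.online:  # a rebooting collector simply misses the record
            self.records[(cdr.processor_id, cdr.sequence)] = cdr


def broadcast(cdr: CallDetailRecord, collectors: List[Collector]) -> None:
    """Stand-in for an IP broadcast: every collector is offered the record."""
    for collector in collectors:
        collector.receive(cdr)


def batch_bill(collectors: List[Collector]) -> Dict[str, int]:
    """Merge all collectors, dropping duplicates, and total seconds per subscriber."""
    merged: Dict[Tuple[int, int], CallDetailRecord] = {}
    for collector in collectors:
        merged.update(collector.records)
    totals: Dict[str, int] = {}
    for cdr in merged.values():
        totals[cdr.subscriber] = totals.get(cdr.subscriber, 0) + cdr.seconds
    return totals


a, b = Collector(), Collector()
broadcast(CallDetailRecord(1, 1, "alice", 60), [a, b])
a.online = False  # collector A reboots
broadcast(CallDetailRecord(1, 2, "alice", 30), [a, b])
a.online = True
broadcast(CallDetailRecord(2, 1, "bob", 45), [a, b])
totals = batch_bill([a, b])
print(totals)  # prints {'alice': 90, 'bob': 45}
```

Collector A missed one record while it was down, but the batch run still bills it because collector B captured it.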

The worst that can happen is that one of the call processing boxes fails. In this case, the calls in progress would be dropped and the users would not be billed - a good thing under the circumstances. The important thing is that the total system still provides dial tone. That is the key issue for high availability in telecommunications.

Managing Shared Resources
The final hurdle in creating a highly available CO using distributed, mass-market computing devices is figuring out how to manage resources where control must be shared. It's one thing to have read access to a replica of a configuration database. It is another thing to share control of the inter-chassis switch fabric.

In a traditional CO, there is a centralized database that describes the second-by-second utilization of the switch fabric and determines routes for each new call through the system. One option in a distributed system is to use a highly reliable, and expensive, central database system to allocate virtual circuits - over the MC4 links - between the chassis. A more distributed approach would be to configure and partition these virtual circuits in advance and then allow each chassis to manage the traffic it is sending over its portion of the system.

The success of this distributed approach has been demonstrated. The MC1 multi-chassis MVIP systems (and the equivalent SCXbus) provide a switch fabric for interconnecting multiple PCs. MC1's capacity is limited to 1500 timeslots, and its copper cable is limited to 15 meters, interconnecting a dozen or so chassis. However, its operation is instructive.

The transmit rights for specific paths within the cable are divided up and assigned to individual processors. This means that the resource allocation for conversation paths in the cable is distributed at configuration time. There is no central database involved in a call between chassis on the cable. For example, to connect a call between chassis 2 and chassis 7, you send messages (over redundant Ethernet, for example) to both chassis telling them to allocate transmit timeslots and report back. Each chassis allocates the timeslot it will transmit on using a free path from those it was pre-assigned. When you get back the two transmit assignments, you tell chassis 2 to listen on the timeslot that chassis 7 is transmitting on and vice versa. Each chassis has complete control of a subset of the switch fabric, so the configuration database is completely distributed. This makes less efficient use of the switch fabric, but the database is dramatically simplified. With the cost of inter-chassis capacity dropping, using a completely distributed architecture can be economically justified.

BARRIERS ARE DROPPING
Software development is the gating factor for bringing new equipment and services to market. The richest software development environment in the world is on PCs connected to the Internet. And now, the high availability requirements of telecommunications systems can be handled by CompactPCI and distributed computing architectures. As a result, mass-market computing technology has become the most practical and cost-effective way of building new telecom services. New designs based on proprietary technology make little sense. Virtually any telecommunication service can be more economically built using mass-market computing technology on open platforms - while addressing all of the high availability issues.

Brough Turner is senior vice president of technology at Natural MicroSystems, a leading provider of hardware and software technologies for developers of high-value telecommunications solutions. For more information, call Natural MicroSystems at 508-620-9300 or visit the company's Web site at www.nmss.com. E-mail to the author ([email protected]) is also welcome.






