Feature Article
August 2003

Understanding The HA Stack


With economic recovery still looming in the distance, more and more end customers are demanding architectures that will help them lower the total cost of ownership of their systems. As a result, equipment manufacturers across the communications, industrial, military, and aerospace sectors are creating new, always-on, high availability (HA) architectures designed to help reduce operational expenses and revenue loss due to service interruptions. Until now, the availability of a system was determined by a calculation based on the quality and redundant capability of all of its hardware components, and expressed as a number of "nines." This figure is the percentage of time per year that a system remains functional. Most end users demand at least 99.999 percent (five nines) availability. But now, with the development of new, more reliable designs, availability is pushing far beyond the traditional five nines and is being defined by an integrated suite of redundant hardware components and specialized layers of software components, commonly known as the "HA Stack."
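To make the "nines" concrete, the short Python sketch below converts an availability level into the downtime it permits per year. It assumes a 365-day year; the function name is illustrative and not taken from any standard.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime allowed by an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)  # five nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year.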

The HA Stack
Several layers make up the HA Stack. Composed of both hardware and software, each layer serves a distinct purpose. Some layers are independently functional, while others depend on other layers in the stack. For the purpose of this article, the ranking begins at the bottom, where we find some of the most ubiquitous features for developing HA solutions. The first two layers of the HA Stack provide hardware features: Hot Swap and Component Redundancy. The third layer, Management, comprises both hardware and software. The final three layers are software-based: Fault Tolerance, Redundant Host, and Predictive Analysis/Policy-Based Management.

Hot Swap and Component Redundancy
The main function of an HA system is to increase Mean Time Between Interruptions (MTBI) and decrease Mean Time To Repair (MTTR). MTBI measures the average time between interruptions of service and is affected by the Mean Time Between Failure (MTBF) of the system components as well as by their redundancy. MTTR is determined by serviceability and redundancy. The serviceability requirement says all components in the architecture must be hot swappable, while the redundancy requirement ensures that there is no single point of failure. Common redundant components in a platform include single board computers, Intelligent Platform Management Interface (IPMI)-based management modules, PICMG 2.16 fabric switches, power supplies, fan trays, and dual power inputs.
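The relationship between these quantities can be sketched with two textbook formulas, which are not taken from this article: steady-state availability as uptime divided by uptime plus repair time, and the availability of redundant units where any one working unit is enough. The sketch assumes failures are independent, and the input figures are hypothetical.

```python
def availability(mtbi_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mtbi_hours / (mtbi_hours + mttr_hours)


def redundant_availability(unit_availability: float, copies: int) -> float:
    """Availability of `copies` independent units when one working unit suffices."""
    return 1.0 - (1.0 - unit_availability) ** copies


# Hypothetical component: interrupted every 10,000 hours, 4 hours to repair.
single = availability(10_000, 4)
pair = redundant_availability(single, 2)
print(f"single unit: {single:.6f}, redundant pair: {pair:.9f}")
```

Both levers named above show up here: a shorter repair time raises the availability of a single unit, and redundancy raises it further without changing the unit itself.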

System Management
A major milestone in the PICMG committee was the ratification of PICMG 2.9, which defines a standardized management bus using the IPMI specification. A system typically supports one or two redundant shelf management modules with a single IP address to interrogate and control some or all of the system components. Each IPMI-based managed component is designed with a microcontroller running an independent, small footprint operating system. On the shelf management module, the microcontroller performs queries to the other managed objects and stores the information in an event log. On the other components, the associated microcontroller can be much simpler as it just reports information about itself.

IPMI functions as a separate management plane, independent of the main application processors and operating system. Because the microcontroller runs autonomously from its host, if the host's operating system crashes, the microcontroller can report the failure, and the dedicated shelf management module in the platform can reset the board or power it down until service personnel can diagnose the problem. Management architectures that support user-defined thresholds let the system manager set early warning levels and react to a problem before it becomes catastrophic.
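The idea of user-defined thresholds can be illustrated with a minimal Python sketch. It does not use any real IPMI library; the `Threshold` type, the temperature values, and the status names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    warning: float   # early-warning level chosen by the system manager
    critical: float  # level at which the board would be reset or powered down


def classify(reading: float, threshold: Threshold) -> str:
    """Map a sensor reading to a status, checking the most severe level first."""
    if reading >= threshold.critical:
        return "critical"
    if reading >= threshold.warning:
        return "warning"
    return "ok"


# Hypothetical board temperature limits in degrees Celsius.
board_temp = Threshold(warning=60.0, critical=75.0)
for reading in (45.0, 63.5, 80.0):
    print(reading, classify(reading, board_temp))
```

The warning level is what gives the system manager time to act; the critical level corresponds to the point where the shelf management module intervenes.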

Fault Tolerance
At the simplest level, fault tolerance means that the basic system devices continue some level of operation after a failure. Some redundant components in the chassis operate on a simple load-sharing principle: if one power supply fails, the others increase their share of the load until the failed supply is replaced. At a more intelligent level, every operation is mirrored, meaning it is performed on two or more duplicate systems, such as load-sharing CPUs or redundant line cards operating in a cluster. To achieve an elegant failover, the latest, most reliable data must be synchronized between the two redundant components. Therefore, fault-tolerant components must also synchronize important configuration data, such as management information collected by the shelf management modules.
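The load-sharing principle can be shown in a few lines of Python. This is only an illustration under the assumption that the load is split evenly among working supplies; the supply names and wattage are hypothetical.

```python
from typing import Dict


def share_load(total_watts: float, supply_ok: Dict[str, bool]) -> Dict[str, float]:
    """Split the load evenly across the supplies that are still working."""
    working = [name for name, ok in supply_ok.items() if ok]
    if not working:
        raise RuntimeError("no working power supply left")
    per_supply = total_watts / len(working)
    return {name: per_supply for name in working}


# Three hypothetical supplies carrying 600 W, then the same load after one fails.
print(share_load(600.0, {"PSU-A": True, "PSU-B": True, "PSU-C": True}))
print(share_load(600.0, {"PSU-A": True, "PSU-B": False, "PSU-C": True}))
```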

Redundant Host
One of the most recent additions to the HA Stack is the Redundant Host layer. Redundant Host is a standards-based design founded on the PICMG 2.12 Redundant Host API. The Redundant Host set of APIs comprehends CompactPCI and CompactTCA and can even be leveraged into AdvancedTCA in the future. Products built around the Redundant Host architecture can provide single-digit millisecond control failover, allowing system applications to recover almost immediately from any catastrophic control blade failure. Failovers can be triggered by predictive failure analysis, allowing the drivers and applications time to sync their databases and state information before handing over the control.

There are three types of device interaction in systems design: system data, management, and control. Information such as voice, images, and Internet traffic is exchanged in the data plane. This is the information that is being manipulated for the intended use of the system. The management plane is used to get status information and set thresholds for reporting or to take local actions regarding the functional health of the devices in the system. The control plane performs such functions as initialization, configuration (including hot swap), and control of the devices within the system.

Redundant Host concentrates its efforts in the control plane. There is typically a 1:N relationship of control blade to IO blades. In CompactPCI, the system master typically performs the control blade role while the peripherals perform the IO blade roles. This creates a single point of failure in that architecture, which the Redundant Host design overcomes by managing a redundant control blade. The same paradigm holds for some PICMG 2.16 (and even pure Ethernet) architectures, where IO devices manipulate the payload data and require a controlling device to provide routing for further processing, store the data, or simply collect statistics.

Failover in a pure Ethernet cluster can typically take multiple seconds. Buffering the data from a single Gigabit Ethernet interface for two seconds requires roughly 256 megabytes of RAM to avoid data loss. Compare this with a failover time of less than 10 milliseconds, which requires a buffer on the order of one megabyte. Faster failover therefore reduces system requirements and, with them, the cost of a control blade in any architecture.
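The buffer sizes quoted above follow from multiplying the link rate by the failover time. The sketch below assumes a fully saturated link at the nominal rate of 10^9 bits per second and ignores framing overhead; the function and constant names are illustrative.

```python
GIGABIT_BPS = 1_000_000_000  # nominal Gigabit Ethernet rate, bits per second


def buffer_bytes(link_bits_per_second: float, failover_seconds: float) -> float:
    """Bytes that arrive on the link while the failover is in progress."""
    return link_bits_per_second / 8 * failover_seconds


for failover in (2.0, 0.010):
    megabytes = buffer_bytes(GIGABIT_BPS, failover) / 1_000_000
    print(f"failover {failover:g} s -> about {megabytes:g} MB of buffer")
```

Under these assumptions, a two-second failover needs about 250 megabytes, in line with the figure above, while a full 10 milliseconds needs about 1.25 megabytes; a failover that completes in under 8 milliseconds stays below one megabyte.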

In the case of a CompactPCI system requiring reliability and cost effectiveness, this architecture is ideal. Consider a chassis with 12 peripheral IO blades to perform E1/T1 trunking and routing. These blades can cost upward of $10,000 each. The single point of failure in the CompactPCI system is the lone system master, which typically costs $2,500 or less. A clustering solution would require two identical systems to be built at a total cost of $245,000. By making the system master a redundant component, the high cost of the line cards is optimized, yielding a total blade cost of $125,000, almost half the cost of a clustering solution.
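The cost comparison above can be reproduced directly from the quoted prices. The sketch takes the upper figure of $2,500 for the system master and counts blade costs only, as the article does.

```python
IO_BLADE_COST = 10_000      # per peripheral IO blade, from the example above
SYSTEM_MASTER_COST = 2_500  # per system master, upper figure from the example
IO_BLADES = 12


def clustered_cost() -> int:
    """Two identical systems, each with 12 IO blades and one system master."""
    return 2 * (IO_BLADES * IO_BLADE_COST + SYSTEM_MASTER_COST)


def redundant_host_cost() -> int:
    """One set of IO blades with two system masters."""
    return IO_BLADES * IO_BLADE_COST + 2 * SYSTEM_MASTER_COST


print(f"clustering: ${clustered_cost():,}")
print(f"redundant host: ${redundant_host_cost():,}")
print(f"ratio: {redundant_host_cost() / clustered_cost():.2f}")
```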

Redundant Host saves development time by managing the internal mechanics of host failover. Developers need only design for synchronization of databases and state information along with notification of becoming active. By using a shared storage solution on the system masters, each ensuing transaction from the IO blade can be committed to a database then acknowledged to the IO blade followed by synchronization with the backup host. In the case of a catastrophic failure to the active host, the backup would be up and running within 10 milliseconds. The newly active host would know of the last committed transaction, verify this with the database accessed through shared storage, and request the IO blades to repeat the missing transaction.
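The commit, acknowledge, and synchronize sequence described above can be sketched as a small simulation. This is not the PICMG 2.12 API; every class and method name here is hypothetical, and a Python list stands in for the database on shared storage.

```python
class SharedStorage:
    """Stands in for the database that both hosts reach through shared storage."""
    def __init__(self):
        self.committed = []  # transaction ids in commit order


class IOBlade:
    def __init__(self):
        self.unacked = []  # transactions sent but not yet acknowledged

    def send(self, txn_id):
        self.unacked.append(txn_id)
        return txn_id

    def ack(self, txn_id):
        self.unacked.remove(txn_id)


class Host:
    def __init__(self, storage):
        self.storage = storage
        self.last_known = None  # last committed transaction this host knows of

    def handle(self, txn_id, blade, backup):
        self.storage.committed.append(txn_id)  # 1. commit to the database
        blade.ack(txn_id)                      # 2. acknowledge to the IO blade
        backup.last_known = txn_id             # 3. synchronize with the backup

    def take_over(self, unacked):
        """Run on the backup when it becomes active."""
        if self.storage.committed:
            # Verify the synchronized state against the shared database.
            self.last_known = self.storage.committed[-1]
        # Anything the blade sent that never reached the database must be repeated.
        return [t for t in unacked if t not in self.storage.committed]


storage = SharedStorage()
active, backup = Host(storage), Host(storage)
blade = IOBlade()
for txn in (1, 2):
    active.handle(blade.send(txn), blade, backup)
blade.send(3)  # the active host fails before committing transaction 3
to_repeat = backup.take_over(blade.unacked)
print("last committed:", backup.last_known, "to repeat:", to_repeat)
```

In this sketch the backup learns that transaction 2 was the last one committed and asks the IO blade to repeat transaction 3, which mirrors the recovery steps in the paragraph above.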

Predictive Analysis And Policy-Based Management
Finally, HA solutions can take advantage of sophisticated management software that controls the entire application in real time. Developers using this level of technology can use the platform's shelf management module to set user-defined thresholds that predict problems before they become catastrophic. For instance, if components and shelves are overheating, the technology can trigger the air conditioning in the operations center. Or if the service is becoming unstable, the technology can force a higher level of data backup to ensure no loss of information. A policy-based management system typically establishes a baseline and a set of rules that define a specific action to take when a specific event or combination of events occurs. These rules or policies can be fine-tuned to establish a certain level of performance throughout the entire system. This enterprise-level view of management, which monitors all aspects of the operations, ensures that policies are enacted intelligently and globally, not based on one specific shelf or network module.
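A policy-based rule set of this kind can be sketched as a list of conditions paired with actions and evaluated against a baseline. The metric names, margins, and action names below are hypothetical and chosen only to echo the two examples in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Policy:
    name: str
    condition: Callable[[Metrics, Metrics], bool]  # (current, baseline) -> fire?
    action: str


def evaluate(policies: List[Policy], current: Metrics, baseline: Metrics) -> List[str]:
    """Return the actions of every policy whose condition holds."""
    return [p.action for p in policies if p.condition(current, baseline)]


POLICIES = [
    Policy("overheating",
           lambda cur, base: cur["temp_c"] > base["temp_c"] + 15,
           "start_air_conditioning"),
    Policy("unstable_service",
           lambda cur, base: cur["error_rate"] > 2 * base["error_rate"],
           "increase_backup_level"),
]

baseline = {"temp_c": 35.0, "error_rate": 0.01}
current = {"temp_c": 55.0, "error_rate": 0.015}
actions = evaluate(POLICIES, current, baseline)
print(actions)
```

Keeping the rules as data rather than hard-coded branches is one way to let operators fine-tune policies without changing the evaluation logic.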

Developing high availability or always-on applications can be a challenge. However, vendors of standards-based embedded system architectures such as CompactPCI and PICMG 2.16 are offering products with these capabilities built in. Since each level of the HA Stack is dependent on other levels, it is important to look for integrated solutions that offer as many of these features as possible, or at least the capability to support them. Ultimately this will allow the developer to offer high availability, reparability, scalability, and a lower total cost of ownership for the end user.

Tony Romero is senior product manager, and Sean O'Brien is software engineering manager, at Performance Technologies, a developer of embedded computing products and system-level solutions for equipment manufacturers and service providers worldwide. With competencies in compute platforms, IP/Ethernet switching, communications software, wide-area networking, SS7/IP interworking, and high availability, Performance Technologies offers unified products for existing and emerging applications.
