August 2003
Understanding The HA Stack
BY TONY ROMERO & SEAN O'BRIEN
With economic recovery still looming in the distance, more and more end
customers are demanding architectures that will help them lower the total
cost of ownership of their systems. As a result, equipment manufacturers
across the communications, industrial, military and aerospace sectors are
creating new, always-on, high availability (HA) architectures designed to
help reduce operational expenses and revenue loss due to service
interruptions. Until now, the availability of a system was determined by a
calculation based on the quality and redundant capability of all its
hardware components, translated into a number of "nines" -- the percentage
of time per year that the system remains functional. Most end users demand
at least 99.999 percent (five nines) availability. But now, with the
development of new, more reliable designs, availability is pushing far
beyond the traditional five nines and is being defined by an integrated
suite of redundant hardware components and specialized layers of software
components, commonly known as the "HA Stack."
The HA Stack
Several layers make up the HA Stack. Each layer, whether hardware or
software, serves a distinct purpose; some layers function independently,
while others depend on other layers in the stack. For the purposes of this
article, the ranking begins at the bottom, where we find some of the most
ubiquitous features for developing HA solutions. The first two layers of the
HA Stack provide hardware features: Hot Swap and Component Redundancy. The
third layer, Management, comprises both hardware and software. The final
three layers are software-based: Fault Tolerance, Redundant Host, and
Predictive Analysis/Policy-Based Management.
Hot Swap and Component Redundancy
The main function of an HA system is to increase Mean Time Between
Interruptions (MTBI) and decrease Mean Time To Repair (MTTR). MTBI measures
the average time between interruptions of service and is driven by the Mean
Time Between Failure (MTBF) of the individual components as well as by the
redundancy of the system. MTTR is defined by serviceability and redundancy.
The serviceability requirement says all components in the architecture must
be hot swappable, while the redundancy requirement ensures that there is no
single point of failure. Common
redundant components in a platform include single board computers,
Intelligent Platform Management Interface (IPMI)-based management modules,
PICMG 2.16 fabric switches, power supplies, fan trays, and dual power
inputs.
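The relationship between MTBF, MTTR, and availability can be sketched
numerically. The figures below are hypothetical, chosen only to show how hot
swap's shorter repair times translate into additional nines:

```python
import math

def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def nines(avail):
    """Express availability as a count of leading nines (0.99999 -> 5)."""
    return math.floor(-math.log10(1.0 - avail))

# A board with a 50,000-hour MTBF that takes 4 hours to repair reaches
# only about four nines:
a = availability(50_000, 4)
# Hot swap cuts the repair time to 5 minutes, pushing past five nines:
b = availability(50_000, 5 / 60)
```

The numbers are invented, but the lesson is general: because MTBF appears in
both numerator and denominator, shrinking MTTR is often the cheaper path to
another nine than raising component quality.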
System Management
A major milestone in the PICMG committee was the ratification of PICMG 2.9,
which defines a standardized management bus using the IPMI specification. A
system typically supports one or two redundant shelf management modules with
a single IP address to interrogate and control some or all of the system
components. Each IPMI-based managed component is designed with a
microcontroller running an independent, small footprint operating system. On
the shelf management module, the microcontroller performs queries to the
other managed objects and stores the information in an event log. On the
other components, the associated microcontroller can be much simpler as it
just reports information about itself.
IPMI functions as a separate management plane, independent of the main
application processors and operating system. Because the microcontroller
runs autonomously from its host, if the host's operating system crashes, the
microcontroller can report the failure, and the dedicated shelf management
module in the platform can reset the board or power it down until service
personnel diagnose the problem. Management architectures that support
user-defined thresholds let the system manager set early warning levels and
react to a problem before it becomes catastrophic.
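As a rough illustration of threshold-based management (the sensor names,
thresholds, and actions here are invented for the sketch, not taken from the
IPMI or PICMG 2.9 specifications), a shelf manager's decision logic might
look like:

```python
# User-defined temperature thresholds: warn early, act before catastrophe.
WARN_C, CRITICAL_C = 45.0, 60.0

event_log = []   # the shelf manager stores every reading it collects

def report_sensor(board, temp_c):
    """Log a reading and decide what action the shelf manager takes."""
    event_log.append((board, temp_c))
    if temp_c >= CRITICAL_C:
        return "power_down"     # isolate the board until serviced
    if temp_c >= WARN_C:
        return "raise_alarm"    # early warning: react before failure
    return "ok"
```

The key design point matches the article: the decision runs on the
management plane, so it still works when the host operating system does not.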
Fault Tolerance
At the simplest level, fault tolerance means that the basic system devices
continue some level of operation after a failure. Some redundant components
in the chassis operate on a simple load-sharing principle -- if one power
supply fails, the others increase their share of the load until the failed
supply is replaced. At a more intelligent level, however, every operation is
mirrored, meaning it is performed on two or more duplicate systems, such as
load-sharing CPUs or redundant line cards operating in a cluster. To achieve
an elegant failover, the latest, most reliable data must be synchronized
between the redundant components. Therefore, fault-tolerant components must
also synchronize important configuration data, such as management
information collected by the shelf management modules.
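The synchronization requirement above can be sketched as a toy active/standby
pair. This is an assumption for illustration only; real fault-tolerant
products use purpose-built checkpointing mechanisms:

```python
class Node:
    """A control component holding configuration data."""
    def __init__(self):
        self.config = {}

active, standby = Node(), Node()

def set_config(key, value):
    """Commit a change on the active node, then mirror it to the standby,
    so any failover starts from the latest synchronized data."""
    active.config[key] = value
    standby.config[key] = value    # synchronous checkpoint

def failover():
    """Promote the standby; its mirrored config is already current."""
    return standby

set_config("trunk_count", 12)
new_active = failover()            # takes over with up-to-date state
```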
Redundant Host
One of the most recent additions to the HA Stack is the Redundant Host
layer. Redundant Host is a standards-based design founded on the PICMG 2.12
Redundant Host API. The Redundant Host set of APIs covers CompactPCI and
CompactTCA and can even be leveraged for AdvancedTCA in the future. Products
built around the Redundant Host architecture can provide
single-digit-millisecond control failover, allowing system applications to
recover almost immediately from any catastrophic control blade failure.
Failovers can be triggered by predictive failure analysis, giving the
drivers and applications time to sync their databases and state information
before handing over control.
There are three planes of device interaction in system design: data,
management, and control. Information such as voice, images, and
Internet traffic is exchanged in the data plane. This is the information
that is being manipulated for the intended use of the system. The management
plane is used to get status information and set thresholds for reporting or
to take local actions regarding the functional health of the devices in the
system. The control plane performs such functions as initialization,
configuration (including hot swap), and control of the devices within the
system.
Redundant Host concentrates its efforts in the control plane. There is
typically a 1:N relationship of control blade to IO blades. In CompactPCI,
the system master typically performs the control blade role while the
peripherals perform the IO blade roles. This creates a single point of
failure in that architecture, which the Redundant Host design overcomes by
managing a redundant control blade. This paradigm holds true for some PICMG
2.16 (and even pure Ethernet) architectures where IO devices manipulate the
payload data and require a controlling device to provide routing for further
processing, store the data, or simply collect statistics.
Typical failovers for a pure Ethernet cluster can take multiple seconds.
Buffering the data from a single Gigabit Ethernet interface for two seconds
requires roughly 256 Megabytes of RAM to eliminate data loss. Compare this
with a failover time of less than 10 milliseconds, which requires a buffer
on the order of one Megabyte. Faster failover thus reduces the system
requirements, and therefore the cost, of a control blade in any
architecture.
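The buffer-sizing figures above follow directly from line rate multiplied by
failover time, as this short calculation shows:

```python
# Bytes arriving per second on a 1 Gbit/s link.
GBIT_PER_S = 1_000_000_000 / 8

def failover_buffer_bytes(failover_seconds):
    """RAM needed to hold line-rate traffic for the failover duration."""
    return GBIT_PER_S * failover_seconds

two_second_buffer = failover_buffer_bytes(2.0)    # 250 MB -> spec ~256 MB
ten_ms_buffer = failover_buffer_bytes(0.010)      # 1.25 MB

ratio = two_second_buffer / ten_ms_buffer         # a 200x smaller buffer
```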
For a CompactPCI system requiring reliability and cost effectiveness, this
architecture is ideal. Consider a chassis with 12 peripheral IO blades
performing E1/T1 trunking and routing. These blades can cost upward of
$10,000 each, while the single point of failure in the CompactPCI system --
the lone system master -- typically costs $2,500 or less. A clustering
solution would require two identical systems to be built, at a total cost of
$245,000. By making the system master a redundant component instead, the
high cost of the line cards is not duplicated, yielding a total blade cost
of $125,000, almost half the cost of the clustering solution.
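The cost comparison works out as follows, using the article's illustrative
prices:

```python
IO_BLADE, SYSTEM_MASTER = 10_000, 2_500
N_IO = 12

# One complete system: twelve IO blades plus one system master.
single_system = N_IO * IO_BLADE + SYSTEM_MASTER

# Clustering duplicates everything, including the expensive line cards.
clustered = 2 * single_system

# Redundant Host duplicates only the cheap system master.
redundant_host = N_IO * IO_BLADE + 2 * SYSTEM_MASTER
```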
Redundant Host saves development time by managing the internal mechanics of
host failover. Developers need only design for synchronization of databases
and state information, along with notification when a host becomes active.
By using a shared storage solution on the system masters, each ensuing
transaction from an IO blade can be committed to a database, acknowledged to
the IO blade, and then synchronized with the backup host. In the case of a
catastrophic failure of the active host, the backup would be up and running
within 10 milliseconds. The newly active host would know the last committed
transaction, verify it against the database accessed through shared storage,
and request the IO blades to repeat any missing transactions.
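The commit-acknowledge-synchronize sequence can be sketched as follows. This
is a simplified illustration; the function names and data model are invented
and are not part of the PICMG 2.12 API:

```python
shared_db = []   # stands in for the database on shared storage

def commit(txn_id, payload):
    """Active host: commit the transaction, then acknowledge the IO blade."""
    shared_db.append((txn_id, payload))
    return "ack"

def recover(last_synced_id):
    """Backup host after failover: transactions it still needs to replay."""
    return [t for t in shared_db if t[0] > last_synced_id]

commit(1, "call setup")
commit(2, "call teardown")
# The backup had synchronized through transaction 1 when the active failed:
missing = recover(last_synced_id=1)
```

Because every transaction is committed before it is acknowledged, the newly
active host can always reconcile its state against shared storage and ask
only for what it missed.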
Predictive Analysis And Policy-Based Management
Finally, HA solutions can take advantage of sophisticated management
software that controls the entire application in real time. Developers using
this level of technology will take advantage of the platform's shelf
management module to set user-defined thresholds that predict problems
before they become catastrophic. For instance, if components and shelves are
overheating, the technology can trigger the air conditioning in the
operations center. Or if the service is becoming unstable, the technology
can force a higher level of data backup to ensure no loss of information. A
policy-based management system typically establishes a baseline and a set of
rules that define a specific action to take when a specific event, or
combination of events, occurs. These rules, or policies, can be fine-tuned
to establish a certain level of performance throughout the entire system.
This enterprise-level view of management, which monitors all aspects of the
operations, ensures that policies are enacted intelligently and globally,
not based on one specific shelf or network module.
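A policy engine of this kind can be sketched in a few lines; the event types
and actions below are invented examples, not drawn from any product:

```python
# Each policy pairs a rule (a predicate over an event) with an action.
policies = [
    (lambda ev: ev["type"] == "overtemp" and ev["count"] >= 3,
     "increase_cooling"),
    (lambda ev: ev["type"] == "service_unstable",
     "escalate_backup_level"),
]

def apply_policies(event):
    """Return every action whose rule matches the event."""
    return [action for rule, action in policies if rule(event)]
```

Because the rules are data rather than hard-wired logic, they can be tuned
per deployment and evaluated against events from the whole system, not one
shelf.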
Conclusion
Developing high availability, or always-on, applications can be a challenge.
However, vendors of standards-based embedded system architectures such as
CompactPCI and PICMG 2.16 are offering products with these capabilities
built in. Since each level of the HA Stack is dependent on other levels, it
is important to look for integrated solutions that offer as many of these
features as possible, or at least the ability to add them. Ultimately, this
will allow the developer to offer high availability, repairability,
scalability, and a lower total cost of ownership for the end user.
Tony Romero is senior product manager, and Sean O'Brien is software
engineering manager, at Performance Technologies,
a developer of embedded computing products and system-level solutions for
equipment manufacturers and service providers worldwide. With competencies
in compute platforms, IP/Ethernet switching, communications software,
wide-area networking, SS7/IP interworking, and high availability,
Performance Technologies offers unified products for existing and emerging
applications.