Redundancy Key To High Availability Voice Services
By David Weiss, Dataprobe
It’s no secret that organizations today rely on a constant stream of voice and data communications to maintain contact with customers, staff and business partners. It’s also clear that system architects must provide for the highest degree of fault tolerance in maintaining critical communications and interaction. While many companies have embraced redundancy in the data center, few have recognized the need to incorporate redundant systems into their voice technologies.
When determining the necessary degree of resilience in a voice system, communication planners need to consider the importance of maintaining these vital links to customers. Ultimately, customer service becomes the issue as every interaction is measured by the customer. If your goal is 100 percent customer satisfaction, then you must deliver optimum results that exceed expectations.
Current server-based PBXs, VoIP solutions, conference bridges and call center systems provide new capabilities at a fraction of their previous costs. In deploying these technologies, organizations have come to recognize that these capabilities are mission-critical for operations. Planning for the inevitable failure of complex systems is the only way to prevent these events from becoming real disasters. Redundancy solutions offer fail-safe options on every level, encompassing failover servers, diverse phone lines, media storage and hot sites. With redundancy becoming the new watchword, downtime is no longer an acceptable risk.
Historically, the resiliency of the voice network meant that communication managers could focus their disaster planning budgets on other areas of concern. However, as voice communications migrate toward server-based facilities and IP networks, planners are beginning to take extra steps to ensure that the expected level of uptime is maintained. In addition, as PBXs, call centers, conference bridges and VoIP technologies are increasingly assembled from hardware and software supplied by multiple vendors, the likelihood of unscheduled downtime increases.
Failures can occur at any level of the system. While handsets have typically been the most reliable part of the phone system, new technologies have added levels of complexity. IP phones, computer-telephony integration and Bluetooth are all standard in this new paradigm. Server-based PBXs rely on complex software, often from multiple vendors, and disk arrays and other hardware are subject to failures. Network connectivity opens the system to malicious attacks from both inside and outside the organization. Phone lines are subject to occasional outages due to cable breaks and component failures at the local and long-distance carrier levels.
Redundancy Solutions For The Line Side
Architecting a highly available solution means looking for ways to eliminate single points of failure in all aspects of the system design. Network planners should look at both link redundancy and hardware redundancy to minimize the number of ways the system can fail.
Carriers can also provide diversity and avoidance to help minimize risks. Diversity refers to redundant services, and avoidance ensures that redundant services do not share common facilities. This can be done at both the local and long-distance levels. Additionally, loop diversity provides two redundant circuits from the local point-of-presence (POP) to your facility. POP diversity, having local links originate from multiple wire centers, or POPs, is an ideal solution. Interoffice diversity provides the same level of service between wire centers. Check with your carriers for available services. Using multiple carriers and providing multiple building entry points and different in-house routing are essential.
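One way to reason about avoidance is to list the facilities each redundant circuit passes through and look for overlap. The Python sketch below is illustrative only: the circuit names, the facility labels and the shared_facilities helper are hypothetical, and real routing data would have to come from your carriers.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def shared_facilities(routes: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return the facilities that each pair of circuits has in common.

    An empty result means no two circuits share a facility, which is the
    goal of avoidance.
    """
    overlaps = {}
    for first, second in combinations(sorted(routes), 2):
        common = routes[first] & routes[second]
        if common:
            overlaps[(first, second)] = common
    return overlaps


# Hypothetical routing data for two "diverse" circuits.
routes = {
    "circuit-A": {"entry-north", "wire-center-1", "conduit-7"},
    "circuit-B": {"entry-south", "wire-center-2", "conduit-7"},
}

for pair, common in shared_facilities(routes).items():
    print(pair, "share", sorted(common))
```

In this made-up example the two circuits use different building entries and wire centers but still run through the same conduit, so a single cable break could take out both.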
These diversity services may or may not include the customer premises equipment necessary to switch between the redundant links. Protection switching for redundant T-1 or DS-3 circuits is either provided by the carrier or purchased and installed by the customer. It automatically detects degraded or interrupted service and switches over to a spare circuit. Protection switching can be either 1:1, with a dedicated standby circuit for each primary, or 1:N, with one spare circuit that can be substituted for any one of several primary circuits.
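The difference between 1:1 and 1:N protection can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of any carrier's or vendor's equipment: the ProtectionGroup class, its method names and the circuit labels are all hypothetical, and detecting the failure itself is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProtectionGroup:
    """A set of primary circuits protected by a pool of spare circuits."""
    primaries: List[str]
    spares: List[str]
    # Maps each failed primary to the spare currently carrying its traffic.
    assignments: Dict[str, str] = field(default_factory=dict)

    def report_failure(self, circuit: str) -> Optional[str]:
        """Move a failed primary onto a free spare; return None if none is free."""
        if circuit not in self.primaries:
            raise ValueError(f"unknown primary circuit: {circuit}")
        if circuit in self.assignments:
            return self.assignments[circuit]  # already protected
        in_use = set(self.assignments.values())
        for spare in self.spares:
            if spare not in in_use:
                self.assignments[circuit] = spare
                return spare
        return None

    def report_restored(self, circuit: str) -> None:
        """Release the spare once the primary is back in service."""
        self.assignments.pop(circuit, None)


# 1:1 protection: one dedicated standby for a single primary.
one_to_one = ProtectionGroup(primaries=["T1-A"], spares=["T1-A-standby"])

# 1:N protection: one spare shared by three primaries.
one_to_n = ProtectionGroup(primaries=["T1-A", "T1-B", "T1-C"], spares=["T1-spare"])

print(one_to_n.report_failure("T1-B"))  # the shared spare takes over
print(one_to_n.report_failure("T1-C"))  # a second failure finds no spare free
```

The second call illustrates the trade-off: 1:N costs less in spare circuits, but it covers only one failure at a time within the group.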
Even in VoIP solutions, any gateway to the public switched network involves local loops and carrier services. It’s essential that companies do not overlook this critical link to customers. Diversity and avoidance services, along with protection switching, can provide solutions to help you maintain your voice communication systems.
Redundancy Solutions For The Equipment Side
The advantages of migrating to an open-architecture, server-based platform for telephony services have prompted the wide acceptance of un-PBXs and IP-PBXs. Flexible architecture, standardized components, multiple sources and lower costs are only a few of the reasons that servers are now common for phone systems, voice mail, call recorders and other voice technologies.
There are, however, trade-offs for all this new power and flexibility. Voice systems have become more complex. Not so long ago, there was one provider to handle everything, including line and equipment servicing. Then came the age of the PBX, in which one provider solved all equipment needs and problems. Now systems are crafted together from “best of breed” providers for processors, memory, storage, power supplies, telephony boards, operating systems and application software. Deploying all of these technologies to work together is a testament to the collaborative nature of modern computers. It also increases responsibility for the end user or consultant to make sure that all the pieces work together seamlessly. Now that the phone system is on the network, it is susceptible to all of the maladies of the network world, including hacker attacks, viruses and Trojan horses. Unfortunately, it is therefore inevitable that elements of the system will either break down or be compromised.
While many manufacturers boast of the high reliability of their systems, the end user should be concerned with a different metric: availability. System availability is the readiness of the system to perform its stated function at any time of the year, month, day, hour or minute. Availability is calculated by using two standard measurements: mean time between failures (MTBF), which is the average time a component operates between one failure and the next; and mean time to repair (MTTR), which is the average length of time it takes to identify, diagnose, remove and repair or replace a failed component. If a component takes several hours to service and return to operation, availability is severely compromised. The standard formula for calculating availability is as follows:
Availability = MTBF/(MTBF + MTTR)
This calculation is referred to as “inherent availability,” which does not take into account scheduled maintenance downtime. When lost time for maintenance is added, you can arrive at the true operational availability.
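To make the formula concrete, the following Python sketch computes inherent availability and the downtime it implies over a year. The MTBF, MTTR and maintenance figures are hypothetical, chosen only for illustration, and the way scheduled maintenance is folded in (subtracting the maintenance window from the year before applying the inherent figure) is a simplifying assumption rather than a standard definition.

```python
HOURS_PER_YEAR = 8760.0  # 365 days; leap years ignored


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def operational_availability(mtbf_hours: float, mttr_hours: float,
                             maintenance_hours_per_year: float) -> float:
    """Inherent availability reduced by scheduled maintenance (assumed model)."""
    in_service_hours = HOURS_PER_YEAR - maintenance_hours_per_year
    return inherent_availability(mtbf_hours, mttr_hours) * in_service_hours / HOURS_PER_YEAR


def annual_downtime_hours(availability: float) -> float:
    """Hours per year the system is expected to be unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical component: same MTBF, two different repair times.
for mttr in (4.0, 0.25):
    a = inherent_availability(10_000.0, mttr)
    print(f"MTTR {mttr} h: availability {a:.6f}, "
          f"downtime {annual_downtime_hours(a):.2f} h/year")

# Adding an assumed 12 hours of scheduled maintenance per year.
print(f"operational: {operational_availability(10_000.0, 4.0, 12.0):.6f}")
```

Running the loop shows how strongly repair time drives the result: with MTBF held constant, cutting MTTR is what shrinks the annual downtime.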
System engineers must maximize reliability and minimize both scheduled and unscheduled downtime to achieve the best possible operational availability. Redundancy is the key to providing both the maximum reliability and the minimum repair time. At the component level, redundancy is commonplace for the components that are most likely to fail. Mechanical items like disk drives and power supplies are often the first that come to mind, and most systems for telephony services come standard with redundant arrays of inexpensive disks (RAID) and redundant power supplies.
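The effect of redundancy on availability can be estimated with a short calculation. The Python sketch below uses the usual parallel-redundancy approximation and rests on two assumptions that real systems only approach: the redundant units fail independently of one another, and switching to a surviving unit is instantaneous and always succeeds. The function name and the 99 percent figure are illustrative.

```python
def redundant_availability(single: float, copies: int = 2) -> float:
    """Availability of `copies` identical units when any one can carry the load.

    The group is down only when every unit is down at the same time, so
    availability = 1 - (1 - single) ** copies (independent failures assumed).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one unit")
    return 1.0 - (1.0 - single) ** copies


# Hypothetical unit that is available 99 percent of the time.
for n in (1, 2, 3):
    print(f"{n} unit(s): {redundant_availability(0.99, n):.6f}")
```

Shared dependencies such as a common power feed or a common software bug violate the independence assumption, which is why the measured gain is typically smaller than this estimate.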
More demanding applications require that redundancy at the system level be considered. In this scenario, two complete, identical systems are installed. One system provides hot standby with automatic failover for the other system. The standby server, SNMP manager or other management facility is continuously monitoring the health of the system in use. Upon detection of a failure of the primary system, the hot standby is switched into service. Physical layer switching is used to move the phone lines or operator stations from the failed primary server to the newly activated standby.
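The monitoring and switchover logic described above might look something like the following Python sketch. It is a minimal model, not any vendor's implementation: the FailoverMonitor class, the check_primary and switch_to_standby callbacks and the threshold of three missed checks are all assumptions made for illustration. Requiring several consecutive misses is one common way to avoid switching on a single lost poll.

```python
from typing import Callable


class FailoverMonitor:
    """Switch to the standby after several consecutive failed health checks."""

    def __init__(self, check_primary: Callable[[], bool],
                 switch_to_standby: Callable[[], None],
                 max_missed: int = 3) -> None:
        self.check_primary = check_primary          # returns True if healthy
        self.switch_to_standby = switch_to_standby  # e.g. drives the switch
        self.max_missed = max_missed
        self.missed = 0
        self.active = "primary"

    def poll(self) -> str:
        """Run one health check and return which system is in service."""
        if self.active == "standby":
            return self.active  # already failed over; fallback is manual here
        if self.check_primary():
            self.missed = 0     # any good response resets the counter
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.switch_to_standby()
                self.active = "standby"
        return self.active


# Simulated health-check results: one good poll, then the primary goes silent.
results = iter([True, False, False, False])
monitor = FailoverMonitor(
    check_primary=lambda: next(results),
    switch_to_standby=lambda: print("moving lines to standby server"),
)
for _ in range(4):
    print(monitor.poll())
```

In a real deployment the health check might be an SNMP query or an application-level test call, and the switch callback would command the physical layer switch; both are stubbed out here.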
Hot standby systems allow for scheduled maintenance to be performed with the absolute minimum of interruption, as one server is upgraded while the other is in service. This solution also allows for an immediate, graceful fallback in the event that a planned upgrade goes awry or that a new software installation has unintentional and unplanned consequences.
Redundancy switches operate on the physical layer, moving the actual wires from the phone lines and operator instruments to the telephony boards in the system. In IP phone systems, switching is only required for the phone line side. In essence, redundancy switches perform the same function as a patch panel, but do so automatically and simultaneously for all circuits. As the central component of a fault-tolerant solution, the redundancy switch itself must not become a single point of failure. These switches use mechanical latching relays, which hold their contact position even when power is removed, to provide a continuous mechanical connection in all circumstances.
Although redundant servers are an optimal solution, several issues still need to be addressed when deploying them. Because either server may be needed at any time, it is important that their databases be synchronized so that all configurations, security settings and call logs are the same on both systems. Licensing is also a factor. If redundant systems share the same licenses, they need to have dongles switched or need an add-on module to support automatic redundancy switchover.
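A periodic check that the two servers have not drifted apart can be as simple as comparing digests of each configuration section. The Python sketch below assumes, purely for illustration, that each server's settings can be exported as a dictionary of sections; the section names, the sample values and the out_of_sync helper are hypothetical and do not correspond to any particular product.

```python
import hashlib
import json
from typing import Any, Dict, List


def digest(value: Any) -> str:
    """SHA-256 of a canonical JSON rendering, so key order does not matter."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def out_of_sync(primary: Dict[str, Any], standby: Dict[str, Any]) -> List[str]:
    """Return the names of sections that are missing or differ between servers."""
    drifted = []
    for section in sorted(set(primary) | set(standby)):
        if section not in primary or section not in standby:
            drifted.append(section)
        elif digest(primary[section]) != digest(standby[section]):
            drifted.append(section)
    return drifted


# Hypothetical exports from the two servers.
primary = {
    "extensions": {"101": "front desk", "102": "sales"},
    "security": {"admin_pin_set": True},
}
standby = {
    "extensions": {"102": "sales", "101": "front desk"},  # same data, other order
    "security": {"admin_pin_set": False},
}

print(out_of_sync(primary, standby))
```

Comparing digests rather than full exports keeps the check cheap enough to run often, though it only reports that a section differs, not what changed.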
By considering your needs for availability and redundancy solutions, each system can achieve the level of fault tolerance required to meet organizational objectives. Voice technology remains a critical component to customer service and corporate communications. Thus, deploying line solutions, from diversity and avoidance services to protection switching, should be extremely important for network planners. Furthermore, phone systems have increased in complexity and networkability, leading to possible component failures and malware (malicious software) attacks. Standby systems and redundancy switches keep inevitable problems from becoming true disasters.
David Weiss has over 19 years’ experience in product management, business development, sales and marketing, and he is an expert in the remote site management technology industry. He serves as the president of Dataprobe, where he is directly responsible for developing new market business strategies and for establishing key partnership opportunities and strategic alliances. Dataprobe is a manufacturer of technology solutions for today’s demanding remote site management and networking needs. Since 1969, Dataprobe has been providing communication managers, OEM developers and direct consumers with remote technology products.
To January 2005 Table Of Contents ]