Business Continuity Featured Article
Bolstering Business Resilience: Don't Bring Me Down
A disk failure at Virgin Blue Airlines recently caused a 21-hour system outage that led to the cancellation of more than 100 flights, affected over 100,000 passengers and cost the company an estimated $15 million to $20 million. This past summer, software-monitoring tools detected "instability" within DBS Bank's storage system even though no actual problem existed; the administrator-initiated recovery process took the company's systems offline for seven hours, affecting all commercial and consumer banking systems.
According to a recent study, North American businesses collectively lose $26.5 billion in revenue each year as a result of slow recovery from IT system downtime. The study also indicates that the average respondent suffers 10 hours of IT downtime a year, plus an additional 7.5 hours of compromised operation because of the time it takes to recover lost data. Other studies estimate the average cost of a single hour of downtime at approximately $100,000.
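Taken together, those figures imply a substantial annual exposure per organization. A rough back-of-the-envelope calculation, using the study's averages and assuming (as an illustration, not a figure from the study) that degraded operation costs half as much per hour as a full outage, looks like this:

```python
# Rough annual downtime-cost estimate using the figures cited above.
# DEGRADED_FACTOR is an assumption for illustration: the studies cited
# price full outages, not hours of compromised operation.
HOURS_DOWN = 10.0          # average hours of outright IT downtime per year
HOURS_DEGRADED = 7.5       # additional hours of compromised operation
COST_PER_HOUR = 100_000    # estimated cost of one hour of downtime (USD)
DEGRADED_FACTOR = 0.5      # assumed relative cost of degraded operation

annual_cost = (HOURS_DOWN * COST_PER_HOUR
               + HOURS_DEGRADED * COST_PER_HOUR * DEGRADED_FACTOR)
print(f"Estimated annual exposure: ${annual_cost:,.0f}")
# prints "Estimated annual exposure: $1,375,000"
```

Even under these conservative assumptions, a single organization's annual exposure runs well over a million dollars.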
To hedge against unexpected outages, IT organizations attempt to prepare by creating redundant backup systems, duplicating every layer in their existing infrastructure and preparing elaborate disaster recovery processes. This approach is expensive and only partly effective, as demonstrated by the string of notable outages, and can be seen, at best, as a way to minimize downtime.
Major Internet companies, such as Google and Facebook, have figured out how to scale application stacks out horizontally rather than up vertically. This yields operational advantages, including improved response time and built-in redundancy. Unfortunately, it comes at the cost of a significantly more complicated development model and a higher development cost structure. Ideally, enterprise software could achieve similar advantages without those costs.
Adding More Complexity to a Complex Challenge
The complexity of even small business networks today dwarfs that of large enterprise networks 15 years ago. While replication, server virtualization, virtual machine migration, SAN arrays, converged networks and other relatively new technologies provide benefits, implementing them carries significant costs that many organizations overlook. Complexity makes implementation errors and system failures even more likely.
Ironically, the message is that these enterprise systems are so complex they are likely to fail; yet to prevent that, you need to add even more complexity!
Leveraging the Agility of Virtualization and Cloud Computing
As organizations look at ways to leverage the economics and efficiencies of virtualization and cloud computing, it is becoming painfully clear that the traditional approaches to infrastructure that underlie most of today’s cloud offerings do not effectively enable the potential agility of these new models.
Today, organizations are wrestling with ways to take advantage of cloud economics while maintaining control of their data and providing improved support for remote users. Now is the time for technology that enables options for deploying on-premise, in the cloud or a combination of both.
This is the next phase in truly enabling IT organizations to deliver applications with consistently high availability and performance to global and mobile workers, while maintaining an elastic and robust infrastructure within the constraints of tight budgets.
Simplifying IT Infrastructure
Emerging technologies that fundamentally decentralize applications and data greatly improve business resilience and simplify disaster and network recovery. They are designed to handle less-than-perfect performance from all components of the infrastructure.
New approaches to scalable application computing simplify IT infrastructure by combining the required elements – including storage, load balancing, database and caching – into easily managed appliances or cloud instances. Unlike conventional infrastructures, where scale, redundancy and performance are increased by "scaling up" with additional tiers of components, this architecture adds capability by "scaling out" with additional, identical nodes.
These systems automatically store data across the nodes based on policy, usage and geography, and intelligently deliver information when and where it is needed. All information is replicated across multiple nodes to ensure availability. If a node fails, users are re-routed to other nodes with access to their data so that productivity does not suffer. When the original node recovers, it resumes participating in the flow of data and applications and local users are reconnected to it. The system automatically synchronizes data in the background so no data is lost and performance is not compromised.
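The behavior described above can be sketched in a few dozen lines. This is a minimal toy model, not any vendor's actual implementation: it assumes a simple hash-based placement policy with a fixed replication factor (real systems also weigh usage and geography), and all names and the API are illustrative.

```python
import hashlib

class NodeCluster:
    """Toy model of scale-out storage: each item is replicated to
    several identical nodes; reads fail over to surviving replicas."""

    def __init__(self, nodes, replicas=3):
        self.nodes = list(nodes)           # identical, interchangeable nodes
        self.replicas = replicas
        self.down = set()                  # nodes currently unreachable
        self.store = {n: {} for n in nodes}

    def _placement(self, key):
        # Deterministic, policy-driven placement: hash the key and pick
        # `replicas` consecutive nodes on a ring.
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        start = h % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def write(self, key, value):
        # Replicate to every reachable replica so a single failure
        # cannot lose the data.
        for n in self._placement(key):
            if n not in self.down:
                self.store[n][key] = value

    def read(self, key):
        # Users are transparently re-routed to a surviving replica.
        for n in self._placement(key):
            if n not in self.down and key in self.store[n]:
                return n, self.store[n][key]
        raise KeyError(key)

    def fail(self, node):
        self.down.add(node)

    def recover(self, node):
        # On recovery the node re-synchronizes in the background; here
        # we simply copy back every key it should hold.
        self.down.discard(node)
        for key in {k for s in self.store.values() for k in s}:
            if node in self._placement(key):
                _, value = self.read(key)
                self.store[node][key] = value

# Example: a node failure does not interrupt reads.
cluster = NodeCluster(["nyc", "lon", "sfo", "tok"])
cluster.write("orders/42", "acme")
primary = cluster._placement("orders/42")[0]
cluster.fail(primary)                       # primary goes down...
served_by, value = cluster.read("orders/42")  # ...a replica serves the read
assert value == "acme" and served_by != primary
```

The key design point is that every node is identical, so failover is just trying the next replica in the placement list; no dedicated standby hardware is involved.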
High Availability and Business Continuity
Organizations make significant investments in order to achieve high availability and business continuity, and every time a new application is deployed, these expenses increase as the redundant infrastructure is scaled up. Because of the intrinsic complexity in current application deployments, attempts at redundancy are often ineffective and application availability suffers.
What's now required is an application infrastructure that inherently provides high availability without the dedicated additional infrastructure that 2n or 3n redundancy requires. With geographic redundancy, applications and data would remain available even if an entire site became unreachable due to an outage.
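The availability argument can be made concrete with a little probability. If each node or site is independently up with probability p, a service that can be delivered from any of k replicas is unavailable only when all k are down at once. Under that simplified independence assumption:

```python
# Simplified availability model: independent failures, and the service
# survives as long as at least one of k replicas is reachable.
def system_availability(p: float, k: int) -> float:
    """p = availability of one node/site, k = number of replicas."""
    return 1.0 - (1.0 - p) ** k

# A node that is up 99% of the time, replicated across three sites:
single = system_availability(0.99, 1)   # 0.99      (~3.7 days down/year)
triple = system_availability(0.99, 3)   # 0.999999  (~32 seconds down/year)
```

Real failures are of course correlated (shared networks, software bugs, regional events), so the model overstates the gain, but the direction is clear: identical, geographically distributed replicas multiply nines without dedicated standby infrastructure.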
Supporting the Global Organization
Despite business globalization, with customers, partners and employees more likely than ever to be located around the world, in recent years there’s been a drive to consolidate data centers. The underlying assumption is that consolidated data centers will allow information technology organizations to better control resource costs for space, energy, IT assets and manpower. With the stampede to consolidation, valid concerns about availability and performance for users in remote locations are sometimes overlooked. Unfortunately, the consolidation cost savings aren’t always as dramatic as anticipated and new problems are often introduced as a result.
Substantial problems remain with maintaining availability and performance for remote workers. Additionally, high-speed WAN links used in attempts to address these problems can be prohibitively expensive, particularly outside North America.
If all the required application infrastructure components resided on comprehensive nodes, the nodes could be placed in small and remote locations. Since virtually all of the supporting infrastructure for an application would be included in a node, performance and responsiveness would improve at each site.
Ongoing support costs would also be reduced because scaling an application in this way is much easier than with traditional deployments. If a site is growing and needs greater scale, a node can be easily added at that site. This approach only makes sense if no additional IT staff is required at the remote sites. For instance, the addition of a node should be easy enough that non-IT staff can do it.
Improving Performance and Availability
Organizations today are more geographically dispersed than ever, and many IT organizations have dedicated significant resources to ensure adequate response-time performance for remote offices around the globe. These organizations have usually invested heavily in infrastructure such as WAN optimization, federated applications and high-speed network connections. A typical application infrastructure today requires a variety of components: a pair of hardware load balancers, application servers, database servers and storage for the data. Moreover, to attain redundancy, much of this infrastructure must be duplicated off-site.
The complexity of this type of infrastructure requires continual investment simply to maintain the systems and components. Yet poor performance and spotty availability are often a reality for those working in remote offices.
Taking a new approach to application deployment can result in significantly lower costs. Using inexpensive, identical nodes at each site and eliminating the need for a separate failover site can dramatically reduce initial capital expense. A simpler, fully integrated stack also lowers costs by making applications much easier to deploy, manage and troubleshoot.
The future of enterprise computing requires truly distributed computing that enables remote workers to be highly productive. Simplified, smarter application platforms that integrate disparate technologies such as data storage, database, application servers and load balancing will surpass existing solutions in cost, manageability and reliability.
Fundamentally new, resilient architectures are emerging that enable IT professionals to provide solid infrastructures, eliminate downtime and deliver applications with consistently high availability to global and mobile workers.
About the Author
Frank Huerta is CEO and co-founder of Translattice, where he is responsible for the vision and strategic direction of the company. He started his career in engineering and product development at Hughes Aircraft Company, Santa Barbara Research Center. He then served in product management roles at Seagate Software as well as VeriFone, Inc. Huerta was the director of business development for Exodus Communications, where he focused on mergers and acquisitions. He was a co-founder and the CEO of Recourse Technologies, where he raised nearly $40M from leading venture capitalists and corporations. Recourse was purchased by Symantec Corporation for $135M in cash, and Huerta then served as a vice president at Symantec for over a year. Most recently, Huerta was the CEO and co-founder of Cartilix, Inc., a medical device product development company. Huerta serves on the board of the Arthritis Foundation of Northern California and the Cate School Board of Trustees. He has a master's degree from the Stanford Graduate School of Business and an undergraduate degree in physics, cum laude, from Harvard University.
Edited by Tammy Wolf