Tuesday, July 31, 2012

Disaster Recovery Virtualization Using the Cloud


Disaster recovery is a necessary component in any organization’s plans. Business data must be backed up, and key processes such as billing, payroll and procurement need to continue even if an organization’s data center is disabled by a disaster. Over time, two distinct disaster recovery models have emerged: dedicated and shared. While effective, these approaches often forced organizations to choose between cost and speed.

We live in a global economy driven by a 24x7 culture: the flow of information never stops, and commerce never sleeps. Nobody likes to think about it, but disaster recovery is essential to thriving and surviving. With the demands of an around-the-clock world, organizations need to start thinking in terms of application continuity rather than infrequent disasters, and disaster recovery service providers need to enable more seamless, nearly instantaneous failover and failback of critical business applications. Yet because most IT budgets are flat or even shrinking, these services must be provided without significant upfront or ongoing expenditure. Cloud-based business resilience can provide an attractive alternative to traditional disaster recovery, offering both the more rapid recovery time associated with a dedicated infrastructure and the reduced costs consistent with a shared recovery model. With pay-as-you-go pricing and the ability to scale up as conditions change, cloud computing can help organizations meet the expectations of today’s fast-paced environment, where IT demands continue to increase but budgets do not. This white paper discusses traditional approaches to disaster recovery and describes how organizations can use cloud computing to plan both for mundane service interruptions (cut power lines, server hardware failures and security breaches) and for rarer disasters. It also provides key considerations for planning the transition to cloud-based business resilience and for selecting a cloud partner.



A Qualitative Trade-Off Between Cost & Speed


When choosing a disaster recovery approach, organizations have traditionally been guided by the level of service required, as measured by two recovery objectives:

• Recovery time objective (RTO): the amount of time between an outage and the restoration of operations
• Recovery point objective (RPO): the point in time to which data is restored, which reflects the amount of data that will ultimately be lost during the recovery process
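The following sketch makes the two objectives concrete. It computes the recovery time and the data-loss window actually achieved in a single incident, then checks them against RTO and RPO targets. It assumes three timestamps are known: when the last good copy of the data was taken, when the outage began, and when service was restored. The function names and the example times are hypothetical.

```python
from datetime import datetime


def achieved_recovery(last_good_copy: datetime,
                      outage_start: datetime,
                      service_restored: datetime) -> tuple[float, float]:
    """Return (recovery_time_h, data_loss_window_h) for one incident.

    recovery_time_h:    outage start -> restoration of operations (compare to RTO)
    data_loss_window_h: last good copy -> outage start; changes made in this
                        window are lost (compare to RPO)
    """
    if not last_good_copy <= outage_start <= service_restored:
        raise ValueError("expected last_good_copy <= outage_start <= service_restored")
    recovery_time_h = (service_restored - outage_start).total_seconds() / 3600
    data_loss_window_h = (outage_start - last_good_copy).total_seconds() / 3600
    return recovery_time_h, data_loss_window_h


def meets_objectives(recovery_time_h: float, data_loss_window_h: float,
                     rto_h: float, rpo_h: float) -> bool:
    """True if the incident stayed within both the RTO and the RPO targets."""
    return recovery_time_h <= rto_h and data_loss_window_h <= rpo_h


# Hypothetical incident: nightly backup at midnight, outage at 14:30,
# service restored 48 hours later.
rt, loss = achieved_recovery(datetime(2012, 7, 30, 0, 0),
                             datetime(2012, 7, 30, 14, 30),
                             datetime(2012, 8, 1, 14, 30))
print(rt, loss)                                      # 48.0 14.5
print(meets_objectives(rt, loss, rto_h=4, rpo_h=1))  # False
```

A nightly backup caps the achievable RPO at roughly a day of lost changes, however fast the restore is. That is why continuous replication matters for RPO as much as for RTO.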

In the traditional disaster recovery models, which are usually either dedicated or shared, organizations are forced to make a tradeoff between cost and speed of recovery.


In a dedicated model, the infrastructure is dedicated to a single organization. This type of disaster recovery can offer a faster time to recovery than other traditional models because the IT infrastructure is mirrored at the disaster recovery site and is ready to be called upon in the event of a disaster. While this model can reduce RTO because the hardware and software are preconfigured, it does not eliminate all delays. Recovery still depends on receiving a current data image, which involves transporting physical tapes and then restoring the data. The approach is also costly because the hardware sits idle when it is not being used for disaster recovery. Some organizations use the backup infrastructure for development and test to offset the cost, but that introduces additional risk. Finally, the data restoration step adds variability to the recovery time.

In a shared disaster recovery model, the infrastructure is shared among multiple organizations. Shared disaster recovery is designed to be more cost-effective, since multiple organizations share the off-site backup infrastructure. After a disaster is declared, however, the hardware, operating system and application software at the recovery site must be configured from the ground up to match the site that declared the disaster, and this process can take hours or even days.


[Figure: Measuring the level of service required by RPO and RTO]

[Figure: Traditional disaster recovery approaches include shared and dedicated models]


The pressure for continuous availability 


According to a CIO study, organizations are being challenged to keep up with growing demands on their IT departments while keeping operations running as efficiently as possible. Their users and customers are becoming more sophisticated consumers of technology. Research shows that usage of Internet-connected devices is growing about 42 percent annually, giving clients and employees the ability to quickly access huge amounts of storage. In spite of the pressure to do more, organizations spend a large percentage of their funds maintaining the infrastructure they already have, and they are not getting significant budget increases; budgets are essentially flat.[1]

With dedicated and shared disaster recovery models, organizations have traditionally been forced to make tradeoffs between cost and speed. As the pressure to achieve continuous availability and reduce costs continues to increase, organizations can no longer accept those tradeoffs. Disaster recovery was originally intended for critical batch “back-office” processes, but many organizations now depend on real-time applications and their online presence as the primary interface to their customers. Any downtime reflects directly on their brand image, and customers view interruption of key applications such as e-commerce, online banking and customer self-service as unacceptable. A single minute of downtime may cost thousands of dollars.


Thinking in terms of interruptions and not disasters 


Traditional disaster recovery methods also rely on “declaring a disaster” in order to use the backup infrastructure during events such as hurricanes, tsunamis, floods or fires. However, most application availability interruptions are due to more mundane, everyday occurrences. While organizations need to plan for the worst, they also must plan for the more likely: cut power lines, server hardware failures and security breaches. Weather is the root cause of just over half of declared disasters, which means almost 50 percent of declarations are due to other causes. These statistics come only from clients who actually declared a disaster; they do not count the many interruptions for which no disaster was declared. In an around-the-clock world, organizations must move beyond disaster recovery and think in terms of application continuity. You must plan for the recovery of critical business applications rather than for infrequent, momentous disasters, and build resiliency plans accordingly.



[Figure: Time to recovery using a dedicated infrastructure]

[Figure: Time to recovery using a shared infrastructure. The data restoration process must be completed as shown, resulting in an average of 48 to 72 hours to recovery.]

[Figure: Types of potential business interruptions]



Cloud-based Business Resilience is a Welcome New Approach 


Cloud computing offers an attractive alternative to traditional disaster recovery. “The Cloud” is inherently a shared infrastructure: a pooled set of resources with the infrastructure cost distributed across everyone who contracts for the cloud service. This shared nature makes the cloud an ideal model for disaster recovery. Even when we broaden the definition of disaster recovery to include more mundane service interruptions, the need for disaster recovery resources is sporadic. The organizations relying on the cloud for backup and recovery are very unlikely to all need the infrastructure at the same time. As a result, the pool can be sized well below the sum of their individual needs, which reduces costs while still keeping resources available to speed recovery.
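The pooling argument above can be made concrete with a binomial model. This sketch rests on a strong assumption that the paper does not state: each subscriber independently needs recovery capacity with the same probability `p` during a given window. The subscriber count, the probability and the function names are hypothetical.

```python
from math import comb


def overflow_probability(subscribers: int, p: float, capacity: int) -> float:
    """P(more than `capacity` subscribers need recovery at the same time).

    Assumes (hypothetically) that each subscriber independently needs
    recovery resources with probability p during the window of interest,
    so concurrent demand follows a binomial distribution.
    """
    return sum(comb(subscribers, k) * p**k * (1 - p) ** (subscribers - k)
               for k in range(capacity + 1, subscribers + 1))


def smallest_pool(subscribers: int, p: float, max_overflow: float) -> int:
    """Smallest number of concurrent recoveries the pool must support so that
    demand exceeds it with probability at most `max_overflow`."""
    for capacity in range(subscribers + 1):
        if overflow_probability(subscribers, p, capacity) <= max_overflow:
            return capacity
    return subscribers  # only reached if max_overflow < 0


# 100 hypothetical subscribers, each with a 1% chance of needing recovery
# in the same window.
print(smallest_pool(100, 0.01, 0.001))  # 5
```

Under these assumptions, a pool sized for five concurrent recoveries covers 100 subscribers with an overflow risk of about 0.05 percent. A dedicated model, by contrast, would provision for every subscriber. Correlated events, such as a hurricane affecting many subscribers in one region, break the independence assumption, so a real sizing exercise would have to account for that correlation.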


Cloud-based business resilience managed services are designed to combine the economics of shared physical recovery with the speed of dedicated infrastructure. Because server images and data are continuously replicated, recovery time can be reduced dramatically, to less than an hour and in many cases to minutes or even seconds. Meanwhile, costs remain more consistent with shared recovery.
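The gap between these recovery times is easiest to see in money. The sketch below multiplies recovery time by a cost per minute of downtime. It uses the 48 to 72 hour range given for shared recovery and a one-hour upper bound for cloud-based recovery. The $5,000-per-minute figure is purely hypothetical (the text only says a minute of downtime may cost thousands of dollars), and the model assumes that cost stays constant for the whole outage.

```python
def downtime_cost(recovery_hours: float, cost_per_minute: float) -> float:
    """Cost of one outage, assuming a constant (hypothetical) cost per minute."""
    return recovery_hours * 60 * cost_per_minute


COST_PER_MINUTE = 5_000.0  # hypothetical; substitute your own estimate

scenarios = {
    "shared recovery, 48 h": 48,
    "shared recovery, 72 h": 72,
    "cloud-based recovery, 1 h": 1,
}
for name, hours in scenarios.items():
    print(f"{name}: ${downtime_cost(hours, COST_PER_MINUTE):,.0f}")
# shared recovery, 48 h: $14,400,000
# shared recovery, 72 h: $21,600,000
# cloud-based recovery, 1 h: $300,000
```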




[Figure: Speed to recovery using cloud computing]

Cloud-based business resilience offers several other benefits over traditional disaster recovery models:

• More predictable monthly operating expenses can help you avoid the unexpected and hidden costs of do-it-yourself approaches.
• Up-front capital expenditure requirements are reduced, because the disaster recovery infrastructure exists in the cloud.
• Cloud-based business resilience managed services can more easily scale up as conditions change.
• Portal access reduces the need to travel to the recovery site, which can help save time and money.

[Figure: A cloud-based approach to business resilience: virtualizing disaster recovery using cloud computing]

While the cloud offers multiple benefits as a disaster recovery platform, there are several key considerations when planning the transition to cloud-based business resilience and when selecting your cloud partner. These include:

• Portal access with failover and failback capability
• Support for disaster recovery testing
• Tiered service levels
• Support for mixed and virtualized server environments
• Global reach and local presence
• Migration from and coexistence with traditional disaster recovery

The next few sections describe these considerations in greater detail.

Facilitating improved control with portal access

Disaster recovery has traditionally been an insurance policy that organizations hope not to use. In contrast, cloud-based business resilience can actually increase IT’s ability to provide service continuity for key business applications. Because the cloud-based business resilience service can be accessed through a web portal, IT management and administrators gain a dashboard view of their organization’s infrastructure.

With no formal declaration required, and with the ability to fail over from the portal, IT can respond much more quickly to mundane outages and interruptions.

Building confidence and refining disaster recovery plans with more frequent testing

One traditional challenge of disaster recovery is the lack of certainty that the planned solution will work when the time comes. Organizations typically test their failover and recovery only once or twice a year, which is hardly sufficient given the pace of change experienced by most IT departments. This lost sense of control has caused some organizations to bring disaster recovery “in house,” diverting critical IT focus from mainline application development. Cloud-based business resilience provides more control and allows more frequent and granular testing of disaster recovery plans, even at the server or application level.
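Application-level testing can be as simple as timing each failover and comparing it with that application's RTO. The harness below is a hypothetical sketch. `run_failover_drill` and the per-application `fail_over` callables are placeholders for whatever failover action a provider's portal or API actually exposes, which this post does not describe.

```python
import time
from typing import Callable, Dict, Tuple

# Each entry: application name -> (failover action returning True if the
# application is healthy afterwards, RTO in seconds). Both are hypothetical.
Drill = Dict[str, Tuple[Callable[[], bool], float]]


def run_failover_drill(apps: Drill) -> Dict[str, str]:
    """Fail over each application in turn and report whether it met its RTO."""
    report: Dict[str, str] = {}
    for name, (fail_over, rto_seconds) in apps.items():
        start = time.monotonic()
        try:
            healthy = fail_over()
        except Exception as exc:  # a crashed failover counts as a failed test
            report[name] = f"FAILED: {exc}"
            continue
        elapsed = time.monotonic() - start
        if not healthy:
            report[name] = "FAILED: application unhealthy after failover"
        elif elapsed > rto_seconds:
            report[name] = f"MISSED RTO: {elapsed:.1f}s > {rto_seconds:.0f}s"
        else:
            report[name] = f"OK: {elapsed:.1f}s within {rto_seconds:.0f}s"
    return report


# Stand-in failover actions for illustration only.
def billing_failover() -> bool:
    time.sleep(0.1)  # pretend the switch-over takes a moment
    return True


def payroll_failover() -> bool:
    return False  # simulate a replica that does not come up cleanly


for app, result in run_failover_drill({
    "billing": (billing_failover, 60),
    "payroll": (payroll_failover, 3600),
}).items():
    print(app, "->", result)
```

Because each application is drilled and reported separately, a drill like this can run far more often than an annual full-site test. It also shows which specific application has drifted out of compliance.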

Supporting optimized application recovery times with tiered service levels

Cloud-based business resilience offers tiered service levels that let you differentiate applications based on their importance to the organization and the associated tolerance for downtime.

The notion of a “server image” is an important part of traditional disaster recovery. As IT departments have grown more complex, with multiple server farms running possibly different operating systems and operating system (OS) levels, responding to a disaster or outage has become more complex as well. Organizations are often forced to recover on different hardware, which can take longer and increases the possibility of errors and data loss. Organizations are implementing virtualization technologies in their data centers to help remove some of this underlying complexity and optimize infrastructure utilization. The number of installed virtual machines has grown exponentially over the past several years.
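The tiered service levels described above amount to a matching problem: put each application in the cheapest tier that still recovers it within its tolerance for downtime. In this sketch the tier names, their recovery times and the application tolerances are all hypothetical. The list of tiers is assumed to run from fastest and most expensive to slowest and cheapest.

```python
from dataclasses import dataclass

# Hypothetical service tiers, cheapest last: (tier name, guaranteed recovery
# time in minutes). Real providers define their own tiers and prices.
TIERS = [
    ("tier 1: continuous replication", 15),
    ("tier 2: replication with manual failover", 4 * 60),
    ("tier 3: restore from backup", 72 * 60),
]


@dataclass
class Application:
    name: str
    tolerable_downtime_min: int  # the application's RTO, set by the business


def assign_tier(app: Application) -> str:
    """Return the cheapest tier whose guaranteed recovery time meets the app's RTO."""
    for tier_name, guaranteed_min in reversed(TIERS):
        if guaranteed_min <= app.tolerable_downtime_min:
            return tier_name
    raise ValueError(f"no tier can recover {app.name} within "
                     f"{app.tolerable_downtime_min} minutes")


portfolio = [
    Application("online banking", 15),
    Application("billing", 8 * 60),
    Application("payroll", 5 * 24 * 60),
]
for app in portfolio:
    print(f"{app.name}: {assign_tier(app)}")
```

Scanning from the cheap end means each application lands in the least expensive tier that meets its tolerance. An application whose tolerance is tighter than every tier raises an error rather than quietly receiving a tier that cannot meet it.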

According to a recent survey of Chief Information Officers, 98 percent of respondents either had already implemented virtualization or planned to implement it within the next 12 months. To support these environments, cloud-based business resilience solutions must offer both physical-to-virtual (P2V) and virtual-to-virtual (V2V) recovery.

Cloud-based business resilience requires ongoing server replication, which makes network bandwidth an important consideration when adopting this approach. A global provider should offer a local presence, reducing the distance that data must travel across the network.
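Because replication runs continuously, a back-of-the-envelope sizing is a useful first step. The sketch below estimates the link needed to ship a given daily volume of changed data. It also estimates the propagation delay over a given distance, which shows why a nearby recovery site helps. The 100 GB/day volume, the burst factor and the link efficiency are hypothetical defaults. The 200 km per millisecond figure is a standard approximation for light in optical fiber and ignores routing and equipment delays.

```python
def replication_bandwidth_mbps(changed_gb_per_day: float,
                               peak_to_average: float = 3.0,
                               link_efficiency: float = 0.8) -> float:
    """Rough link size in megabits per second needed to keep replication current.

    changed_gb_per_day: data changed per day that must be copied off-site
                        (1 GB = 10**9 bytes = 8,000 megabits).
    peak_to_average:    hypothetical allowance for bursty daytime change rates.
    link_efficiency:    hypothetical share of the link available for payload.
    """
    average_mbps = changed_gb_per_day * 8_000 / 86_400  # 86,400 seconds per day
    return average_mbps * peak_to_average / link_efficiency


def fiber_round_trip_ms(distance_km: float) -> float:
    """Propagation-only round-trip time, using the common approximation that
    light in optical fiber covers about 200 km per millisecond."""
    return 2 * distance_km / 200


print(f"{replication_bandwidth_mbps(100):.1f} Mbps")        # 34.7 Mbps
print(fiber_round_trip_ms(50), fiber_round_trip_ms(2_000))  # 0.5 20.0
```

With synchronous replication, each write waits for a remote acknowledgment, so round-trip time adds directly to write latency. Distance therefore matters as much as raw bandwidth.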

While cloud-based business resilience offers many advantages for mission-critical and customer-facing applications, an efficient enterprise-wide disaster recovery plan will likely include a blend of traditional and cloud-based approaches. In a recent study, respondents indicated that minimizing data loss was the most important objective of a successful disaster recovery solution. By coordinating disaster recovery with data backup, organizations can reduce data loss and make data integrity more reliable.


Cloud computing offers a compelling opportunity to achieve the recovery time of dedicated disaster recovery with the cost structure of shared disaster recovery. However, disaster recovery planning should not be taken lightly; the security and resiliency of the cloud are critical considerations.



Posted by Jai Krishna Ponnappan
