Disaster Recovery
Business & Technology
        Varrow Madness
        March 15, 2012

          Andrew Miller
       Technical Consultant
t: @andriven w:www.thinkmeta.net
One Big Reason to Do This



Expectations for Disaster Recovery ≠ IT Capabilities for Disaster Recovery
What is a Disaster?
• Disaster: An event that affects a service or system such
  that significant effort is required to restore the original
  performance level.
  » IT Service Management Forum

• But what does that look like IN OUR ENVIRONMENT?
• What disaster and recovery scenarios should we plan for?
• Where do we begin?
• How do we do it?
Example of a Disaster
Disaster Recovery vs. Operational Recovery
• Disaster Recovery
   – To cope with & recover from an IT crisis that moves work to an
     alternative system in a non-routine way.
   – A real “disaster” is large in scope and impact
   – DR typically implies failure of the primary data center and recovery to an
     alternate site
• Operational Recovery
   – Addresses more “routine” types of failures (server, network, storage,
     etc.)
   – Events are smaller in scope and impact than a full “disaster”
   – Typically implies recovering to alternate equipment within the primary
     data center
• Business expectations for the recovery timeframe are typically
  shorter for “operational recovery” issues than for a true “disaster”
• Each should have its own clearly defined objectives
Risks, Threats and Vulnerabilities

Risk is a function of the likelihood of a given threat
acting upon a particular potential vulnerability,
and the resulting impact of that adverse event on
the organization.
Some threats that can cause Disasters…
• Human Error
• Localized IT systems /
  network failure
• Extended power outage
• Telecommunications outage
• Storm / Weather damage
• Earthquake / Volcano
• Fire in the facility
• Facility flooding
• Local evacuation
• Cyber attack
• Sabotage
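The definition of risk as a function of likelihood and impact can be made concrete with a small scoring sketch. This is one common convention (likelihood multiplied by impact on 1 to 5 scales), not something the slides prescribe; the `Threat` class, the scales, and the example ratings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    impact: int      # hypothetical scale: 1 (negligible) to 5 (severe)


def risk_score(threat: Threat) -> int:
    """Score risk as likelihood times impact (an assumed convention)."""
    return threat.likelihood * threat.impact


def rank_threats(threats):
    """Return the threats ordered from highest to lowest risk score."""
    return sorted(threats, key=risk_score, reverse=True)


# Illustrative ratings only; real values come out of the risk review.
threats = [
    Threat("Human error", likelihood=4, impact=3),
    Threat("Extended power outage", likelihood=3, impact=4),
    Threat("Earthquake / volcano", likelihood=1, impact=5),
]

for threat in rank_threats(threats):
    print(f"{threat.name}: {risk_score(threat)}")
```

A ranking like this is only a starting point for discussion: a rare but severe threat can still justify planning even when its score is low.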
(Varrow) Disaster Recovery Approach
• Interviews with key personnel to understand Business Process priorities
  and establish Business Impact Analysis (BIA).
• Review existing IT production infrastructure, including applications,
  servers, storage, network, and external connectivity. Identify Risks and
  Gaps.
• Establish Disaster Impact Scenarios and Disaster Recovery strategies to
  meet requirements.
• Recommend Roadmap for establishing recovery capabilities and
  documenting plans.
• Implement required recovery capabilities.
• Develop framework and content for IT DR Plan.
• Develop maintenance and test procedures for IT DR Plan.
• Address Business Continuity requirements and planning as appropriate.
What is the Business Impact Analysis?
• A conversation between IT and key stakeholders to
  understand:
   – What are the most time-critical and information-critical
     business processes?
   – How does the business REALLY rely upon IT Service and
     Application availability?
   – What are the Student, Financial, Regulatory, Reputational,
     and other impacts of IT Service and Application
     unavailability?
   – What availability or recoverability capabilities are justifiable
     based on these requirements, potential impact, and costs?
Disaster Recovery: Key Measures

[Figure: timeline running from 5 a.m. to 7 p.m. with "Declare disaster" marked at 10 a.m.; the Recovery Point Objective (RPO) is shown to the left of that mark and the Recovery Time Objective (RTO) to the right.]

• RPO: Amount of data lost from failure, measured as the amount of time from a disaster event
• RTO: Targeted amount of time to restart a business service after a disaster event
Disaster Recovery: Key Measures
• Recovery Time Objective (RTO)
   Maximum duration of disruption of service
• Recovery Point Objective (RPO)
   Point in time to which application data is recovered / Maximum data loss


[Figure: two scales meeting at "Real Time" in the center. Recovery Point runs from weeks to seconds on the left; Recovery Time runs from seconds to weeks on the right. The figure also carries a "Cost" label.]
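As a rough illustration of how the two measures are checked after an event, the sketch below compares the actual data-loss window and downtime against their objectives. The function names, the timestamps (a 5 a.m. recovery point, a 10 a.m. disaster, service restored at 7 p.m.), and the 4-hour RPO and 12-hour RTO are all assumed values chosen for the example.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, disaster: datetime) -> timedelta:
    """Time between the last usable copy of the data and the disaster."""
    return disaster - last_recovery_point


def downtime(disaster: datetime, service_restored: datetime) -> timedelta:
    """Time between the disaster and the service being back online."""
    return service_restored - disaster


def meets_objective(actual: timedelta, objective: timedelta) -> bool:
    """An objective is met when the actual value does not exceed it."""
    return actual <= objective


# Hypothetical event times and objectives.
last_copy = datetime(2012, 3, 15, 5, 0)
disaster = datetime(2012, 3, 15, 10, 0)
restored = datetime(2012, 3, 15, 19, 0)
rpo = timedelta(hours=4)
rto = timedelta(hours=12)

loss = data_loss_window(last_copy, disaster)
outage = downtime(disaster, restored)
print("RPO met:", meets_objective(loss, rpo))
print("RTO met:", meets_objective(outage, rto))
```

With these assumed numbers the RTO is met but the RPO is not, which is the kind of gap a DR test is meant to surface.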
BIA - Example Priority Tiers
• Priority 1 (High Availability / Immediate Recovery)
   – Services whose unavailability for more than a brief period can have a severe
     impact on customers or time-critical business operations.
• Priority 2 (1-2 day recovery)
   – Services whose unavailability significantly impacts customers or business
     operations.
• Priority 3 (3-5 day recovery)
   – Services which can tolerate up to five days of disruption in a disaster.
• Priority 4 (6-10 day recovery)
   – Services which can tolerate up to ten days of disruption in a disaster.
   – Priority 3 and 4 systems may be restored in less time, depending on the
     situation. However, higher priority functions will be restored first.
• Priority 5 (“Best effort” recovery)
   – Non-critical services which can tolerate two weeks or more of disruption in a
     disaster. These systems will be restored on a best-effort basis, after other
     more critical systems have been restored and ongoing operations have resumed.
   – Priority 5 systems may be restored in less time, depending on the situation.
     However, higher priority functions will be restored first. In some cases,
     systems deemed not to be required for continued operations may not be restored.
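The rule that higher priority functions are restored first can be sketched as a simple sort over tiered services. The tier labels below follow the example tiers; the `Service` class and the service names are hypothetical placeholders for whatever the BIA produces.

```python
from dataclasses import dataclass

# Tier labels follow the example priority tiers.
TIER_LABELS = {
    1: "High Availability / Immediate Recovery",
    2: "1-2 day recovery",
    3: "3-5 day recovery",
    4: "6-10 day recovery",
    5: "Best effort recovery",
}


@dataclass(frozen=True)
class Service:
    name: str
    tier: int


def restore_order(services):
    """Order services so that higher priority tiers (lower numbers) come first."""
    for service in services:
        if service.tier not in TIER_LABELS:
            raise ValueError(f"Unknown priority tier for {service.name}: {service.tier}")
    # Ties within a tier are broken alphabetically, an arbitrary choice.
    return sorted(services, key=lambda s: (s.tier, s.name))


# Hypothetical services and tier assignments.
services = [
    Service("Email", tier=2),
    Service("Archive reporting", tier=5),
    Service("Payment processing", tier=1),
    Service("Intranet wiki", tier=4),
]

for service in restore_order(services):
    print(f"Priority {service.tier} ({TIER_LABELS[service.tier]}): {service.name}")
```

In practice, dependencies between services (for example, directory services needed before email) would also shape the order; this sketch ignores them.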
What does it take to RECOVER from an IT Disaster?
•   Data Protection
     – Backups, Replication
•   Recovery Facility
     – Location to rebuild IT infrastructure or provision services
•   Data Recovery & Storage
     – Get Data into a form that is usable
•   Servers / Compute Capacity
     – Sufficient servers or virtual compute capacity to actually run the applications
•   Network, Voice, and Data Communications
     –   Connect servers, storage and workers
     –   Connect the recovery site to work sites
     –   Communicate with customers
     –   Includes network, telecom, demarcation equipment; cabling; telecom provisioning
•   DR Plan
     – Documented and tested procedures for what to do, and how to do it
•   People
Example Disaster Recovery Strategies
• Priority 1 (4 hour RTO or less)
   – Disaster Recovery Strategy: Establish hot site for systems and data in a
     secondary data center at a remote location that is unlikely to be impacted
     by a local or regional event.
   – Data Protection Approach: Replicate / remote mirror / short-interval remote
     disk-to-disk backup
• Priority 2 (24-48 hour RTO)
   – Disaster Recovery Strategy: Maintain sufficient remote physical or virtual
     infrastructure for restoration. Ensure sufficient space/power in recovery
     facility.
   – Data Protection Approach: Remote disk-to-disk backup
• Priority 3 (72 hour RTO)
   – Disaster Recovery Strategy: Ensure ability to quickly acquire infrastructure
     for restoration. Ensure sufficient space/power in recovery facility.
   – Data Protection Approach: Tape (with sufficient off-site rotation) or remote
     disk-to-disk backup
• Priority 4 (1-2 week RTO)
   – Disaster Recovery Strategy: Ensure ability to quickly acquire infrastructure
     for restoration. Ensure sufficient space/power in recovery facility.
   – Data Protection Approach: Tape (with sufficient off-site rotation) or remote
     disk-to-disk backup
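One way to use the strategy tiers above is as a lookup table against which DR test results are checked. The sketch below is illustrative: the `Strategy` type and `within_rto` function are hypothetical names, the strategy text is abbreviated, and each RTO range is reduced to its upper bound in hours (an assumption made for simplicity).

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    max_rto_hours: int       # upper bound of the tier's RTO range (assumed reading)
    recovery_strategy: str
    data_protection: str


# Values follow the example strategies; wording is abbreviated.
STRATEGIES = {
    1: Strategy(4, "Hot site in a remote secondary data center",
                "Replication, remote mirror, or short-interval remote disk-to-disk backup"),
    2: Strategy(48, "Maintain sufficient remote physical or virtual infrastructure",
                "Remote disk-to-disk backup"),
    3: Strategy(72, "Ensure ability to quickly acquire infrastructure",
                "Tape with off-site rotation, or remote disk-to-disk backup"),
    4: Strategy(14 * 24, "Ensure ability to quickly acquire infrastructure",
                "Tape with off-site rotation, or remote disk-to-disk backup"),
}


def within_rto(priority: int, measured_recovery_hours: float) -> bool:
    """Check a measured recovery time (e.g. from a DR test) against the tier's RTO."""
    return measured_recovery_hours <= STRATEGIES[priority].max_rto_hours


# Hypothetical DR test results: (priority, hours taken to recover).
for priority, hours in [(1, 3.5), (2, 50), (3, 60)]:
    status = "within" if within_rto(priority, hours) else "exceeds"
    print(f"Priority {priority}: {hours} h {status} the {STRATEGIES[priority].max_rto_hours} h RTO")
```

Keeping the targets in one table makes it easier to spot when a tier's chosen data protection approach cannot realistically deliver its RTO.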
Storage Arrays + Replication
[Figure: RecoverPoint replication between a production site and an optional disaster recovery site.]

• Production site
   – Application servers, RecoverPoint appliance, SAN, storage arrays
   – Prod LUNs, local copy, production and local journals
• Optional disaster recovery site
   – Standby servers, RecoverPoint appliance, SAN, storage arrays
   – Remote copy, remote journal
• RecoverPoint bi-directional replication/recovery between the sites over Fibre Channel/WAN
• Write splitter options
   – Host-based write splitter
   – Fabric-based write splitter
   – Symmetrix VMAXe, VNX-, and CLARiiON-based write splitter
[Figure: Site A (Primary) and Site B (Recovery), each with vCenter Server, Site Recovery Manager, and vSphere; the sites are linked by vSphere Replication and by storage-based replication.]

• vSphere Replication
   – Simple, cost-efficient replication for Tier 2 applications and smaller sites
• Storage-based Replication
   – High-performance replication for business-critical applications in larger sites
Discussion / Q&A



Editor's Notes

  • #11: Note to Presenter: View in Slide Show mode for animation. When EMC or its partners talk about remote replication, they usually mean replication between storage at two locations. The source and target are physically separated to reduce the risks associated with co-location. Remotely replicated systems could be across a campus, across a town, or across the globe. The physical distance and the technology selected can affect how quickly you recover from a disruption and how much data is lost.
    Organizations normally set requirements for how much data loss and how much time to come back online are acceptable. The recovery point objective (RPO) is the amount of data, measured in terms of time, that can be lost without being catastrophic to the business. The recovery time objective (RTO) is the target amount of time to recover the data and restart your business services from the recovered data. Remote replication provides much lower RPOs (at or close to zero) and very small RTOs, depending on implementation. The bottom line is that replication is appropriate for all types of data, and the RPO and RTO you target are going to affect your implementation.
    For multiple RPOs and for remote replication with either zero or low RPO, and near-instant to instant recovery with DVR-like technology, EMC offers the RecoverPoint family.
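To see why the replication approach drives the achievable RPO, the sketch below estimates worst-case data loss for a periodic copy. It assumes copies start at a fixed interval and that a copy still in flight when the disaster hits is unusable, so the worst case is the interval plus the copy duration. The function names and the example figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, copy_duration: timedelta) -> timedelta:
    """Worst case under the stated assumptions: the disaster hits just before the
    in-flight copy completes, so the last usable copy is one interval older."""
    return interval + copy_duration


def meets_rpo(interval: timedelta, copy_duration: timedelta, rpo: timedelta) -> bool:
    """True when the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(interval, copy_duration) <= rpo


# Hypothetical schedules: a nightly backup versus frequent replication.
nightly = worst_case_data_loss(timedelta(hours=24), timedelta(hours=6))
frequent = worst_case_data_loss(timedelta(minutes=15), timedelta(minutes=5))

print("Nightly backup worst case:", nightly)
print("15-minute replication worst case:", frequent)
print("Nightly meets 4 h RPO:",
      meets_rpo(timedelta(hours=24), timedelta(hours=6), timedelta(hours=4)))
print("15-minute replication meets 1 h RPO:",
      meets_rpo(timedelta(minutes=15), timedelta(minutes=5), timedelta(hours=1)))
```

Continuous or synchronous replication shrinks both terms toward zero, which is how it reaches the near-zero RPOs the note describes.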