Accelerating Enterprise OpenStack
When Disaster Strikes the Cloud: Who, What, When, Where and How to Recover
Michael Factor, IBM Research - Haifa, factor@il.ibm.com
Ronen Kat, IBM Research - Haifa, ronenkat@il.ibm.com
Sean Cohen, Red Hat, scohen@redhat.com
Talk Outline
- What is disaster recovery?
- Concepts and basics
- Protecting data and applications from disasters
- OpenStack Cinder toolbox for disaster recovery
- Applications are more than just data
- The road ahead: Kilo and beyond
What is Disaster Recovery?
According to Wikipedia, Disaster Recovery (DR) is "the process, policies and procedures . . . for recovery . . . of technology infrastructure . . . after a natural or human-induced disaster."
[Figure: servers, storage, network, software, configuration]
Surviving a disaster requires geographic dispersion
Recovery Point Objective and Recovery Time Objective
- Recovery Point Objective (RPO): how far back in time a disaster takes one
- Recovery Time Objective (RTO): how long until operational after a disaster
[Figure: timeline with the disaster at 0. The RPO axis runs back in time through seconds, minutes, hours, days and weeks; the RTO axis runs forward through the same scale. Labeled techniques: replication, backup, restore, active site, hot site.]
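As a rough illustration of the two objectives, the sketch below computes the recovery point and recovery time actually achieved in a made-up incident. The timestamps and the helper names `achieved_rpo` and `achieved_rto` are assumptions for this example only; they are not part of any OpenStack API.

```python
from datetime import datetime, timedelta

def achieved_rpo(protection_points, disaster_time):
    """Data-loss window: time since the last protection point before the disaster."""
    earlier = [t for t in protection_points if t <= disaster_time]
    if not earlier:
        raise ValueError("no backup or replica predates the disaster")
    return disaster_time - max(earlier)

def achieved_rto(disaster_time, operational_time):
    """Downtime: time from the disaster until service is operational again."""
    return operational_time - disaster_time

# Hypothetical incident: nightly backups, disaster in the afternoon.
backups = [datetime(2014, 11, 3, 2, 0), datetime(2014, 11, 4, 2, 0)]
disaster = datetime(2014, 11, 4, 15, 30)
restored = datetime(2014, 11, 4, 21, 30)

print(achieved_rpo(backups, disaster))   # 13:30:00
print(achieved_rto(disaster, restored))  # 6:00:00
```

With nightly backups the data-loss window can approach a full day, which is why replication is the usual choice when the RPO must be measured in seconds or minutes.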
Data and Metadata Consistency
Data consistency
- If a modified datum is available, all data it depends upon is also available
Metadata consistency
- Configuration updates are seen in the same order relative to one another and to data updates
[Figure: an application VM with DB and LOG volumes, both copied to a remote site]
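The data-consistency rule can be made concrete with a small check. This is a minimal sketch, assuming a write-ahead-logging workload in which each database update depends on its log record; the write names and the function `is_consistent` are hypothetical.

```python
def is_consistent(replica_writes, depends_on):
    """True if every write present at the replica has all of its dependencies."""
    present = set(replica_writes)
    return all(dep in present
               for write in present
               for dep in depends_on.get(write, ()))

# Hypothetical dependencies: each DB update depends on its log record.
depends_on = {
    "db:update-1": ["log:record-1"],
    "db:update-2": ["log:record-2"],
}

# Replica that received writes in order: the last DB update is missing, but
# nothing present lacks a dependency.
ordered_replica = ["log:record-1", "db:update-1", "log:record-2"]
# Replica that received a DB update without the log record it depends on.
unordered_replica = ["log:record-1", "db:update-1", "db:update-2"]

print(is_consistent(ordered_replica, depends_on))    # True
print(is_consistent(unordered_replica, depends_on))  # False
```

The first replica has lost recent work but is still usable; the second holds a database state that its own log cannot explain.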
OpenStack Cloud Metadata
- Virtual networks between the cloud VMs
- External network access
- Attached volumes
- Volume types
- Virtual machine flavors
- SSH keys for VM access
- Virtual machine images
- Identities of users
Protecting Data and Applications from Disasters
Data Protection: Cinder Backup and Restore
- Cinder backup
  - Back up a volume to backup storage (backup-create)
  [Figure: on the primary cloud, backup-create writes the volume backup to Swift]
- Can Cinder restore on the secondary cloud?
  - Problem: Cinder on the secondary cloud is not aware of the backup
  [Figure: backup-restore on the secondary cloud cannot find the backup held in Swift]
- Solution: "electronic tape shipping"
  - backup-export
  - backup-import
  - Supported in Cinder since Icehouse
  [Figure: backup-export on the primary cloud produces a backup reference, which backup-import registers on the secondary cloud]
- After backup-import, Cinder can restore on the secondary cloud
  - backup-restore
  [Figure: backup-restore on the secondary cloud reads the backup from Swift]
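The four backup steps can be lined up as command invocations. The sketch below only assembles the argument lists for the `cinder` command-line client and does not run them; the IDs are placeholders, the backup service name is an assumed example, and exact options can differ between client versions. The ID to restore on the secondary cloud is passed separately because it is whatever backup-import reports there.

```python
def dr_backup_commands(volume_id, backup_id, backup_service, backup_url,
                       imported_backup_id):
    """Return the cinder CLI invocations per site, in order (nothing is executed)."""
    primary = [
        ["cinder", "backup-create", volume_id],   # back up the volume to Swift
        ["cinder", "backup-export", backup_id],   # prints backup_service and backup_url
    ]
    secondary = [
        ["cinder", "backup-import", backup_service, backup_url],
        ["cinder", "backup-restore", imported_backup_id],
    ]
    return primary, secondary

# Placeholder values; the service name is an assumption for illustration.
primary, secondary = dr_backup_commands(
    "<volume-id>", "<backup-id>",
    "cinder.backup.drivers.swift", "<backup-url>",
    "<imported-backup-id>",
)

for argv in primary + secondary:
    print(" ".join(argv))
```

Only the small backup reference travels between clouds; the backup data itself stays in Swift, which therefore has to be reachable from the secondary cloud.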
Data Protection: Cinder Volume Replication
- Cinder has initial support for volume replication in the Juno release
- Cinder back-ends can "advertise" support for replication
- A volume created with the replication extra-spec is allocated on a back-end that supports replication, and is replicated
- Supporting back-ends:
  - IBM Storwize, with more expected in Kilo
[Figure: two Cinder back-ends, with the volume replicated from one to the other]
Volume-type extra specs: "capabilities:replication <is> True"
Data Protection: Cinder Volume Replication (continued)
- A secondary volume can become primary when promoted
  - replication-promote
- Replication can be reversed following a replication-promote
  - replication-reenable
[Figure: two Cinder back-ends, with the replication direction reversed after promotion]
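The promote and re-enable steps can be pictured as a role swap followed by a restart of replication in the new direction. The class below is a toy model under that assumption; it does not reproduce Cinder's implementation or any back-end driver, and the site names are placeholders.

```python
class ReplicatedPair:
    """Toy model of a replicated volume and the site holding each role."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.replicating = True

    def promote(self):
        """Secondary takes over as primary; replication stops until re-enabled."""
        self.primary, self.secondary = self.secondary, self.primary
        self.replicating = False

    def reenable(self):
        """Resume replication from the current primary to the current secondary."""
        self.replicating = True

pair = ReplicatedPair(primary="site-A", secondary="site-B")
pair.promote()    # disaster at site-A: site-B becomes primary
pair.reenable()   # site-A is back: replicate from site-B to site-A
print(pair.primary, pair.secondary, pair.replicating)  # site-B site-A True
```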
Consistency Groups
- New in Juno
- Support for grouping volumes for consistency
  - Grouping of volumes is based on the volume type
- Currently supports
  - Consistency group snapshots
- Needs to be extended to support
  - Cinder backup
  - Cinder volume replication
[Figure: DB and LOG volumes grouped together]
Protecting Applications from Disasters
[Figure: servers, storage, network, software and configuration, tied together by disaster recovery orchestration]
OpenStack Tools
- Applications are defined in OpenStack by
  - Heat Orchestration Templates
- However
  - Not all applications are template based
  - Deployments (including configuration) change over time
  - Some definitions are cloud specific, e.g., networks, types
  - Heat templates and stacks don't stay consistent
- Tools can create a template from a deployment, e.g., Flame, ReHeat
  - But the template will only fit the current cloud
OpenStack Tools and Beyond
- Demo: a technology preview for disaster recovery with IBM Cloud Manager
The Road Ahead
Ceph Multi-Site & Disaster Recovery (Block) example
- Export snapshots to geographically dispersed data centers
  - Provides disaster recovery
- Export incremental snapshots
  - Minimizes network bandwidth by sending only the changes
- Kilo cycle focus: extend the multi-site and disaster recovery options
  - RBD mirroring
  - Cinder volume replication
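The bandwidth saving from incremental snapshots comes from shipping only the blocks that changed between two snapshots. The sketch below is a toy model of that idea using dictionaries of blocks; it does not call Ceph, and the function names and block contents are made up. Ceph's own `rbd export-diff` and `rbd import-diff` commands work on the same principle.

```python
def snapshot_diff(old, new):
    """Blocks that changed between two snapshots, keyed by block index."""
    changed = {i: data for i, data in new.items() if old.get(i) != data}
    deleted = [i for i in old if i not in new]
    return changed, deleted

def apply_diff(replica, changed, deleted):
    """Bring a remote copy of the old snapshot up to the new snapshot."""
    updated = dict(replica)
    updated.update(changed)
    for i in deleted:
        updated.pop(i, None)
    return updated

# Hypothetical snapshots of a four-block volume.
snap1 = {0: b"boot", 1: b"db-v1", 2: b"log-v1"}
snap2 = {0: b"boot", 1: b"db-v2", 2: b"log-v1", 3: b"log-v2"}

changed, deleted = snapshot_diff(snap1, snap2)
remote = apply_diff(snap1, changed, deleted)

print(sorted(changed))   # [1, 3]
print(remote == snap2)   # True
```

Only two of the four blocks cross the network, yet the remote copy ends up identical to the newer snapshot.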
Ceph Multi-Site & Disaster Recovery (Object) example
- Zones and region support
  - Deploy topologies similar to S3 and others with a global namespace
- Data center synchronization
  - Back up full or partial sets of data between regions
- Read affinity
  - Serve local copies of data to local users
Disaster Recovery as a Service Catalog
- Pluggable disaster recovery policies
  - Replication targets can specify different RPO/RTO levels, offered according to the capabilities of the supporting back-end
- Disaster recovery policies
  - Active - Cold standby
  - Active - Hot standby
  - Active - Active (requires application awareness and transaction integrity)
  - Backup to cloud / from the cloud
Extending Heat Orchestration for Disaster Recovery
- Heat can be used to automate
- Add support for Cinder replication
- Need to make consistency groups span OpenStack projects
  - Nova, Cinder, Trove, ...
- Stack snapshot backup / rollback
- Enable customization of workload components at the recovery site
  - Networks, VM configuration changes, guest agent, etc.
The Road Toward Application Consistency
First phase: File system consistency
- Integrate into OpenStack to allow consistent snapshots and backups
- Nova needs to request the QEMU Guest Agent to freeze the file systems (and applications, if fsfreeze-hook is installed) during the snapshot
- Patches have been proposed for Nova and Cinder, targeting the Kilo release
Source: Hitachi
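The freeze, snapshot, thaw sequence is easy to get wrong if the snapshot fails midway, so a sketch of the ordering may help. The agent class below is a stand-in that only records requests, and `consistent_snapshot` and the snapshot callable are hypothetical; the two command names are the QEMU guest agent's file system freeze and thaw commands. This is not the code from the proposed Nova or Cinder patches.

```python
import json

class FakeGuestAgent:
    """Stand-in for a guest agent channel; it only records the requests it gets."""

    def __init__(self):
        self.log = []

    def execute(self, command):
        self.log.append(json.dumps({"execute": command}))

def consistent_snapshot(agent, take_snapshot):
    """Freeze guest file systems, take the snapshot, and always thaw afterwards."""
    agent.execute("guest-fsfreeze-freeze")
    try:
        return take_snapshot()
    finally:
        # Thaw even if the snapshot raised, so the guest is never left frozen.
        agent.execute("guest-fsfreeze-thaw")

agent = FakeGuestAgent()
snapshot_id = consistent_snapshot(agent, lambda: "snap-0001")
print(snapshot_id)
print(agent.log)
```

Placing the thaw in a `finally` clause is the main design point: a guest whose file systems stay frozen after a failed snapshot is effectively down.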
The Road Toward Application Consistency
Next phase: Consistency at the application level
- Application-aware on Windows with VSS support in qemu-ga
  - Application notification via Microsoft Volume Shadow Copy Service (VSS)
- Application-aware on Linux using qemu-ga hooks
  - Application-consistent snapshots can be created with scripts interacting with the QEMU guest agent
  - The scripts can notify applications to flush their data
Disaster Recovery at Scale
- The holy grail of site evacuation is an automatic, planned migration of workloads and data from one cloud-scale datacenter to another
- New OpenStack HA approaches help recovery from infrastructure failures:
  - Leveraging Pacemaker to provide automated detection of a failed hypervisor and recovery of the VMs that were running there
  - Evacuating an instance to a scheduled host was added in Juno
  - A simple tagging API for instances in Nova was accepted for the Kilo release
    - Can support a new automatic-recovery tag
OpenStack Documentation needs to catch up…
- Join the OpenStack Disaster Recovery Guide
- We have a basic OpenStack High Availability Guide
  - http://docs.openstack.org/high-availability-guide/content/
- A very outdated "Recover cloud after disaster" section in the Admin guide
  - http://docs.openstack.org/admin-guide-cloud/content/section_nova-disaster-recovery-process.html
Q&A
Thank You
