Storing VMs with Cinder and Ceph RBD

Growing With Hardware Appliances

First PB
•  Proprietary storage hardware
•  Well-known storage vendor
•  $14 b’zillion

Second PB
•  Proprietary storage hardware
•  Same storage vendor
•  Another $14 b’zillion

Hard Drives Are Tiny Record Players and They Fail Often
jon_a_ross, Flickr / CC BY 2.0

[Figure: disks (D) x 1 MILLION = 55 times / day]

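The figure can be turned into a rough failure rate. This is a minimal sketch that reads the slide as "a fleet of 1 million drives sees about 55 failures per day"; that reading is an interpretation, and the function name is made up for illustration.

```python
def implied_annual_failure_rate(drives: int, failures_per_day: float) -> float:
    """Fraction of the fleet expected to fail within one year."""
    return failures_per_day * 365 / drives


# Numbers taken from the slide: 1 million drives, 55 failures per day.
rate = implied_annual_failure_rate(1_000_000, 55)
print(f"{rate:.1%}")  # prints "2.0%"
```

Under that reading, roughly one drive in fifty fails each year, which at this scale means failure is a routine daily event rather than an exception.
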
Philosophy
•  OPEN SOURCE
•  COMMUNITY-FOCUSED

Design
•  SCALABLE
•  NO SINGLE POINT OF FAILURE
•  SOFTWARE BASED
•  SELF-MANAGING

LIBRADOS (used by: APP)
   A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RADOSGW (used by: APP)
   A bucket-based REST gateway, compatible with S3 and Swift

RBD (used by: HOST/VM)
   A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS (used by: CLIENT)
   A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOS (underneath all of the above)
   A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

OSD     OSD     OSD     OSD     OSD
FS      FS      FS      FS      FS       (btrfs, xfs, or ext4)
DISK    DISK    DISK    DISK    DISK

M       M       M

[Figure: a HUMAN and the three monitors (M)]

Monitors (M):
•  Maintain cluster map
•  Provide consensus for distributed decision-making
•  Deploy an odd number (a majority must agree to form a quorum)
•  Do not serve stored objects to clients

OSDs:
•  One per disk (recommended)
•  At least three in a cluster
•  Serve stored objects to clients
•  Intelligently peer to perform replication tasks
•  Support object classes

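The odd-number advice for monitors follows from majority voting. The sketch below is plain arithmetic, not Ceph code; the function names are hypothetical.

```python
def quorum_size(monitors: int) -> int:
    """Smallest group of monitors that forms a strict majority."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be down while a majority is still reachable."""
    return monitors - quorum_size(monitors)


for n in range(1, 6):
    print(f"{n} monitors: quorum {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

Going from three monitors to four raises the quorum from two to three without letting the cluster survive any additional failure, which is why an even count buys nothing.
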
[Figure: an APP facing many C/D nodes and asking where an object lives (??). One answer is a function F that assigns objects to nodes by name range: A-G, H-N, O-T, U-Z.]

Placement is computed in two steps:
1.  hash(object name) % num pg            (object to placement group)
2.  CRUSH(pg, cluster state, rule set)    (placement group to OSDs)

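The first step, hash(object name) % num pg, can be sketched in a few lines. This is an illustration only: CRC32 from the standard library stands in for whatever hash Ceph actually uses, and `placement_group` is a made-up name.

```python
import zlib


def placement_group(object_name: str, num_pg: int) -> int:
    """Map an object name to a placement group id in [0, num_pg)."""
    # CRC32 is an assumed stand-in hash, chosen because it is repeatable.
    return zlib.crc32(object_name.encode("utf-8")) % num_pg


# Any client computes the same answer without asking a central lookup service.
for name in ("vm-disk-0001", "vm-disk-0002", "vm-disk-0003"):
    print(name, "-> pg", placement_group(name, num_pg=64))
```

Because the mapping is a pure function of the name and the PG count, every client arrives at the same placement group independently.
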
CRUSH
•  Pseudo-random placement algorithm
•  Ensures even distribution
•  Repeatable, deterministic
•  Rule-based configuration
   •  Replica count
   •  Infrastructure topology
   •  Weighting

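The properties listed above (pseudo-random, repeatable, replica count, weighting) can be illustrated with weighted rendezvous hashing. This is a stand-in technique, not the actual CRUSH algorithm, and it ignores infrastructure topology; `place` and the OSD names are hypothetical.

```python
import hashlib
import math


def _unit_hash(pg: int, osd: str) -> float:
    """Repeatable pseudo-random number in (0, 1) for a (pg, osd) pair."""
    digest = hashlib.sha256(f"{pg}:{osd}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (value + 0.5) / 2**52


def place(pg: int, osd_weights: dict[str, float], replicas: int) -> list[str]:
    """Pick `replicas` distinct OSDs for a placement group.

    Each OSD gets the score -weight / ln(h); the highest scores win, so a
    heavier OSD is chosen proportionally more often across many PGs.
    """
    ranked = sorted(
        osd_weights,
        key=lambda osd: -osd_weights[osd] / math.log(_unit_hash(pg, osd)),
        reverse=True,
    )
    return ranked[:replicas]


osds = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
print(place(7, osds, replicas=3))
```

Each score depends only on the PG and one OSD, so removing an OSD that was not chosen leaves the placement untouched. That is the kind of limited data movement a placement algorithm like CRUSH aims for.
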
[Diagram: an APP linked with LIBRADOS speaks the native protocol directly to the cluster (M M M).]

LIBRADOS (L)
•  Provides direct access to RADOS for applications
•  C, C++, Java, Python, Ruby, and PHP
•  No HTTP overhead

[Diagram: APPs speak REST to RADOSGW instances; each RADOSGW uses LIBRADOS to speak the native protocol to the cluster (M M M).]

RADOS Gateway:
•  REST-based interface to RADOS
•  Supports buckets, accounting
•  Compatible with S3 and Swift applications

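[Diagram: a VM runs inside a VIRTUALIZATION CONTAINER, which uses LIBRBD on top of LIBRADOS to reach the cluster (M M M).]

[Diagram: two CONTAINERs, each with LIBRBD and LIBRADOS, with a VM between them; both reach the same cluster (M M M).]

[Diagram: a HOST uses KRBD (KERNEL MODULE), drawn on top of LIBRADOS, to reach the cluster (M M M).]
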
RADOS Block Device:
•  Storage of virtual disks in RADOS
•  Allows decoupling of VMs and containers
•  Live migration!
•  Images are striped across the cluster
•  Thin-provisioning
•  Snapshots and cloning

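Striping an image across the cluster means a byte range on the virtual disk becomes a set of separate objects. The sketch below assumes 4 MiB objects laid out one after another; the slide gives neither the object size nor the layout, and `objects_for_range` is a hypothetical helper.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size (4 MiB)


def objects_for_range(offset: int, length: int,
                      object_size: int = OBJECT_SIZE) -> list[tuple[int, int, int]]:
    """Split an image byte range into (object index, offset in object, length)."""
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // object_size
        within = offset % object_size
        chunk = min(object_size - within, end - offset)
        pieces.append((index, within, chunk))
        offset += chunk
    return pieces


# A 6 MiB write starting 3 MiB into the image touches objects 0, 1 and 2.
print(objects_for_range(3 * 1024 * 1024, 6 * 1024 * 1024))
```

Each object index can then be placed independently, which spreads one image over many disks. An object that has never been written need not exist at all, which is the idea behind thin provisioning.
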
HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?

instant copy
   144 + 0 + 0 + 0 + 0 = 144

write
   CLIENT writes go to the copy, and only the written data is new:  144 + 4 = 148

read
   CLIENT reads are served from the copy and the original:  144 + 4 = 148

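The arithmetic on these three slides (144, then 148) can be reproduced with a toy copy-on-write model. This is an in-memory illustration, not RBD code: the `Image` class is hypothetical, and each write replaces one whole object.

```python
class Image:
    """A toy block image: a sparse map of object index -> bytes."""

    def __init__(self, parent=None):
        self.objects = {}      # only objects written to THIS image
        self.parent = parent   # image this one was cloned from, if any

    def write(self, index, data):
        self.objects[index] = data

    def read(self, index):
        if index in self.objects:
            return self.objects[index]      # served from the copy
        if self.parent is not None:
            return self.parent.read(index)  # fall through to the original
        return b""                          # never written anywhere

    def clone(self):
        return Image(parent=self)           # instant: no data is copied


def stored_objects(images):
    return sum(len(image.objects) for image in images)


golden = Image()
for i in range(144):
    golden.write(i, b"base-%d" % i)

clones = [golden.clone() for _ in range(4)]
before = stored_objects([golden] + clones)  # 144 + 0 + 0 + 0 + 0 = 144

for i in range(4):
    clones[0].write(i, b"changed")
after = stored_objects([golden] + clones)   # 144 + 4 = 148
```

Four clones cost nothing until a client writes, and reads of untouched data keep coming from the original image.
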
old-style VM image creation

Participants: local disk (VM images), Nova compute, Glance (templates)

1.  Nova compute reads template X from Glance
2.  The copy X' is stored on the local disk

●   ephemeral
●   expensive to create

Why use block storage?
•  Persistent
   •  More familiar to users
•  Not tied to a single host
   •  Decouples compute and storage
   •  Enables live migration
•  Extra capabilities of storage system
   •  Efficient snapshots
   •  Different types of storage available
   •  Cloning for fast restore or scaling

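Cinder volume creation

Participants: Cinder API, Cinder volume, volume driver, Glance (templates)

1.  Cinder API to Cinder volume: create image from X
2.  Cinder volume to Glance: locate X
3.  Glance to Cinder volume: location of X
4.  Cinder volume: read X; the volume driver holds X and produces X'
5.  Cinder volume to Cinder API: reference to X'

Note: flexibility in where VM images are stored
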
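Efficient volume creation

Participants: Cinder API, Cinder volume, volume driver, Glance (templates)

1.  Cinder API to Cinder volume: create image from X
2.  Cinder volume to Glance: locate X
3.  Glance to Cinder volume: location of X
4.  Cinder volume to volume driver: clone X to X' (fast CoW clone)
5.  volume driver to Cinder volume: X' complete
6.  Cinder volume to Cinder API: reference to X'
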
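The difference between the two volume-creation flows comes down to one decision: can the backend clone X in place, or must the data be read and copied? The sketch below is a simplified illustration, not the Cinder driver. It assumes the image location has the form rbd://<cluster id>/<pool>/<image>/<snapshot>, and every function name is hypothetical.

```python
def parse_rbd_location(url):
    """Return (cluster_id, pool, image, snapshot), or None if not an rbd:// URL."""
    prefix = "rbd://"
    if not url.startswith(prefix):
        return None
    parts = url[len(prefix):].split("/")
    if len(parts) != 4 or not all(parts):
        return None
    return tuple(parts)


def plan_volume_creation(image_location, local_cluster_id):
    """Decide between a fast CoW clone and a full copy of the image."""
    parsed = parse_rbd_location(image_location)
    if parsed is not None and parsed[0] == local_cluster_id:
        _, pool, image, snapshot = parsed
        # Same cluster: the new volume can reference X instead of copying it.
        return ("clone", f"{pool}/{image}@{snapshot}")
    # Anything else: fall back to reading X and writing a full copy.
    return ("copy", image_location)


print(plan_volume_creation("rbd://abc123/images/X/snap", "abc123"))
print(plan_volume_creation("http://glance.example/images/X", "abc123"))
```

The clone path only applies when the template and the new volume live in the same cluster; in every other case the older read-and-copy flow still works.
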
Questions?

Josh Durgin
josh.durgin@inktank.com
jdurgin on freenode

inktank.com | ceph.com
