New Features for Ceph with Cinder and Beyond



         Why Ceph?

     •   Low cost

     •   Flexible

     •   Scalable

     •   Open source
LIBRADOS (used by APP)
    A library allowing apps to directly access RADOS, with support for
    C, C++, Java, Python, Ruby, and PHP

RADOSGW (used by APP)
    A bucket-based REST gateway, compatible with S3 and Swift

RBD (used by HOST/VM)
    A reliable and fully-distributed block device, with a Linux kernel
    client and a QEMU/KVM driver

CEPH FS (used by CLIENT)
    A POSIX-compliant distributed file system, with a Linux kernel
    client and support for FUSE

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing,
intelligent storage nodes

[Diagram: five OSDs, each on top of a local file system (btrfs, xfs, or ext4) and a disk, plus three monitors (M)]

[Diagram: a HUMAN and the three monitors (M)]

Monitors:
    •  Maintain cluster map
    •  Provide consensus for distributed decision-making
    •  Run an odd number (consensus needs a majority of monitors)
    •  Do not serve stored objects to clients

OSDs:
    •  One per disk (recommended)
    •  At least three in a cluster
    •  Serve stored objects to clients
    •  Intelligently peer to perform replication tasks
    •  Support object classes

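The odd-number guidance for monitors follows from majority voting. A minimal sketch, assuming the monitors need a strict majority to agree; the function names are illustrative and not part of any Ceph API:

```python
def quorum_size(num_monitors):
    """Smallest strict majority of num_monitors."""
    return num_monitors // 2 + 1


def tolerated_failures(num_monitors):
    """How many monitors can fail while a strict majority remains."""
    return num_monitors - quorum_size(num_monitors)


# Compare cluster sizes: adding a fourth monitor to a three-monitor
# cluster raises the quorum size without tolerating any extra failure.
for n in range(1, 6):
    print(n, "monitors: quorum", quorum_size(n),
          "tolerates", tolerated_failures(n), "failure(s)")
```

Under this assumption an even-sized monitor set tolerates no more failures than the odd-sized set one smaller, which is why odd counts are the usual choice.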
[Diagram sequence: an APP facing a column of storage nodes, each labeled "C D"; in the final diagram the nodes are partitioned by name range: A-G, H-N, O-T, U-Z]

[Diagram: objects are grouped into placement groups (PGs), which are then mapped onto OSDs]

    hash(object name) % num pg

    CRUSH(pg, cluster state, rule set)

CRUSH
•  Pseudo-random placement algorithm
•  Ensures even distribution
•  Repeatable, deterministic
•  Rule-based configuration
    •  Replica count
    •  Infrastructure topology
    •  Weighting

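The two-step placement described above (object name to placement group, then placement group to OSDs) can be sketched as follows. This is not the CRUSH algorithm: step 2 uses rendezvous (highest-random-weight) hashing as a simple stand-in that shares the "pseudo-random, repeatable, deterministic" properties, and it ignores topology and weighting. SHA-256, the OSD names, and all function names are assumptions made for the example.

```python
import hashlib


def stable_hash(*parts):
    """Deterministic 64-bit hash of the given strings.

    Assumption: SHA-256 is used only because it is in the standard
    library; it is not the hash Ceph uses.
    """
    digest = hashlib.sha256("/".join(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(object_name, num_pg):
    """Step 1 from the slide: hash(object name) % num pg."""
    return stable_hash(object_name) % num_pg


def pg_to_osds(pg, osds, replicas):
    """Step 2, with a stand-in for CRUSH (rendezvous hashing).

    Every OSD gets a pseudo-random score for this PG; the highest
    scoring OSDs hold the replicas. 'osds' plays the role of the
    cluster state and 'replicas' the role of the rule set.
    """
    ranked = sorted(osds, key=lambda osd: stable_hash(str(pg), osd),
                    reverse=True)
    return ranked[:replicas]


def locate(object_name, num_pg, osds, replicas=3):
    """Compute where an object lives without asking a lookup service."""
    pg = object_to_pg(object_name, num_pg)
    return pg, pg_to_osds(pg, osds, replicas)


if __name__ == "__main__":
    cluster = ["osd.%d" % i for i in range(6)]
    print(locate("vm-disk-1", 128, cluster))
```

Because every client can run the same calculation from the same inputs, no central directory of object locations is needed, which is the point the CRUSH slide makes.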
[Diagram: an APP linked with LIBRADOS talks to the cluster (monitors shown as M) over the native protocol]

LIBRADOS
    •  Provides direct access to RADOS for applications
    •  C, C++, Python, PHP, Java
    •  No HTTP overhead
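A minimal sketch of direct access from Python, assuming the `rados` Python bindings that ship with Ceph are installed. The config path, the pool name `data`, and the object name are placeholders chosen for the example. `roundtrip` only needs an object with `write_full` and `read` methods, so it can be tried without a cluster; `demo_against_cluster` is defined but not called here.

```python
def roundtrip(ioctx, object_name, payload):
    """Write payload as a whole object, then read the same bytes back."""
    ioctx.write_full(object_name, payload)
    return ioctx.read(object_name, len(payload))


def demo_against_cluster(conffile="/etc/ceph/ceph.conf", pool="data"):
    """Needs a reachable cluster; conffile and pool are placeholders."""
    import rados  # Python bindings for librados

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            return roundtrip(ioctx, "greeting", b"hello rados")
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

The application talks to the cluster through the library calls directly, with no REST gateway in between.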
[Diagram: APPs speak REST to RADOSGW instances; each RADOSGW uses LIBRADOS to talk to the cluster (monitors shown as M) over the native protocol]

RADOS Gateway:
•  REST-based interface to RADOS
•  Supports buckets, accounting
•  Compatible with S3 and Swift applications

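[Diagram: a VM runs in a virtualization container that uses LIBRBD on top of LIBRADOS to reach the cluster (monitors shown as M)]

[Diagram: a VM between two containers, each with its own LIBRBD and LIBRADOS]

[Diagram: a HOST uses KRBD (kernel module) with LIBRADOS to reach the cluster]
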
RADOS Block Device:
• Storage of virtual disks in RADOS
• Allows decoupling of VMs and
  containers
• Live migration!
• Images are striped across the
  cluster
• Thin-provisioning
• Snapshots and cloning

How do you spin up thousands of VMs instantly and efficiently?

instant copy

    144 + 0 + 0 + 0 + 0 = 144

write

    [Diagram: CLIENT writes]
    144 + 4 = 148

read

    [Diagram: CLIENT reads]
    144 + 4 = 148

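The arithmetic on these three slides (144 objects after an instant copy, 148 after a few writes, still 148 after reads) can be reproduced with a small copy-on-write sketch. It is a simplified model, not how RBD is implemented: the `Image` class, the four clones, and the whole-object writes are assumptions made for illustration.

```python
class Image:
    """A block image stored as a sparse map of object index -> data."""

    def __init__(self, parent=None):
        self.parent = parent   # image this one was cloned from, if any
        self.objects = {}      # only the objects this image itself stores

    def clone(self):
        """Instant copy: the clone starts with no objects of its own."""
        return Image(parent=self)

    def write(self, index, data):
        """Writes land in this image only; the parent is never modified."""
        self.objects[index] = data

    def read(self, index):
        """Read from this image, falling back to its ancestors."""
        image = self
        while image is not None:
            if index in image.objects:
                return image.objects[index]
            image = image.parent
        return None  # never written anywhere in the chain


def stored_objects(images):
    """Total number of objects actually stored across the given images."""
    return sum(len(image.objects) for image in images)


template = Image()
for index in range(144):
    template.write(index, b"template data")

clones = [template.clone() for _ in range(4)]
after_copy = stored_objects([template] + clones)     # 144 + 0 + 0 + 0 + 0

for index in range(4):
    clones[0].write(index, b"changed by one VM")
after_writes = stored_objects([template] + clones)   # 144 + 4

clones[0].read(0)     # served from the clone's own object
clones[0].read(100)   # served from the template
after_reads = stored_objects([template] + clones)    # reads store nothing

print(after_copy, after_writes, after_reads)  # prints: 144 148 148
```

Storage grows only with the data each VM actually changes, which is what makes creating many volumes from one template cheap.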
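Old-style VM image creation

Participants: local disk (VM images), Nova compute, Glance (templates)

    •  Nova compute reads template X from Glance
    •  X is copied to local disk as X'

Local-disk VM images are:
    •  ephemeral
    •  expensive to create
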
Why use block storage?
• Persistent
  • More familiar to users
• Not tied to a single host
  • Decouples compute and storage
  • Enables live migration
• Extra capabilities of storage system
  • Efficient snapshots
  • Different types of storage available
  • Cloning for fast restore or scaling

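Cinder volume creation

Participants: Cinder API, Cinder volume, volume driver, Glance (templates)

    •  Cinder API: create image from X
    •  Cinder volume: locate X
    •  Glance: location of X
    •  Cinder volume: read X
    •  The volume driver receives X and writes it out as X'
    •  Reference to X' goes back to the Cinder API

Flexibility in where VM images are stored
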
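Efficient volume creation

Participants: Cinder API, Cinder volume, volume driver, Glance (templates)

    •  Cinder API: create image from X
    •  Cinder volume: locate X
    •  Glance: location of X
    •  Cinder volume: clone X to X'
    •  The volume driver makes X' as a fast CoW clone of X
    •  Volume driver: X' complete
    •  Reference to X' goes back to the Cinder API
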

         What's new in Bobtail:
         Improved OSD threading
     •   Filesystem and journal-related locks are now
         more fine-grained
     •   Boosted single-disk IOPS from 6k to 22k
     •   Restructured how map updates are handled,
         letting each placement group process them
         independently

         What's new in Bobtail:
         Recovery QoS

     •   Message priority system reworked to prevent
         starvation
     •   Recovery operations can be lower priority
         than client I/O without starving
     •   Requests to access an object can increase
         recovery priority for that object

         What's new in Bobtail:
         Block Device Cloning

     •   Instantly create new volumes based on
         templates (snapshots)
     •   Integrated with Cinder in Folsom
     •   Grizzly adds the ability to copy (not clone)
         non-raw images to RBD
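
The clone-versus-copy distinction above can be sketched as a small decision function. This is an illustration only, not Cinder's driver code: the function name, the `rbd://<fsid>/<pool>/<image>/<snapshot>` location layout, and the same-cluster check are assumptions made for the example. The one rule taken from the slide is that non-raw images are copied rather than cloned.

```python
def choose_creation_path(image_location, disk_format, cluster_fsid):
    """Return "clone" when a CoW clone looks possible, else "copy".

    Hypothetical helper. Assumes locations of the form
    rbd://<fsid>/<pool>/<image>/<snapshot>.
    """
    prefix = "rbd://"
    if not image_location.startswith(prefix):
        return "copy"   # template is not stored in RBD at all

    parts = image_location[len(prefix):].split("/")
    if len(parts) != 4 or not all(parts):
        return "copy"   # cannot identify a snapshot to clone from

    fsid, _pool, _image, _snapshot = parts
    if fsid != cluster_fsid:
        return "copy"   # assumption: clones only work within one cluster

    if disk_format != "raw":
        return "copy"   # from the slide: non-raw images are copied

    return "clone"


print(choose_creation_path("rbd://abc/images/tmpl/snap", "raw", "abc"))
print(choose_creation_path("rbd://abc/images/tmpl/snap", "qcow2", "abc"))
```

The first call takes the clone path and the second falls back to a copy, because a clone shares the template's bytes unchanged and so only works when the stored format is already raw.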

         What's new in Bobtail:
         Keystone Integration

     •   RADOS gateway can talk to Keystone to
         authenticate Swift API requests
     •   Let Keystone manage your users
     •   Supported by the Ceph Juju charm



         What's next: Cuttlefish

     •   Incremental backup for block devices
     •   On-disk encryption
     •   REST management API for RADOS gateway
     •   More performance improvements (especially
         for small I/O)
     •   More! (http://www.inktank.com/about-inktank/roadmap/)



         What's next: Dumpling

     •   Geo-replication for RADOS gateway
     •   REST management API for Ceph cluster
     •   ...

         (virtual) Ceph Developer Summit May 6
Questions?

Josh Durgin
josh.durgin@inktank.com
jdurgin on freenode

inktank.com | ceph.com
