Enabling ceph-mgr to control Ceph
services via Kubernetes
28 August 2018
Travis Nielsen, Rook (tnielsen@redhat.com)
John Spray, Ceph Mgr (jspray@redhat.com)
Ceph operations today
● RPM packages (all daemons on a server run the same version)
● Physical services configured by an external orchestrator:
  – Ansible, Salt, etc.
● Logical entities configured via Ceph itself (pools, filesystems, auth):
  – CLI, mgr module interface, restful module
  – A separate workflow from the physical deployment
● Plus some external monitoring to make sure your services stay up
Pain points
● All of these elements combine to create a large surface area between users and the software.
● Lots of human decision making, and plenty of opportunities for mistakes.
● In practice, deployments are often kept relatively static after the initial decisions are made.
Can new container environments enable something better?
The solution: container orchestration
● Kubernetes implements the basic operations that we need for managing cluster services:
  – Deploy builds (in container format)
  – Detect devices, start a container in a specific location (OSD)
  – Schedule/place groups of services (MDS, RGW)
● If we were writing a Ceph management server/agent today, it would look much like Kubernetes: so let’s just use Kubernetes
● Kubernetes gives us the primitives
● We still need the business logic and UI
Why Kubernetes?
● Widely adopted (Red Hat OpenShift, Google Kubernetes Engine, Amazon EKS, etc.)
● CLI/REST driven (extensible API)
● Lightweight design
Rook
Rook
● Simplified, container-native way of consuming Ceph
● Built for Kubernetes, extending the Kubernetes API
● CNCF sandbox project (proposed for incubation)
http://rook.io/
http://github.com/rook/rook
Rook components
● Docker image: Ceph and Rook binaries in one artifact
  – In Rook 0.9, these will be decoupled
● The Agent handles mounting volumes
  – Hides the complexity of client version and kernel version variations
● The Operator watches objects in Kubernetes and manipulates Ceph in response
  – Create a “Filesystem” object, and the Rook operator runs the corresponding “ceph fs new” (see the sketch below)
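The operator itself is written in Go; purely to illustrate the watch-and-react pattern described above, here is a minimal sketch using the official Kubernetes Python client against the v1beta1 Filesystem resource (resource names follow the example on the next slide; the reconciliation body is elided):

from kubernetes import client, config, watch

def watch_filesystems(namespace="rook-ceph"):
    # Load credentials the same way kubectl does (assumes a local kubeconfig).
    config.load_kube_config()
    api = client.CustomObjectsApi()
    for event in watch.Watch().stream(api.list_namespaced_custom_object,
                                      group="ceph.rook.io", version="v1beta1",
                                      namespace=namespace, plural="filesystems"):
        obj = event["object"]
        name = obj["metadata"]["name"]
        if event["type"] == "ADDED":
            # A real operator would now create the pools, run "ceph fs new",
            # and start MDS pods for this filesystem.
            print("would reconcile new filesystem %s" % name)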
Rook example

$ kubectl create -f operator.yaml
$ kubectl create -f cluster.yaml
$ kubectl -n rook-ceph get pod
NAME                                  READY  STATUS
rook-ceph-mgr-a-9c44495df-jpfvb       1/1    Running
rook-ceph-mon0-zz8l2                  1/1    Running
rook-ceph-mon1-rltcp                  1/1    Running
rook-ceph-mon2-lxl9x                  1/1    Running
rook-ceph-osd-id-0-76f5696669-d9gwj   1/1    Running
rook-ceph-osd-id-1-5d8477d8f4-kq7n5   1/1    Running
rook-ceph-osd-prepare-minikube-tj69w  0/1    Completed

cluster.yaml:
apiVersion: ceph.rook.io/v1beta1
kind: Cluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  mon:
    count: 3
  network:
    hostNetwork: false
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      storeType: bluestore
Rook user interface
● Rook objects are created via the extensible Kubernetes API service (Custom Resource Definitions)
● kubectl + yaml files
  – This style is consistent with the Kubernetes ecosystem
● Point and click is desirable for many users (and vendors)
  – Deleting a pool should require a confirmation button!
Combining Rook with ceph-mgr
“Just give me the storage”
● Rook’s simplified model suits people who do not want to pay any attention to how Ceph is configured: they just want to see a volume attached to their container.
● People buying hardware (or paying for cloud) often care a lot about how the storage cluster is configured.
● Lifecycle: over time, users care more and more about optimizing resource usage.
What is ceph-mgr?
● A component of RADOS: a sibling of the mon and OSD daemons. C++ code using the same auth/networking stack.
● A mandatory component that includes key functionality.
● Host to Python modules that do monitoring and management (a skeleton module is sketched below).
● Relatively simple in itself: the fun parts are the Python modules.
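For orientation, a minimal module might look roughly like the following sketch; the serve()/shutdown() hooks follow the mgr plugin docs, while the use of self.get("health") is just one example of the cluster state ceph-mgr exposes (treat the details as assumptions that vary by release):

import threading
from mgr_module import MgrModule

class Module(MgrModule):
    """Minimal module: a background loop that can see cluster state."""

    def __init__(self, *args, **kwargs):
        super(Module, self).__init__(*args, **kwargs)
        self._stop = threading.Event()

    def serve(self):
        # ceph-mgr calls this in its own thread once the module is enabled.
        while not self._stop.wait(timeout=60):
            health = self.get("health")   # cluster state exposed by ceph-mgr
            self.log.info("health: %s", health["json"])

    def shutdown(self):
        self._stop.set()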
Dashboard module
● The Mimic (13.2.x) release includes an extended management web UI based on openATTIC
● We would like Kubernetes integration, so that we can create containers from the dashboard too:
  – The “Create Filesystem” button starts an MDS cluster
  – A “Create OSD” button starts OSDs
→ Call out to Rook from ceph-mgr (and to other orchestrators too)
Three ways to consume containerized Ceph
(diagram) Three paths into the same Ceph cluster running on K8s:
● Rook user: kubectl + yaml files → Rook operator
● Point and click: ceph-mgr dashboard
● Ceph CLI: Rook toolbox (all Ceph command line tools)
The Rook agent handles client-side volume mounting.
Demo
Rook + Mimic Dashboard
Rook automation vs ceph-mgr
Both Rook and ceph-mgr manage the state of the cluster:
● ceph-mgr creates the pools and the Filesystem object
● Rook creates the MDS container
  – Pools and the filesystem are skipped by Rook
● Rook settings can change modes as needed:
  – Full management: pure Rook
  – Partial management: shared management with the dashboard
Why not build Rook-like functionality into mgr?
1. Upgrades!
  – An external component needs to orchestrate the Ceph upgrade, while other Ceph services may be offline (aka “who manages the manager?”)
2. Commonality between simplified pure-Rook systems and fully-featured containerized Ceph clusters.
3. Rook’s client mounting/volume management.
What Kubernetes doesn’t do for us
● Install itself
● Configure the underlying network
● Bootstrap Rook
→ External setup tools will continue to have a role in the non-Ceph-specific tasks
Ceph orchestrator modules
Orchestrator modules
● We would like to drive tasks like creating OSDs from the dashboard
● Ceph users use various different orchestrators:
  – Ansible (ceph-ansible)
  – SaltStack (DeepSea)
  – Kubernetes (Rook)
● Abstraction layer: the orchestrator interface
Orchestrator module interface
A subclass of MgrModule: specialized ceph-mgr modules that implement a set of service orchestration primitives (sketched below):
● Get device inventory
● Create OSDs
● Start/stop/scale stateless services (MDS, RGW, etc.)
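A hedged sketch of what such a module could look like; the method names and signatures here are illustrative, not the final interface:

from mgr_module import MgrModule

class MyOrchestrator(MgrModule):
    """Sketch of an orchestrator module: each backend (Ansible, DeepSea,
    Rook, ...) implements the same primitives behind one interface."""

    def get_inventory(self):
        # Report the hosts and devices the backend knows about.
        raise NotImplementedError()

    def create_osds(self, host, devices):
        # Ask the backend to turn the given devices into OSDs.
        raise NotImplementedError()

    def update_stateless_service(self, service_type, spec):
        # Start/stop/scale MDS, RGW, etc. to match the requested spec.
        raise NotImplementedError()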
rook module
● Implements the orchestrator interface using the Kubernetes API client, mapping operations onto Rook’s structures (MDS case sketched below):
  – Device inventory → read ConfigMaps populated by Rook
  – OSD creation → add entries to cluster→nodes→devices
  – MDS creation → create FilesystemSpec entities
  – RGW creation → create ObjectStoreSpec entities
● Some extra code to implement clean completions/progress events, e.g. not reporting OSD creation as complete until the OSD is actually up in the OSDMap.
New in Nautilus
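As a rough illustration of how thin the mapping can be, the MDS case might look like this sketch using the Kubernetes Python client; the FilesystemSpec fields shown are simplified assumptions based on the v1beta1 example earlier, and the real module also handles completions and in-cluster configuration:

from kubernetes import client, config

def create_mds(fs_name, namespace="rook-ceph", active_count=1):
    """Ask Rook for an MDS cluster by creating a Filesystem custom object."""
    config.load_kube_config()   # inside the mgr container this would be in-cluster config
    api = client.CustomObjectsApi()
    body = {
        "apiVersion": "ceph.rook.io/v1beta1",
        "kind": "Filesystem",
        "metadata": {"name": fs_name, "namespace": namespace},
        # Illustrative spec: the real FilesystemSpec also describes pools,
        # which this path leaves to ceph-mgr rather than to Rook.
        "spec": {"metadataServer": {"activeCount": active_count}},
    }
    api.create_namespaced_custom_object(
        group="ceph.rook.io", version="v1beta1",
        namespace=namespace, plural="filesystems", body=body)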
Orchestration vs Ceph management
● External orchestrators handle the physical deployment of services, but most logical management still goes directly to Ceph
● We must continue to make managing Ceph easier and, where possible, remove the need for intervention
● ceph-mgr modules fill this role
● Orchestrators should orchestrate; Ceph modules should manage
Orchestrator simplification
Orchestrators mix physically deploying Ceph services with logical configuration:
● Rook creates volumes as CephFS filesystems, but this means creating the underlying pools. How does it know how to configure them?
● The same applies to anything deploying RGW
● Rook also exposes some health/monitoring of the Ceph cluster, but is this in terms a non-Ceph-expert can understand?
Ceph management modules
Placement group merging
● Historically, pg_num could be increased but not decreased
● Sometimes problematic, such as when physically shrinking a cluster, or if bad pg_nums were chosen
● The bigger problem: it prevented automatic pg_num selection, because mistakes could not be reversed
● The implementation is not simple, and merging still has an IO cost, but the option will be there → now we can autoselect pg_num!
Targeted for Nautilus
poolsets module
● Pick pg_num so the user doesn’t have to!
● Hard (impossible?) to do perfectly, but...
● Pretty easy to do the useful common cases (heuristics sketched below):
  – Select initial pg_nums according to expected space use
  – Increase pg_nums if actual space use has gone ~2x over the ideal PG capacity
  – Decrease pg_num for underused pools if another pool needs to increase its own
● Not an optimizer! But it does the job about as well as most human beings do it today.
Targeted for Nautilus
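A toy sketch of the heuristics above; the 50 GiB target per PG and the 2x threshold are assumptions for illustration, not the module's actual tunables:

def _round_up_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p

def initial_pg_num(expected_bytes, target_bytes_per_pg=50 * 2**30):
    # Choose a starting pg_num from the user's expected space use.
    return max(32, _round_up_pow2(expected_bytes // target_bytes_per_pg))

def wants_more_pgs(actual_bytes, pg_num, target_bytes_per_pg=50 * 2**30):
    # Grow pg_num once real usage is ~2x over the ideal per-PG capacity.
    return actual_bytes / float(pg_num) > 2 * target_bytes_per_pg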
poolsets module
Prompting users for expected capacity makes sense for data pools, but not for metadata pools:
● Combine data and metadata pool creation into one command
● Wrap pools into a new “poolset” structure describing policy
● Auto-construct poolsets for existing deployments, but don’t auto-adjust unless explicitly enabled

ceph poolset create cephfs my_filesystem 100GB

New in Nautilus
progress module
● Health reporting was improved in Luminous, but in many cases it is still too low level.
● Placement groups:
  – Hard to distinguish between real problems and normal rebalancing
  – Once we start auto-picking pg_num, users won’t know what a PG is until they see one in the health status
● Introduce a `progress` module to synthesize a high-level view from PG state: “56% recovered from failure of OSD 123” (a conceptual sketch follows)
● Also enable other modules to describe their long-running operations via this module (creating an OSD, etc.)
New in Nautilus
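Conceptually the calculation is simple in the common case; a sketch under the assumption that we track which PGs were affected by a given OSD failure (not the module's real data structures):

def recovery_progress(affected_pgs, pg_state):
    """affected_pgs: PG ids that were degraded when the OSD went down.
    pg_state: mapping of PG id -> current state string from the PG map."""
    if not affected_pgs:
        return 1.0
    recovered = [pg for pg in affected_pgs if pg_state.get(pg) == "active+clean"]
    return float(len(recovered)) / len(affected_pgs)

# A module could then render e.g.:
# "56% recovered from failure of OSD 123"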
volumes module
● Currently, ceph_volume_client.py (used by Manila, etc.) creates “volumes” within CephFS “filesystems”, which require RADOS “pools” and provisioning of MDS daemons.
● Simplify this:
  – Two concepts: volumes (aka filesystems) and subvolumes
  – Automatically provision MDS daemons on demand using Rook
  – Automatically create pools on demand using the `poolsets` module
  – Expose the functionality as commands (consumable via librados) instead of a library (see the command sketch below)
  – Run background tasks from ceph-mgr (e.g. subvolume purge)
Targeted for Nautilus
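Exposing this as commands means declaring them in the module's COMMANDS table; a hedged sketch, where the command strings mirror the next slide but the handler body and the exact handle_command signature are illustrative (older releases omit the inbuf argument):

import errno
from mgr_module import MgrModule

class Module(MgrModule):
    COMMANDS = [
        {"cmd": "fs volume create name=vol_name,type=CephString",
         "desc": "Create a CephFS volume (pools + MDS on demand)",
         "perm": "rw"},
        {"cmd": "fs subvolume create "
                "name=vol_name,type=CephString "
                "name=sub_name,type=CephString",
         "desc": "Create a subvolume inside an existing volume",
         "perm": "rw"},
    ]

    def handle_command(self, inbuf, cmd):
        # Returns (retcode, output buffer, status string).
        if cmd["prefix"] == "fs volume create":
            # Would create pools via poolsets and MDS daemons via the orchestrator.
            return 0, "", "volume {0} created".format(cmd["vol_name"])
        if cmd["prefix"] == "fs subvolume create":
            return 0, "", "subvolume {0} created".format(cmd["sub_name"])
        return -errno.EINVAL, "", "unknown command"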
From zero to working subvolume

Before:
ceph osd pool create metadata 128
ceph osd pool create data 2048
ceph fs new myfs metadata data
# Create an MDS somehow…
# Call into ceph_volume_client.py
# volume_client.create_volume(…

After:
ceph fs volume create myvol
ceph fs subvolume create myvol subv

Done!
Wrap up
● All of these improvements reduce the cognitive load on ordinary users:
  – No need to know what an MDS is: ask Rook for a filesystem and get one
  – No need to know what a placement group is
  – No need to know magic commands: look at the dashboard
● Actions that no longer require human thought can now be tied into automated workflows: fulfilling the promise of software-defined storage.
● A smart container orchestrator is an essential part of this vision: on-demand Ceph requires on-demand service orchestration.
Resources
● Rook
  – https://rook.io
  – https://github.com/rook/rook
● Ceph Mgr
  – http://docs.ceph.com/docs/master/mgr/plugins/
Contributions welcome!
Q&A

Editor's Notes

  • #31 (progress module): FAQ: this seems like it would struggle with corner cases, like a PG failing on one OSD and then the new OSD failing too; do you end up with multiple progress bars, or what? A: The code handles the simple cases well, and when things get complicated we just do the simplest thing we can. The fallback general case is to look at the overall recovery progress of the cluster if it doesn’t break down neatly into individual progress events.
  • #32 (volumes module): FAQ: why was ceph_volume_client.py implemented externally to begin with? A: ceph_volume_client.py predates ceph-mgr; we’ve been wanting to integrate it for a while. FAQ: why rename the entities? A: We needed to distinguish between a “lightweight volume” (subvolume) and a “heavyweight volume” (volume); the term “filesystem” was prone to confusion, because at the point you mount a subvolume, that is a filesystem from the point of view of the client node. FAQ: a separate MDS for each volume seems so resource intensive! A: That’s the point of subvolumes: you can choose when to deploy a full-blown filesystem and when to just carve out a logical partition in an existing one. By the way, MDS daemons are increasingly smart about how they manage memory, enabling you to run more daemons with less memory each if your workload doesn’t require a single high-memory MDS with a big cache.
  • #33 (from zero to working subvolume): Key idea: this isn’t just a convenience to hide a few commands; we’re protecting the user from making decisions like PG counts (delegate the decision to poolsets) and where to run an MDS daemon (delegate scheduling to k8s). Key idea: by implementing this functionality inside Ceph, we enable seamless integration with the dashboard, etc. Nothing extra to install, nothing extra to configure.