Enabling ceph-mgr to control Ceph
services via Kubernetes
28 August 2018
Travis Nielsen, Rook (tnielsen@redhat.com)
John Spray, Ceph Mgr (jspray@redhat.com)
Ceph operations today
● RPM packages (all daemons on a server run the same version)
● Physical services configured by an external orchestrator:
  – Ansible, Salt, etc.
● Logical entities configured via Ceph itself (pools, filesystems, auth):
  – CLI, mgr module interface, restful module
  – A separate workflow from the physical deployment
● Plus some external monitoring to make sure your services stay up
Pain points
● All of these elements combine to create a large surface area between users and the software.
● Lots of human decision making, and plenty of opportunities for mistakes.
● In practice, deployments are often kept relatively static after the initial decisions are made.
Can new container environments enable something better?
The solution: container orchestration
● Kubernetes implements the basic operations that we need for managing cluster services:
  – Deploy builds (in container format)
  – Detect devices, start a container in a specific location (OSD)
  – Schedule/place groups of services (MDS, RGW)
● If we were writing a Ceph management server/agent today, it would look much like Kubernetes: so let’s just use Kubernetes
● Kubernetes gives us the primitives
● We still need the business logic and UI
Why Kubernetes?
● Widely adopted (Red Hat OpenShift, Google Kubernetes Engine, Amazon EKS, etc.)
● CLI/REST driven (extensible API)
● Lightweight design
Rook
Rook
● Simplified, container-native way of consuming Ceph
● Built for Kubernetes, extending the Kubernetes API
● CNCF sandbox project (proposed for incubation)
http://rook.io/
http://github.com/rook/rook
Rook components
● Docker image: Ceph and Rook binaries in one artifact
  – In Rook 0.9, these will be decoupled
● The Agent handles mounting volumes
  – Hides the complexity of client version and kernel version variations
● The Operator watches objects in Kubernetes and manipulates Ceph in response
  – Create a “Filesystem” object, and the Rook operator runs the corresponding “ceph fs new” (see the sketch below)
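The operator itself is written in Go; purely to illustrate the watch-and-react pattern described above, here is a minimal sketch using the official Kubernetes Python client against the v1beta1 Filesystem resource (resource names follow the example on the next slide; the reconciliation body is elided):

from kubernetes import client, config, watch

def watch_filesystems(namespace="rook-ceph"):
    # Load credentials the same way kubectl does (assumes a local kubeconfig).
    config.load_kube_config()
    api = client.CustomObjectsApi()
    for event in watch.Watch().stream(api.list_namespaced_custom_object,
                                      group="ceph.rook.io", version="v1beta1",
                                      namespace=namespace, plural="filesystems"):
        obj = event["object"]
        name = obj["metadata"]["name"]
        if event["type"] == "ADDED":
            # A real operator would now create the pools, run "ceph fs new",
            # and start MDS pods for this filesystem.
            print("would reconcile new filesystem %s" % name)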
Rook example

$ kubectl create -f operator.yaml
$ kubectl create -f cluster.yaml
$ kubectl -n rook-ceph get pod
NAME                                  READY  STATUS
rook-ceph-mgr-a-9c44495df-jpfvb       1/1    Running
rook-ceph-mon0-zz8l2                  1/1    Running
rook-ceph-mon1-rltcp                  1/1    Running
rook-ceph-mon2-lxl9x                  1/1    Running
rook-ceph-osd-id-0-76f5696669-d9gwj   1/1    Running
rook-ceph-osd-id-1-5d8477d8f4-kq7n5   1/1    Running
rook-ceph-osd-prepare-minikube-tj69w  0/1    Completed

cluster.yaml:
apiVersion: ceph.rook.io/v1beta1
kind: Cluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  mon:
    count: 3
  network:
    hostNetwork: false
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      storeType: bluestore
Rook user interface
● Rook objects are created via the extensible Kubernetes API service (Custom Resource Definitions)
● kubectl + yaml files
  – This style is consistent with the Kubernetes ecosystem
● Point and click is desirable for many users (and vendors)
  – Deleting a pool should require a confirmation button!
Combining Rook with ceph-mgr
“Just give me the storage”
● Rook’s simplified model suits people who do not want to pay any attention to how Ceph is configured: they just want to see a volume attached to their container.
● People buying hardware (or paying for cloud) often care a lot about how the storage cluster is configured.
● Lifecycle: over time, users care more and more about optimizing resource usage.
What is ceph-mgr?
● A component of RADOS: a sibling of the mon and OSD daemons. C++ code using the same auth/networking stack.
● A mandatory component that includes key functionality.
● Host to Python modules that do monitoring and management (a skeleton module is sketched below).
● Relatively simple in itself: the fun parts are the Python modules.
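For orientation, a minimal module might look roughly like the following sketch; the serve()/shutdown() hooks follow the mgr plugin docs, while the use of self.get("health") is just one example of the cluster state ceph-mgr exposes (treat the details as assumptions that vary by release):

import threading
from mgr_module import MgrModule

class Module(MgrModule):
    """Minimal module: a background loop that can see cluster state."""

    def __init__(self, *args, **kwargs):
        super(Module, self).__init__(*args, **kwargs)
        self._stop = threading.Event()

    def serve(self):
        # ceph-mgr calls this in its own thread once the module is enabled.
        while not self._stop.wait(timeout=60):
            health = self.get("health")   # cluster state exposed by ceph-mgr
            self.log.info("health: %s", health["json"])

    def shutdown(self):
        self._stop.set()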
Dashboard module
● The Mimic (13.2.x) release includes an extended management web UI based on openATTIC
● We would like Kubernetes integration, so that we can create containers from the dashboard too:
  – The “Create Filesystem” button starts an MDS cluster
  – A “Create OSD” button starts OSDs
→ Call out to Rook from ceph-mgr (and to other orchestrators too)
Three ways to consume containerized Ceph
(diagram) Three paths into the same Ceph cluster running on K8s:
● Rook user: kubectl + yaml files → Rook operator
● Point and click: ceph-mgr dashboard
● Ceph CLI: Rook toolbox (all Ceph command line tools)
The Rook agent handles client-side volume mounting.
Demo
Rook + Mimic Dashboard
Rook automation vs ceph-mgr
Both Rook and ceph-mgr manage the state of the cluster:
● ceph-mgr creates the pools and the Filesystem object
● Rook creates the MDS container
  – Pools and the filesystem are skipped by Rook
● Rook settings can change modes as needed:
  – Full management: pure Rook
  – Partial management: shared management with the dashboard
Why not build Rook-like functionality into mgr?
1. Upgrades!
  – An external component needs to orchestrate the Ceph upgrade, while other Ceph services may be offline (aka “who manages the manager?”)
2. Commonality between simplified pure-Rook systems and fully-featured containerized Ceph clusters.
3. Rook’s client mounting/volume management.
What Kubernetes doesn’t do for us
● Install itself
● Configure the underlying network
● Bootstrap Rook
→ External setup tools will continue to have a role in the non-Ceph-specific tasks
Ceph orchestrator modules
Orchestrator modules
● We would like to drive tasks like creating OSDs from the dashboard
● Ceph users use various different orchestrators:
  – Ansible (ceph-ansible)
  – SaltStack (DeepSea)
  – Kubernetes (Rook)
● Abstraction layer: the orchestrator interface
Orchestrator module interface
A subclass of MgrModule: specialized ceph-mgr modules that implement a set of service orchestration primitives (sketched below):
● Get device inventory
● Create OSDs
● Start/stop/scale stateless services (MDS, RGW, etc.)
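A hedged sketch of what such a module could look like; the method names and signatures here are illustrative, not the final interface:

from mgr_module import MgrModule

class MyOrchestrator(MgrModule):
    """Sketch of an orchestrator module: each backend (Ansible, DeepSea,
    Rook, ...) implements the same primitives behind one interface."""

    def get_inventory(self):
        # Report the hosts and devices the backend knows about.
        raise NotImplementedError()

    def create_osds(self, host, devices):
        # Ask the backend to turn the given devices into OSDs.
        raise NotImplementedError()

    def update_stateless_service(self, service_type, spec):
        # Start/stop/scale MDS, RGW, etc. to match the requested spec.
        raise NotImplementedError()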
rook module
● Implements the orchestrator interface using the Kubernetes API client, mapping operations onto Rook’s structures (MDS case sketched below):
  – Device inventory → read ConfigMaps populated by Rook
  – OSD creation → add entries to cluster→nodes→devices
  – MDS creation → create FilesystemSpec entities
  – RGW creation → create ObjectStoreSpec entities
● Some extra code to implement clean completions/progress events, e.g. not reporting OSD creation as complete until the OSD is actually up in the OSDMap.
New in Nautilus
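As a rough illustration of how thin the mapping can be, the MDS case might look like this sketch using the Kubernetes Python client; the FilesystemSpec fields shown are simplified assumptions based on the v1beta1 example earlier, and the real module also handles completions and in-cluster configuration:

from kubernetes import client, config

def create_mds(fs_name, namespace="rook-ceph", active_count=1):
    """Ask Rook for an MDS cluster by creating a Filesystem custom object."""
    config.load_kube_config()   # inside the mgr container this would be in-cluster config
    api = client.CustomObjectsApi()
    body = {
        "apiVersion": "ceph.rook.io/v1beta1",
        "kind": "Filesystem",
        "metadata": {"name": fs_name, "namespace": namespace},
        # Illustrative spec: the real FilesystemSpec also describes pools,
        # which this path leaves to ceph-mgr rather than to Rook.
        "spec": {"metadataServer": {"activeCount": active_count}},
    }
    api.create_namespaced_custom_object(
        group="ceph.rook.io", version="v1beta1",
        namespace=namespace, plural="filesystems", body=body)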
Orchestration vs Ceph management
● External orchestrators handle the physical deployment of services, but most logical management still goes directly to Ceph
● We must continue to make managing Ceph easier and, where possible, remove the need for intervention
● ceph-mgr modules fill this role
● Orchestrators should orchestrate; Ceph modules should manage
Orchestrator simplification
Orchestrators mix physically deploying Ceph services with logical configuration:
● Rook creates volumes as CephFS filesystems, but this means creating the underlying pools. How does it know how to configure them?
● The same applies to anything deploying RGW
● Rook also exposes some health/monitoring of the Ceph cluster, but is this in terms a non-Ceph-expert can understand?
Ceph management modules
Placement group merging
● Historically, pg_num could be increased but not decreased
● Sometimes problematic, such as when physically shrinking a cluster, or if bad pg_nums were chosen
● The bigger problem: it prevented automatic pg_num selection, because mistakes could not be reversed
● The implementation is not simple, and merging still has an IO cost, but the option will be there → now we can autoselect pg_num!
Targeted for Nautilus
poolsets module
● Pick pg_num so the user doesn’t have to!
● Hard (impossible?) to do perfectly, but...
● Pretty easy to do the useful common cases (heuristics sketched below):
  – Select initial pg_nums according to expected space use
  – Increase pg_nums if actual space use has gone ~2x over the ideal PG capacity
  – Decrease pg_num for underused pools if another pool needs to increase its own
● Not an optimizer! But it does the job about as well as most human beings do it today.
Targeted for Nautilus
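A toy sketch of the heuristics above; the 50 GiB target per PG and the 2x threshold are assumptions for illustration, not the module's actual tunables:

def _round_up_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p

def initial_pg_num(expected_bytes, target_bytes_per_pg=50 * 2**30):
    # Choose a starting pg_num from the user's expected space use.
    return max(32, _round_up_pow2(expected_bytes // target_bytes_per_pg))

def wants_more_pgs(actual_bytes, pg_num, target_bytes_per_pg=50 * 2**30):
    # Grow pg_num once real usage is ~2x over the ideal per-PG capacity.
    return actual_bytes / float(pg_num) > 2 * target_bytes_per_pg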
poolsets module
Prompting users for expected capacity makes sense for data pools, but not for metadata pools:
● Combine data and metadata pool creation into one command
● Wrap pools into a new “poolset” structure describing policy
● Auto-construct poolsets for existing deployments, but don’t auto-adjust unless explicitly enabled

ceph poolset create cephfs my_filesystem 100GB

New in Nautilus
progress module
● Health reporting was improved in Luminous, but in many cases it is still too low level.
● Placement groups:
  – Hard to distinguish between real problems and normal rebalancing
  – Once we start auto-picking pg_num, users won’t know what a PG is until they see one in the health status
● Introduce a `progress` module to synthesize a high-level view from PG state: “56% recovered from failure of OSD 123” (a conceptual sketch follows)
● Also enable other modules to describe their long-running operations via this module (creating an OSD, etc.)
New in Nautilus
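Conceptually the calculation is simple in the common case; a sketch under the assumption that we track which PGs were affected by a given OSD failure (not the module's real data structures):

def recovery_progress(affected_pgs, pg_state):
    """affected_pgs: PG ids that were degraded when the OSD went down.
    pg_state: mapping of PG id -> current state string from the PG map."""
    if not affected_pgs:
        return 1.0
    recovered = [pg for pg in affected_pgs if pg_state.get(pg) == "active+clean"]
    return float(len(recovered)) / len(affected_pgs)

# A module could then render e.g.:
# "56% recovered from failure of OSD 123"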
volumes module
● Currently, ceph_volume_client.py (used by Manila, etc.) creates “volumes” within CephFS “filesystems”, which require RADOS “pools” and provisioning of MDS daemons.
● Simplify this:
  – Two concepts: volumes (aka filesystems) and subvolumes
  – Automatically provision MDS daemons on demand using Rook
  – Automatically create pools on demand using the `poolsets` module
  – Expose the functionality as commands (consumable via librados) instead of a library (see the command sketch below)
  – Run background tasks from ceph-mgr (e.g. subvolume purge)
Targeted for Nautilus
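Exposing this as commands means declaring them in the module's COMMANDS table; a hedged sketch, where the command strings mirror the next slide but the handler body and the exact handle_command signature are illustrative (older releases omit the inbuf argument):

import errno
from mgr_module import MgrModule

class Module(MgrModule):
    COMMANDS = [
        {"cmd": "fs volume create name=vol_name,type=CephString",
         "desc": "Create a CephFS volume (pools + MDS on demand)",
         "perm": "rw"},
        {"cmd": "fs subvolume create "
                "name=vol_name,type=CephString "
                "name=sub_name,type=CephString",
         "desc": "Create a subvolume inside an existing volume",
         "perm": "rw"},
    ]

    def handle_command(self, inbuf, cmd):
        # Returns (retcode, output buffer, status string).
        if cmd["prefix"] == "fs volume create":
            # Would create pools via poolsets and MDS daemons via the orchestrator.
            return 0, "", "volume {0} created".format(cmd["vol_name"])
        if cmd["prefix"] == "fs subvolume create":
            return 0, "", "subvolume {0} created".format(cmd["sub_name"])
        return -errno.EINVAL, "", "unknown command"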
From zero to working subvolume

Before:
ceph osd pool create metadata 128
ceph osd pool create data 2048
ceph fs new myfs metadata data
# Create an MDS somehow…
# Call into ceph_volume_client.py
# volume_client.create_volume(…

After:
ceph fs volume create myvol
ceph fs subvolume create myvol subv

Done!
Wrap up
● All of these improvements reduce the cognitive load on ordinary users:
  – No need to know what an MDS is: ask Rook for a filesystem and get one
  – No need to know what a placement group is
  – No need to know magic commands: look at the dashboard
● Actions that no longer require human thought can now be tied into automated workflows: fulfilling the promise of software-defined storage.
● A smart container orchestrator is an essential part of this vision: on-demand Ceph requires on-demand service orchestration.
Resources
● Rook
  – https://rook.io
  – https://github.com/rook/rook
● Ceph Mgr
  – http://docs.ceph.com/docs/master/mgr/plugins/
Contributions welcome!
Q&A

Editor's Notes

  • #31 (progress module): FAQ: this seems like it would struggle with corner cases, like a PG failing on one OSD and then the new OSD failing too; do you end up with multiple progress bars, or what? A: The code handles the simple cases well, and when things get complicated we just do the simplest thing we can. The fallback general case is to look at the overall recovery progress of the cluster if it doesn’t break down neatly into individual progress events.
  • #32 (volumes module): FAQ: why was ceph_volume_client.py implemented externally to begin with? A: ceph_volume_client.py predates ceph-mgr; we’ve been wanting to integrate it for a while. FAQ: why rename the entities? A: We needed to distinguish between a “lightweight volume” (subvolume) and a “heavyweight volume” (volume); the term “filesystem” was prone to confusion, because at the point you mount a subvolume, that is a filesystem from the point of view of the client node. FAQ: a separate MDS for each volume seems so resource intensive! A: That’s the point of subvolumes: you can choose when to deploy a full-blown filesystem and when to just carve out a logical partition in an existing one. By the way, MDS daemons are increasingly smart about how they manage memory, enabling you to run more daemons with less memory each if your workload doesn’t require a single high-memory MDS with a big cache.
  • #33 (from zero to working subvolume): Key idea: this isn’t just a convenience to hide a few commands; we’re protecting the user from making decisions like PG counts (delegate the decision to poolsets) and where to run an MDS daemon (delegate scheduling to k8s). Key idea: by implementing this functionality inside Ceph, we enable seamless integration with the dashboard, etc. Nothing extra to install, nothing extra to configure.