KEEPING OPENSTACK STORAGE TRENDY
WITH CEPH AND CONTAINERS
SAGE WEIL, HAOMAI WANG
OPENSTACK SUMMIT - 2015.05.20
2
AGENDA
● Motivation
● Block
● File
● Container orchestration
● Summary
MOTIVATION
4
(Diagram: a web application backed by multiple app servers)
A CLOUD SMORGASBORD
● Compelling clouds offer options
● Compute
– VM (KVM, Xen, …)
– Containers (lxc, Docker, OpenVZ, ...)
● Storage
– Block (virtual disk)
– File (shared)
– Object (RESTful, …)
– Key/value
– NoSQL
– SQL
5
WHY CONTAINERS?
Technology
● Performance
– Shared kernel
– Faster boot
– Lower baseline overhead
– Better resource sharing
● Storage
– Shared kernel → efficient IO
– Small image → efficient deployment
Ecosystem
● Emerging container host OSs
– Atomic – http://projectatomic.io
● os-tree (s/rpm/git/)
– CoreOS
● systemd + etcd + fleet
– Snappy Ubuntu
● New app provisioning model
– Small, single-service containers
– Standalone execution environment
● New open container spec nulecule
– https://github.com/projectatomic/nulecule
6
WHY NOT CONTAINERS?
Technology
● Security
– Shared kernel
– Limited isolation
● OS flexibility
– Shared kernel limits OS choices
● Inertia
Ecosystem
● New models don't capture many legacy services
7
WHY CEPH?
● All components scale horizontally
● No single point of failure
● Hardware agnostic, commodity hardware
● Self-manage whenever possible
● Open source (LGPL)
● Move beyond legacy approaches
– client/cluster instead of client/server
– avoid ad hoc HA
8
CEPH COMPONENTS
● RADOS – a software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
● LIBRADOS – a library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
● RGW – a web services gateway for object storage, compatible with S3 and Swift (used by apps)
● RBD – a reliable, fully-distributed block device with cloud platform integration (used by hosts/VMs)
● CEPHFS – a distributed file system with POSIX semantics and scale-out metadata management (used by clients)
BLOCK STORAGE
10
EXISTING BLOCK STORAGE MODEL
● VMs are the unit of cloud compute
● Block devices are the unit of VM storage
– ephemeral: not redundant, discarded when VM dies
– persistent volumes: durable, (re)attached to any VM
● Block devices are single-user
● For shared storage,
– use objects (e.g., Swift or S3)
– use a database (e.g., Trove)
– ...
11
KVM + LIBRBD.SO
● Model
– Nova → libvirt → KVM → librbd.so
– Cinder → rbd.py → librbd.so
– Glance → rbd.py → librbd.so
● Pros
– proven
– decent performance
– good security
● Cons
– performance could be better
● Status
– most common deployment model
today (~44% in latest survey)
(Diagram: Nova and Cinder drive QEMU/KVM, which attaches the VM's disk through librbd to the RADOS cluster and its monitors)
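To make the model concrete, here is a minimal sketch of the librbd wiring; the backend name, pool, user and secret UUID are illustrative placeholders, not values from the talk:

# Cinder RBD backend (cinder.conf); Cinder and Glance reach librbd via rbd.py
cat >> /etc/cinder/cinder.conf <<'EOF'
[rbd-ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder
rbd_secret_uuid = <libvirt-secret-uuid>
EOF
# (the backend must also be listed under enabled_backends in [DEFAULT])

# Nova/libvirt then attach the volume through librbd.so; the generated QEMU
# drive looks roughly like:
#   -drive format=raw,file=rbd:volumes/volume-<id>:id=cinder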
12
MULTIPLE CEPH DRIVERS
● librbd.so
– qemu-kvm
– rbd-fuse (experimental)
● rbd.ko (Linux kernel)
– /dev/rbd*
– stable and well-supported on modern kernels and distros
– some feature gap
● no client-side caching
● no “fancy striping”
– performance delta
● more efficient → more IOPS
● no client-side cache → higher latency for some workloads
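For comparison, a hedged sketch of the kernel-driver path; pool and image names are made up:

# create an image and map it through rbd.ko on the host
rbd create --size 10240 volumes/test-vol      # size is in MB (10 GB here)
rbd map volumes/test-vol --id admin           # shows up as /dev/rbd0
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /mnt/test-vol
# fewer layers and more IOPS, but no librbd client-side cache or fancy striping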
13
LXC + CEPH.KO
● The model
– libvirt-based lxc containers
– map kernel RBD on host
– pass host device to libvirt, container
● Pros
– fast and efficient
– implement existing Nova API
● Cons
– weaker security than VM
● Status
– lxc is maintained
– lxc is less widely used
– no prototype
(Diagram: Nova drives a Linux host that maps the volume with rbd.ko against the RADOS cluster and passes it to the container)
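A rough sketch of this model, shown with a plain LXC mount entry rather than the libvirt-lxc XML the slide implies; the device, paths and container name are illustrative:

# host: map and prepare the volume
rbd map volumes/guest1-vol --id cinder        # -> /dev/rbd0
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /var/lib/guest1-vol

# hand the mounted filesystem (or the raw /dev/rbd0 device) to the container
echo 'lxc.mount.entry = /var/lib/guest1-vol mnt/vol none bind,create=dir 0 0' \
    >> /var/lib/lxc/guest1/config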
14
NOVA-DOCKER + CEPH.KO
● The model
– docker container as mini-host
– map kernel RBD on host
– pass RBD device to container, or
– mount RBD, bind dir to container
● Pros
– buzzword-compliant
– fast and efficient
● Cons
– different image format
– different app model
– only a subset of docker feature set
● Status
– no prototype
– nova-docker is out of tree
https://wiki.openstack.org/wiki/Docker
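Outside of Nova, the same idea looks roughly like this with plain Docker; image and path names are illustrative:

# host: map and mount the RBD image, then bind the directory into the container
rbd map volumes/web-data --id cinder          # -> /dev/rbd0
mount /dev/rbd0 /var/lib/web-data
docker run -d -v /var/lib/web-data:/data nginx

# alternatively, pass the raw block device through and let the container use it
docker run -d --device /dev/rbd0:/dev/rbd0 my-db-image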
15
IRONIC + CEPH.KO
● The model
– bare metal provisioning
– map kernel RBD directly from guest image
● Pros
– fast and efficient
– traditional app deployment model
● Cons
– guest OS must support rbd.ko
– requires agent
– boot-from-volume tricky
● Status
– Cinder and Ironic integration is a hot topic at
summit
● 5:20p Wednesday (cinder)
– no prototype
● References
– https://wiki.openstack.org/wiki/Ironic/blueprints/
cinder-integration
(Diagram: the bare-metal Linux host maps the volume with rbd.ko directly against the RADOS cluster)
16
BLOCK - SUMMARY
● But
– block storage is the same old, boring model
– volumes are only semi-elastic (grow, not shrink; tedious to resize)
– storage is not shared between guests
driver               | performance | efficiency | VM | client cache | striping | same images? | exists
kvm + librbd.so      | best        | good       | X  | X            | X        | yes          | X
lxc + rbd.ko         | good        | best       |    |              |          | close        |
nova-docker + rbd.ko | good        | best       |    |              |          | no           |
ironic + rbd.ko      | good        | best       |    |              |          | close?       | planned!
FILE STORAGE
18
MANILA FILE STORAGE
● Manila manages file volumes
– create/delete, share/unshare
– tenant network connectivity
– snapshot management
● Why file storage?
– familiar POSIX semantics
– fully shared volume – many clients can mount and share data
– elastic storage – amount of data can grow/shrink without explicit
provisioning
19
MANILA CAVEATS
● Last mile problem
– must connect storage to guest network
– somewhat limited options (focus on Neutron)
● Mount problem
– Manila makes it possible for guest to mount
– guest is responsible for actual mount
– ongoing discussion around a guest agent …
● Current baked-in assumptions about both of these
20
APPLIANCE DRIVERS
● Appliance drivers
– tell an appliance to export NFS to guests
– map appliance IP into tenant network
(Neutron)
– boring (closed, proprietary, expensive, etc.)
● Status
– several drivers from usual suspects
– security punted to vendor
(Diagram: Manila tells the appliance to export NFS into the tenant network)
21
GANESHA DRIVER
● Model
– service VM running nfs-ganesha server
– mount file system on storage network
– export NFS to tenant network
– map IP into tenant network
● Status
– in-tree, well-supported
(Diagram: Manila drives a Ganesha service VM that exports NFS to the KVM guest; the backend file system and network plumbing are marked '???')
22
KVM + GANESHA + LIBCEPHFS
● Model
– existing Ganesha driver, backed by
Ganesha's libcephfs FSAL
● Pros
– simple, existing model
– security
● Cons
– extra hop → higher latency
– service VM is SpoF
– service VM consumes resources
● Status
– Manila Ganesha driver exists
– untested with CephFS
(Diagram: Manila speaks native Ceph; the Ganesha service VM uses libcephfs against the RADOS cluster and exports NFS, which the tenant KVM guest mounts with nfs.ko)
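For illustration, a minimal nfs-ganesha export backed by the Ceph FSAL; paths and IDs are placeholders, and option spellings vary between Ganesha releases:

cat > /etc/ganesha/ganesha.conf <<'EOF'
EXPORT {
    Export_Id = 1;
    Path = "/";                # CephFS path served via libcephfs, no local mount
    Pseudo = "/cephfs";
    Access_Type = RW;
    FSAL {
        Name = CEPH;
    }
}
EOF
systemctl restart nfs-ganesha   # service name differs by distro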
23
KVM + CEPH.KO (CEPH-NATIVE)
● Model
– allow tenant access to storage network
– mount CephFS directly from tenant VM
● Pros
– best performance
– access to full CephFS feature set
– simple
● Cons
– guest must have modern distro/kernel
– exposes tenant to Ceph cluster
– must deliver mount secret to client
● Status
– no prototype
– CephFS isolation/security is work-in-progress
(Diagram: Manila speaks native Ceph; the tenant KVM guest mounts CephFS with ceph.ko directly against the RADOS cluster)
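The tenant-side mount in this model is just the kernel CephFS client; the monitor address, user name, share path and secret file below are placeholders:

# inside the tenant VM, which has a route to the Ceph public network
mount -t ceph 10.0.0.1:6789:/manila/share-1234 /mnt/share \
    -o name=manila,secretfile=/etc/ceph/manila.secret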
24
NETWORK-ONLY MODEL IS LIMITING
● Current assumption of NFS or CIFS sucks
● Always relying on guest mount support sucks
– mount -t ceph -o what?
● Even assuming storage connectivity is via the network sucks
● There are other options!
– KVM virtfs/9p
● fs pass-through to host
● 9p protocol
● virtio for fast data transfer
● upstream; not widely used
– NFS re-export from host
● mount and export fs on host
● private host/guest net
● avoid network hop from NFS service VM
– containers and 'mount --bind'
25
NOVA “ATTACH FS” API
● Mount problem is ongoing discussion by Manila team
– discussed this morning
– simple prototype using cloud-init
– Manila agent? leverage Zaqar tenant messaging service?
● A different proposal
– expand Nova to include “attach/detach file system” API
– analogous to current attach/detach volume for block
– each Nova driver may implement function differently
– “plumb” storage to tenant VM or container
● Open question
– Would API do the final “mount” step as well? (I say yes!)
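To make the proposal concrete, a purely hypothetical CLI flow; the attach-fs call does not exist, and its name and arguments are invented here for illustration:

# today, block volumes are attached like this:
nova volume-attach <server> <volume-id> /dev/vdb
# the proposed file-system analogue would plumb a Manila share into the guest
# (and, ideally, perform the final mount as well):
nova attach-fs <server> <share-id> /mnt/share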
26
KVM + VIRTFS/9P + CEPHFS.KO
● Model
– mount kernel CephFS on host
– pass-through to guest via virtfs/9p
● Pros
– security: tenant remains isolated from
storage net + locked inside a directory
● Cons
– require modern Linux guests
– 9p not supported on some distros
– “virtfs is ~50% slower than a native
mount?”
● Status
– Prototype from Haomai Wang
(Diagram: Manila speaks native Ceph; Nova drives the host, which mounts CephFS with ceph.ko and exposes it to the VM through KVM virtfs over 9p)
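A hedged sketch of the plumbing; the share path, mount tag and addresses are illustrative:

# host: mount CephFS, then expose one share directory to the guest over virtfs
mount -t ceph 10.0.0.1:6789:/manila/share-1234 /srv/share-1234 \
    -o name=manila,secretfile=/etc/ceph/manila.secret
# add to the guest's QEMU command line (Nova/libvirt would generate this):
#   -virtfs local,path=/srv/share-1234,mount_tag=share0,security_model=passthrough

# guest: mount the exported tag via 9p over virtio
mount -t 9p -o trans=virtio,version=9p2000.L share0 /mnt/share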
27
KVM + NFS + CEPHFS.KO
● Model
– mount kernel CephFS on host
– pass-through to guest via NFS
● Pros
– security: tenant remains isolated
from storage net + locked inside a
directory
– NFS is more standard
● Cons
– NFS has weak caching consistency
– NFS is slower
● Status
– no prototype
(Diagram: Manila speaks native Ceph; Nova drives the host, which mounts CephFS with ceph.ko and re-exports it to the VM over NFS)
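A hedged sketch of the host-side re-export; the exports line, subnet and paths are illustrative:

# host: mount CephFS, then export the share to this host's guests only
mount -t ceph 10.0.0.1:6789:/manila/share-1234 /srv/share-1234 \
    -o name=manila,secretfile=/etc/ceph/manila.secret
echo '/srv/share-1234 192.168.122.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra

# guest: an ordinary NFS mount over the private host/guest network
mount -t nfs 192.168.122.1:/srv/share-1234 /mnt/share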
28
(LXC, NOVA-DOCKER) + CEPHFS.KO
● Model
– host mounts CephFS directly
– mount --bind share into
container namespace
● Pros
– best performance
– full CephFS semantics
● Cons
– rely on container for security
● Status
– no prototype
(Diagram: Manila speaks native Ceph; Nova drives the host, which mounts CephFS with ceph.ko and bind-mounts the share into the container)
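A hedged sketch; share paths and container names are illustrative:

# host: mount CephFS once, then hand each tenant only its own share directory
mount -t ceph 10.0.0.1:6789:/ /srv/cephfs \
    -o name=manila,secretfile=/etc/ceph/manila.secret
mount --bind /srv/cephfs/manila/share-1234 /var/lib/lxc/guest1/rootfs/mnt/share

# with nova-docker the equivalent is a Docker bind mount
docker run -d -v /srv/cephfs/manila/share-1234:/mnt/share my-app-image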
29
IRONIC + CEPHFS.KO
● Model
– mount CephFS directly from bare
metal “guest”
● Pros
– best performance
– full feature set
● Cons
– rely on CephFS security
– networking?
– agent to do the mount?
● Status
– no prototype
– no suitable (ironic) agent (yet)
(Diagram: Manila speaks native Ceph; the bare-metal host mounts CephFS with ceph.ko directly against the RADOS cluster; Nova/Ironic drive the host)
30
THE MOUNT PROBLEM
● Containers may break the current 'network fs' assumption
– mounting becomes driver-dependent; harder for tenant to do the right thing
● Nova “attach fs” API could provide the needed entry point
– KVM: qemu-guest-agent
– Ironic: no guest agent yet...
– containers (lxc, nova-docker): use mount --bind from host
● Or, make tenant do the final mount?
– Manila API to provide command (template) to perform the mount
● e.g., “mount -t ceph $cephmonip:/manila/$uuid $PATH -o ...”
– Nova lxc and docker
● bind share to a “dummy” device /dev/manila/$uuid
● API mount command is 'mount --bind /dev/manila/$uuid $PATH'
31
SECURITY: NO FREE LUNCH
● (KVM, Ironic) + ceph.ko
– access to storage network relies on Ceph security
● KVM + (virtfs/9p, NFS) + ceph.ko
– better security, but
– pass-through/proxy limits performance
● (by how much?)
● Containers
– security (vs a VM) is weak at baseline, but
– host performs the mount; tenant locked into their share directory
32
PERFORMANCE
● 2 nodes
– Intel E5-2660
– 96GB RAM
– 10GbE NIC
● Server
– 3 OSD (Intel S3500)
– 1 MON
– 1 MDS
● Client VMs
– 4 cores
– 2GB RAM
● iozone, 2x available RAM
● CephFS native
– VM ceph.ko → server
● CephFS 9p/virtfs
– VM 9p → host ceph.ko → server
● CephFS NFS
– VM NFS → server ceph.ko →
server
33
SEQUENTIAL (iozone results chart)
34
RANDOM (iozone results chart)
35
SUMMARY MATRIX
driver                    | performance | consistency | VM | gateway | net hops | security | agent | mount agent | prototype
kvm + ganesha + libcephfs | slower (?)  | weak (nfs)  | X  | X       | 2        | host     |       | X           | X
kvm + virtfs + ceph.ko    | good        | good        | X  | X       | 1        | host     |       | X           | X
kvm + nfs + ceph.ko       | good        | weak (nfs)  | X  | X       | 1        | host     |       | X           |
kvm + ceph.ko             | better      | best        | X  |         | 1        | ceph     |       | X           |
lxc + ceph.ko             | best        | best        |    |         | 1        | ceph     |       |             |
nova-docker + ceph.ko     | best        | best        |    |         | 1        | ceph     |       |             |
ironic + ceph.ko          | best        | best        |    |         | 1        | ceph     | X     | X           |
(see also: IBM talk - Thurs 9am)
CONTAINER ORCHESTRATION
37
CONTAINERS ARE DIFFERENT
● nova-docker implements a Nova view of a (Docker) container
– treats container like a standalone system
– does not leverage most of what Docker has to offer
– Nova == IaaS abstraction
● Kubernetes is the new hotness
– higher-level orchestration for containers
– draws on years of Google experience running containers at scale
– vibrant open source community
38
KUBERNETES SHARED STORAGE
● Pure Kubernetes – no OpenStack
● Volume drivers
– Local
● hostPath, emptyDir
– Unshared
● iSCSI, GCEPersistentDisk, Amazon EBS, Ceph RBD – local fs on top of existing device
– Shared
● NFS, GlusterFS, Amazon EFS, CephFS
● Status
– Ceph drivers under review
● Finalizing model for secret storage, cluster parameters (e.g., mon IPs)
– Drivers expect pre-existing volumes
● recycled; missing REST API to create/destroy volumes
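As a concrete illustration, a sketch of a pod consuming a CephFS share through the plugin that was under review at the time; field names follow that proposal and may not match what finally merged, and the monitor address and secret name are placeholders:

kubectl create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-demo
spec:
  containers:
  - name: web
    image: nginx
    volumeMounts:
    - name: share
      mountPath: /usr/share/nginx/html
  volumes:
  - name: share
    cephfs:
      monitors:
      - 10.0.0.1:6789
      user: admin
      secretRef:
        name: ceph-secret
EOF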
39
KUBERNETES ON OPENSTACK
● Provision Nova VMs
– KVM or ironic
– Atomic or CoreOS
● Kubernetes per tenant
● Provision storage devices
– Cinder for volumes
– Manila for shares
● Kubernetes binds into pod/container
● Status
– Prototype Cinder plugin for Kubernetes
https://github.com/spothanis/kubernetes/tree/cinder-vol-plugin
(Diagram: Nova provisions KVM instances: Kubernetes nodes running nginx and mysql pods, plus a Kubernetes master with a volume controller; Cinder and Manila supply the volumes and shares that Kubernetes binds into the pods)
40
WHAT NEXT?
● Ironic agent
– enable Cinder (and Manila?) on bare metal
– Cinder + Ironic
● 5:20p Wednesday (Cinder)
● Expand breadth of Manila drivers
– virtfs/9p, ceph-native, NFS proxy via host, etc.
– the last mile is not always the tenant network!
● Nova “attach fs” API (or equivalent)
– simplify tenant experience
– paper over VM vs container vs bare metal differences
THANK YOU!
Sage Weil
CEPH PRINCIPAL ARCHITECT
Haomai Wang
FREE AGENT
sage@redhat.com
haomaiwang@gmail.com
@liewegas
42
FOR MORE INFORMATION
● http://ceph.com
● http://github.com/ceph
● http://tracker.ceph.com
● Mailing lists
– ceph-users@ceph.com
– ceph-devel@vger.kernel.org
● irc.oftc.net
– #ceph
– #ceph-devel
● Twitter
– @ceph