Ceph performance
CephDays Frankfurt 2014
Whoami
💥 Sébastien Han
💥 French Cloud Engineer working for eNovance
💥 Daily job focused on Ceph and OpenStack
💥 Blogger
Personal blog: http://www.sebastien-han.fr/blog/
Company blog: http://techs.enovance.com/
Last Cephdays presentation
How does Ceph perform?
42*
*The Hitchhiker's Guide to the Galaxy
The Good
Ceph IO pattern
CRUSH: deterministic object
placement
As soon as a client writes into Ceph, the placement is computed on the client side: CRUSH
deterministically decides which OSDs the object belongs to.
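You can see this determinism for yourself: the ceph CLI can ask CRUSH where an arbitrary object name would land without writing any data. The pool and object names below are placeholders.
$ ceph osd map rbd myobject
# Prints the placement group and the acting set of OSDs computed by CRUSH,
# exactly as a client would compute it (output format varies per release).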
Aggregation: cluster level
As soon as you write into Ceph, objects are spread evenly across the entire cluster,
across both machines and disks.
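A quick way to sanity-check that the spread really is even is to look at per-pool and per-OSD statistics; both commands are standard, the pools shown depend on your cluster.
$ rados df          # per-pool object and usage statistics
$ ceph pg dump osds # per-OSD statistics, useful to spot unbalanced disks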
Aggregation: OSD level
As soon as an IO reaches an OSD, no matter what the original pattern was,
it becomes sequential.
The Bad
Ceph IO pattern
Journaling
As soon as an IO goes into an OSD, it gets written twice: once to the journal, then to the data partition.
Journal and OSD data on the same
disk
Journal penalty on the disk
Since we write twice, storing the journal on the same disk as the OSD data
results in something like the following:
Device: wMB/s
sdb1 - journal 50.11
sdb2 - osd_data 40.25
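One way to observe the penalty, and the usual mitigation (a dedicated, ideally SSD-backed, journal device), sketched below; the device paths and the ceph.conf snippet are assumptions, not a recommendation.
$ iostat -xm -p sdb 2      # watch journal and data partitions while writing
# Mitigation: point the journal at a separate device in ceph.conf,
# e.g. (hypothetical path):
#   [osd.0]
#   osd journal = /dev/journal-ssd/osd0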
Filesystem fragmentation
• Objects are stored as files on the OSD filesystem
• Several IO patterns with different block sizes increase filesystem
fragmentation
• Possible root cause: image sparseness
• A one-year-old cluster ends up with something like this (see the allocsize mount
option for XFS):
$ sudo xfs_db -c frag -r /dev/sdd
actual 196334, ideal 122582, fragmentation factor 37.56%
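Beyond measuring it, XFS can be defragmented online and the OSD mount options can be tuned to allocate bigger extents up front; the snippet below is only a sketch, and the 4M value is an assumption to validate against your workload.
$ sudo xfs_fsr /var/lib/ceph/osd/ceph-0   # online XFS defragmentation (standard OSD path, adjust)
# Preventive tuning in ceph.conf (illustrative value):
#   [osd]
#   osd mount options xfs = rw,noatime,inode64,allocsize=4M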
No parallelized reads
• Ceph always serves read requests from the primary OSD
• Room for an N× speed-up, where N is the replica count
Blueprint from Sage for the Giant release
Scrubbing impact
• Consistent object check at the PG level
• Compares replica versions against each other (an fsck for objects)
• Light scrubbing (daily) checks object sizes and attributes
• Deep scrubbing (weekly) reads the data and uses checksums to ensure data integrity
• Corruption happens, even with ECC memory: enterprise disks quote roughly one
unrecoverable error per 10^15 bits read (~113 TB)
• No pain, no gain
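Scrubbing can be paused cluster-wide or throttled per OSD when it hurts client IO; the flags below are standard, the ceph.conf values are purely illustrative.
$ ceph osd set nodeep-scrub     # temporarily pause deep scrubbing
$ ceph osd unset nodeep-scrub   # resume it
# Throttling knobs in ceph.conf (illustrative values):
#   [osd]
#   osd max scrubs = 1
#   osd scrub load threshold = 0.5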
The Ugly
Ceph IO pattern
IOs to the OSD disk
One IO into Ceph leads to 2 writes, well… the second write is the worst!
The problem
• Several objects map to the same physical disk
• Sequential streams get mixed together
• Result: the disk seeks like hell
Even worse with erasure coding?
This is just an assumption!
•Since erasure coding produces chunks of chunks, this phenomenon could be
amplified
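If you want to measure the effect rather than assume it, Firefly lets you create an erasure-coded pool next to a replicated one and compare; the profile name, pool name and k/m values below are placeholders.
$ ceph osd erasure-code-profile set myprofile k=4 m=2
$ ceph osd pool create ecpool 128 128 erasure myprofile
# Then run the same benchmark against 'ecpool' and a replicated pool.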
CLUSTER
How to build it?
How to start?
Things that you must consider:
• Use case
• IO profile: bandwidth? IOPS? Mixed?
• How many IOPS or how much bandwidth per client do I want to deliver?
• Do I use Ceph standalone, or combined with another software solution?
• Amount of data (usable, not raw; see the sizing sketch after this list)
• Replica count
• Do I have a data growth plan?
• Leftover
• How much data am I willing to lose if a node fails? (%)
• Am I ready to be annoyed by the scrubbing process?
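A back-of-the-envelope sizing sketch in bash (bc required); every number here is a made-up example, plug in your own.
# usable capacity -> raw capacity, keeping headroom below the near-full ratio
usable_tb=100      # target usable capacity (TB)
replicas=3         # replica count
nodes=10           # OSD nodes
fill_ratio=0.85    # stay under the near-full warning
raw_tb=$(echo "$usable_tb * $replicas / $fill_ratio" | bc -l)
per_node_tb=$(echo "$raw_tb / $nodes" | bc -l)   # data to re-balance if a node dies
printf 'raw: %.1f TB, per node (lost on one failure): %.1f TB\n' "$raw_tb" "$per_node_tb"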
Things that you must not do
• Don't put a RAID underneath your OSDs
• Ceph already manages the replication (see the pool-level example after this list)
• A degraded RAID breaks performance
• RAID reduces the usable space of the cluster
• Don't build high-density nodes with a tiny cluster
• Consider failures and the amount of data to re-balance
• Potential full cluster
• Don't run Ceph on your hypervisors (unless you're broke)
• Well, maybe…
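Since Ceph handles replication at the pool level, the RAID point can be illustrated with the standard pool commands; 'rbd' is just an example pool name.
$ ceph osd pool set rbd size 3       # keep 3 copies of every object
$ ceph osd pool set rbd min_size 2   # keep serving IO with 2 copies available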
Firefly: Interesting things
going on
Object store multi-backend
• ObjectStore is born
• Aims to support several backends:
• levelDB (default)
• RocksDB
• Fusionio NVMKV
• Seagate Kinetic
• Yours!
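The backend is selected per OSD with the 'osd objectstore' option; the exact value names for the key/value backend have moved around between releases, so treat this ceph.conf sketch as an assumption to check against your version's documentation.
[osd]
osd objectstore = keyvaluestore-dev   # experimental key/value backend around Firefly
keyvaluestore backend = leveldb       # or another supported key/value store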
Why is it so good?
• No more journal! Yay!
• Object backends have built-in atomic functions
Firefly leveldb
• Relatively new
• Needs to be tested with your workload first (see the rados bench sketch below)
• Tends to be more efficient with small objects
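A quick first-pass way to test a backend with something resembling your workload is rados bench; the pool name, runtime, block size and thread count below are placeholders.
$ rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup
$ rados bench -p testpool 60 seq -t 16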
Many thanks!
Questions?
Contact: sebastien@enovance.com
Twitter: @sebastien_han
IRC: leseb