Building reliable Ceph clusters with SUSE Enterprise Storage
Survival skills for the real world
Lars Marowsky-Brée
Distinguished Engineer
lmb@suse.com
What this talk is not
● A comprehensive introduction to Ceph
● A SUSE Enterprise Storage roadmap session
● A discussion of Ceph performance tuning
SUSE Enterprise Storage - Reprise
The Ceph project
● An Open Source Software-Defined Storage project
● Multiple front-ends
  – S3/Swift object interface
  – Native Linux block IO
  – Heterogeneous block IO (iSCSI)
  – Native Linux network file system (CephFS)
  – Heterogeneous network file system (nfs-ganesha)
  – Low-level C++/Python/… libraries (see the librados sketch below)
  – Linux, UNIX, Windows, applications, cloud, containers
● Common, smart data store (RADOS)
  – Pseudo-random, algorithmic data distribution
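As an illustration of those low-level libraries, here is a minimal sketch using the python-rados bindings. The pool name 'test-pool' and the ceph.conf path are assumptions, not part of the original slides; the pool must already exist.

    import rados

    # Connect to the cluster described by the local ceph.conf (path assumed)
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        # 'test-pool' is a hypothetical pool name; create it beforehand
        ioctx = cluster.open_ioctx('test-pool')
        try:
            # RADOS stores flat objects; CRUSH decides placement automatically
            ioctx.write_full('greeting', b'hello rados')
            print(ioctx.read('greeting'))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()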
Software-Defined-Storage
Ceph Cluster: Logical View
[Diagram: three MONs, two MDSs, and many OSDs form the RADOS cluster; iSCSI, S3/Swift, and NFS gateways sit on top of RADOS]
Introducing Dependability
Introducing dependability
● Availability
● Reliability
  – Durability
● Safety
● Maintainability
The elephant in the room
● Before we discuss technology …
● … guess what causes most outages?
Improve your human factor
● Great, you are already here!
● Training
● Documentation
● Team your team with a world-class support and consulting organization
High-level considerations
Advantages of Homogeneity
● Eases system administration
● Components are interchangeable
● Lower purchasing costs
● Standardized ordering process
Murphy’s Law, 2016 version
● “At scale, everything fails.”
● Distributed systems protect against individual failures causing service failures by eliminating Single Points of Failure
● Distributed systems are still vulnerable to correlated failures
Advantages of Heterogeneity
Everything is broken …
… but everything is broken differently
Homogeneity is not sustainable
● Hardware gets replaced
  – Replacement with the same model is not available, or
  – not desirable given current prices
● Software updates are not (yet) globally immediate
● Requirements change
● Your cluster ends up being heterogeneous anyway …
● … you might as well benefit from it.
Failure is inevitable; suffering is optional
● If you want uptime, prepare for downtime
● Architect your system to survive single or multiple failures
● Test whether the system meets your SLA
  – while degraded and during recovery!
How much availability do you need?
● Availability and durability are not free
● Cost and complexity increase exponentially
● Scale-out makes some things easier
A bag of suggestions
Embrace diversity
● Automatic recovery requires a >50% majority
  – Splitting into multiple different categories/models
  – Feasible for some components
  – Multiple architectures?
  – Mix them across different racks/pods
● A 50:50 split still allows manual recovery in case of catastrophic failures
  – Different UPS and power circuits
Hardware choices
● SUSE offers Reference Architectures
  – e.g., Lenovo, HPE, Cisco, Dell
● Partners offer turn-key solutions
  – e.g., HPE, Thomas-Krenn
● SUSE YES certification reduces risk
  – https://www.suse.com/newsroom/post/2016/suse-extends-partner-software-certification-for-cloud-and-storage-customers/
● Small variations can have a huge impact!
Not all the eggs in one basket^Wrack
● Distribute servers physically to limit the impact of power outages, spills, …
● Ceph’s CRUSH map allows you to describe the physical topology of your fault domains (engineering speak for “availability zones”)
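A hedged sketch of describing rack-level fault domains to CRUSH, driven from Python via the standard ceph CLI. The rack and host names are hypothetical, and the commands assume an existing cluster with admin credentials and host buckets already present.

    import subprocess

    def ceph(*args):
        # Thin wrapper around the ceph CLI; raises if a command fails
        subprocess.run(['ceph', *args], check=True)

    # Hypothetical topology: two racks, each holding two of our hosts
    topology = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}

    for rack, hosts in topology.items():
        # Create a rack bucket and hang it under the default root
        ceph('osd', 'crush', 'add-bucket', rack, 'rack')
        ceph('osd', 'crush', 'move', rack, 'root=default')
        for host in hosts:
            # Move each host bucket under its rack so CRUSH can keep
            # replicas in separate racks (separate fault domains)
            ceph('osd', 'crush', 'move', host, f'rack={rack}')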
How many MONitors do I need?
● 2n+1, i.e., an odd number: a quorum of n+1 out of 2n+1 monitors survives n monitor failures (3 MONs tolerate 1, 5 tolerate 2)
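A worked illustration of the 2n+1 rule (plain arithmetic, not from the slides): with m monitors, a majority quorum needs m // 2 + 1 of them, so (m - 1) // 2 failures are survivable.

    def tolerable_mon_failures(monitors: int) -> int:
        # A majority quorum needs monitors // 2 + 1 voters,
        # so up to (monitors - 1) // 2 monitors may fail.
        return (monitors - 1) // 2

    for m in (1, 3, 5, 7):
        print(f'{m} MONs tolerate {tolerable_mon_failures(m)} failure(s)')
    # 3 MONs tolerate 1, 5 tolerate 2, 7 tolerate 3; an even count adds
    # no extra fault tolerance, hence 2n+1.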
To converge roles or not
● “Hyper-converged” equals correlated failures
● It does drive down the cost of implementation
● Sizing becomes less deterministic
● Services might recover at the same time
● At scale, don’t co-locate the MONs and OSDs
Storage diversity
● Spinning disks (HDDs):
  – Avoid desktop HDDs
  – Avoid sequential serial numbers
  – Mount at different angles if paranoid
  – Multiple vendors
● Flash (SSDs):
  – Avoid desktop SSDs
  – Monitor wear-leveling
  – Remember that the journals see all writes
Storage Node Sizing
● Node failures are the most common failure granularity
  – Admin mistake, network, kernel crash
● Consider the impact of an outage on:
  – performance (degraded and during recovery)
  – and capacity!
● A single node should not hold more than 10% of your total capacity
● Free capacity should be larger than the largest node (a quick sanity-check sketch follows below)
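The two sizing rules above can be turned into a quick sanity check. This is only a sketch with made-up capacities, not a SUSE tool.

    # Hypothetical raw capacities per node, in TB
    nodes = {'node1': 40, 'node2': 40, 'node3': 40, 'node4': 60}
    used_tb = 90  # currently used raw capacity, also hypothetical

    total = sum(nodes.values())
    largest = max(nodes.values())
    free = total - used_tb

    for name, cap in nodes.items():
        if cap > 0.10 * total:
            print(f'{name} holds {cap / total:.0%} of total capacity (> 10%)')

    if free <= largest:
        print(f'Free capacity ({free} TB) does not cover the largest node '
              f'({largest} TB); recovery after a node loss would fill up.')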
Data availability and durability
● Replication (n copies, e.g., 2n+1):
  – Number of copies
  – Linear overhead
● Erasure Coding (k data + m coding blocks):
  – Flexible number of data and coding blocks
  – Can survive a configurable number of outages (up to m)
  – Fractional overhead ((k+m)/k)
  – https://www.youtube.com/watch?v=-KyGv6AZN9M
Durability: Three-way Replication
● Usable capacity: 33%
● Durability: 2 faults
Durability: 4+3 Erasure Coding
● Usable capacity: 57%
● Durability: 3 faults
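A small calculation backing the two slides above: usable capacity is 1/n for n-way replication and k/(k+m) for erasure coding, which reproduces the 33% and 57% figures.

    def replication_usable(copies: int) -> float:
        # n-way replication stores every byte n times
        return 1 / copies

    def ec_usable(k: int, m: int) -> float:
        # k data chunks plus m coding chunks; any m chunks may be lost
        return k / (k + m)

    print(f'3-way replication: {replication_usable(3):.0%} usable, 2 faults')
    print(f'4+3 erasure coding: {ec_usable(4, 3):.0%} usable, 3 faults')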
Consider Cache Tiering
● Data in the cache tier is replicated
● The backing tier may be slower, but more durable
Durability 201
● Different strokes for different pools
● Erasure coding schemes galore
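“Different strokes for different pools” in practice might look like the following sketch: a 3-way replicated pool for hot data and a 4+3 erasure-coded pool for colder data, driven via the ceph CLI. Pool names, PG counts, and the profile name are assumptions.

    import subprocess

    def ceph(*args):
        subprocess.run(['ceph', *args], check=True)

    # Hypothetical replicated pool for latency-sensitive data
    ceph('osd', 'pool', 'create', 'hotpool', '128', '128', 'replicated')
    ceph('osd', 'pool', 'set', 'hotpool', 'size', '3')

    # Hypothetical 4+3 erasure-coded pool for bulk/cold data
    ceph('osd', 'erasure-code-profile', 'set', 'ec-4-3', 'k=4', 'm=3')
    ceph('osd', 'pool', 'create', 'coldpool', '128', '128', 'erasure', 'ec-4-3')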
Finding and correcting bad data
● Ceph “scrubbing” periodically detects inconsistent or missing placement groups
  – http://ceph.com/planet/ceph-manually-repair-object/
  – http://docs.ceph.com/docs/jewel/rados/configuration/osd-config-ref/#scrubbing
● SUSE Enterprise Storage 5 will validate checksums on every read
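A hedged sketch of locating and repairing placement groups that scrubbing has flagged. It assumes the Jewel-era `rados list-inconsistent-pg` JSON output and a hypothetical pool name; in practice, inspect the affected objects first, as described in the manual-repair article linked above.

    import json
    import subprocess

    pool = 'hotpool'  # hypothetical pool name

    # Scrubbing marks damaged PGs; list them for this pool as JSON
    out = subprocess.run(['rados', 'list-inconsistent-pg', pool],
                         check=True, capture_output=True, text=True)
    inconsistent = json.loads(out.stdout)

    for pgid in inconsistent:
        # Inspect the objects first (see the linked article), then repair
        print('repairing', pgid)
        subprocess.run(['ceph', 'pg', 'repair', pgid], check=True)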
Automatic fault detection and recovery
● Do you want this in your cluster?
● Consider setting “noout”:
  – during maintenance windows
  – in small clusters
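A sketch of a maintenance-window wrapper around “noout”: while the flag is set, OSDs that go down are not marked out, so no rebalancing starts. The maintenance step is only a placeholder for whatever work you actually perform.

    import subprocess

    def ceph(*args):
        subprocess.run(['ceph', *args], check=True)

    def do_maintenance():
        # Placeholder: reboot a node, swap a disk, apply updates, ...
        pass

    ceph('osd', 'set', 'noout')        # down OSDs stay "in"; no rebalancing
    try:
        do_maintenance()
    finally:
        ceph('osd', 'unset', 'noout')  # restore normal fault handling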
Network considerations
● Have both the public and cluster network bonded
● Consider different NICs
  – Use last year’s NICs and switches
● One channel from each network to each switch
Gateway considerations
● RadosGW (S3/Swift):
  – Use HTTP/TCP load balancers
  – Possible to build using SLE HA with LVS or haproxy
● iSCSI targets:
  – Multiple gateways, natively supported by iSCSI
    ● Improves availability and throughput
  – Make sure you meet your performance SLAs during degraded modes
Avoid configuration drift
● Ensure that systems are configured consistently
  – Installed packages
  – Package versions
  – Configuration (NTP, logging, passwords, …)
● Avoid manual configuration
● Use Salt instead
  – http://ourobengr.com/2016/11/hello-salty-goodness/
  – https://www.suse.com/communities/blog/managing-configuration-drift-salt-snapper/
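One way to spot drift, sketched with real Salt execution modules but hypothetical minion targeting and output handling: compare the installed ceph package version across all minions and flag any disagreement. Run this on the Salt master; the aggregated JSON output shape is an assumption and may need adjusting for your Salt release.

    import json
    import subprocess

    # Ask every minion which ceph version it has installed
    out = subprocess.run(['salt', '*', 'pkg.version', 'ceph',
                          '--out=json', '--static'],
                         check=True, capture_output=True, text=True)
    versions = json.loads(out.stdout)   # assumed: {minion_id: version_string}

    if len(set(versions.values())) > 1:
        print('Configuration drift detected:')
        for minion, version in sorted(versions.items()):
            print(f'  {minion}: {version}')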
Trust but verify, a.k.a. monitoring
● Performance as the system ages
● SSD degradation / wear leveling
● Capacity utilization
● “Free” capacity is usable for recovery
● React to issues in a timely fashion!
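A hedged sketch of one such check, cluster-wide capacity headroom via `ceph df`. The JSON field names follow Jewel-era output and may differ on other releases, and the 70% threshold is an arbitrary example.

    import json
    import subprocess

    out = subprocess.run(['ceph', 'df', '--format=json'],
                         check=True, capture_output=True, text=True)
    stats = json.loads(out.stdout)['stats']

    # Field names assumed from Jewel-era output; verify against your release
    total = stats['total_bytes']
    used = stats['total_used_bytes']
    utilization = used / total

    if utilization > 0.70:   # arbitrary example threshold
        print(f'Cluster {utilization:.0%} full; remember that "free" '
              f'capacity is what recovery will need.')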
Update, always (but with care)
● Updates are good for your system
  – Security
  – Performance
  – Stability
● Ceph remains available even while updates are being rolled out
● SUSE’s tested maintenance updates are a core part of the product’s value
Trust nobody (not even SUSE)
● If at all possible, use a staging system
  – Ideally: a (reduced) version of your production environment
  – At least: a virtualized environment
● Test updates before rolling them out in production
  – Not just code, but also processes!
● Long-term maintainability:
  – Avoid vendor lock-in; use Open Source
Disaster can and will strike
● Does it matter?
● If it does:
  – Backups
  – Replicate to other sites
    ● rbd-mirror, radosgw multi-site
● Have fire drills!
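A sketch of what “replicate to other sites” can look like for block devices with rbd-mirror, using real rbd commands but hypothetical pool, peer, and cluster names. Both sites also need an rbd-mirror daemon running, and pool-mode mirroring requires the journaling feature on the images; neither is shown here.

    import subprocess

    def rbd(*args):
        subprocess.run(['rbd', *args], check=True)

    # Mirror every image in the (hypothetical) 'rbd' pool to the remote site
    rbd('mirror', 'pool', 'enable', 'rbd', 'pool')

    # Register the remote cluster as a peer; 'client.remote' and 'site-b'
    # are placeholder names for the peer's CephX user and cluster
    rbd('mirror', 'pool', 'peer', 'add', 'rbd', 'client.remote@site-b')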
Avoid complexity (KISS)
● Be aggressive in what you test
  – Test all the features
● Be conservative in what you deploy
  – Deploy only what you need
In conclusion
Don’t panic.
SUSE’s here to help.