Ceph at Salesforce
Sameer Tiwari - Principal Architect, Storage Cloud
stiwari@salesforce.com
@techsameer
https://www.linkedin.com/in/sameer-tiwari-1961311/
3/17/2017 - Ceph Day at San Jose
Data Types
Structured Customer Data: Mostly transactional data on RDBMS
Unstructured Customer Data: Immutable blobs on a home-grown distributed storage system
SAN usage across multiple use cases
Backups: Both commercial solutions and internal systems
Caching: Immutable structured blobs
Events: On HDFS (plus other systems along the way)
Logs: On HDFS (plus other systems along the way)
Storage Technologies Used
File Storage
NoSQL (HBase)
HDFS
SAN
SDS (Software-Defined Storage) on scale-out commodity hardware
Uses for Ceph
Block Store
Backend for RDBMSs (maybe with BK for the journal)
Mountable cloud disks of various sizes (well beyond local disk capacity)
Re-mountable storage for VMs
Replace some SAN scenarios
Blob Store
General purpose blob store
Sharing of data across users
Examples: VM/container images, core dumps, large file transfer, customer data, IoT
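For the block-store use cases above, the standard RBD workflow looks roughly like this (a sketch with hypothetical pool and image names; it assumes a reachable Ceph cluster and an admin keyring on the client):

```shell
# Hypothetical pool/image names; requires a running cluster and client credentials.
ceph osd pool create rbd_pool 128          # create a pool with 128 PGs
rbd create rbd_pool/vol0 --size 10240      # 10 GiB image
sudo rbd map rbd_pool/vol0                 # kernel client exposes e.g. /dev/rbd0
sudo mkfs.xfs /dev/rbd0                    # format like any local disk
sudo mount /dev/rbd0 /mnt/vol0             # now usable as a regular mounted volume
```

`rbd unmap` detaches the device, which is what makes the "re-mountable storage for VMs" scenario above work: the image can be unmapped on one host and mapped on another.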
Salesforce Infrastructure and Ceph
[Architecture diagram: Ceph RADOS and RadosGW run on a hardware storage SKU farm over a 10+ GigE network; the SF Block Service and SF Blob Service are layered on top, serving private-cloud applications, SF services, RDBMS workloads, SAN replacement, and org-specific operations.]
Current Status
Experimenting with multiple small test clusters (~100 nodes)
Machines generally have lots of RAM, a few SSDs, and a bunch of HDDs
Currently on a single 10G network, moving to much bigger
Machines are spread across lots of racks, but in a single room (very little over-provisioning)
Testing only RBD
Simple crushmap mods for creating SSD-only pools and availability zones
Very high magnitude of scale: multiple clusters across multiple DCs, each multi-tenant
Operationalizing for very different and challenging requirements
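A minimal sketch of the kind of crushmap mod used for an SSD-only pool. Bucket names here are hypothetical; on clusters of this era (pre-Luminous, before device classes) this meant decompiling the CRUSH map, adding a separate SSD-only hierarchy, and writing a rule against it:

```
rule ssd_only {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take ssd-root                  # hypothetical root containing only SSD OSDs
    step chooseleaf firstn 0 type host  # spread replicas across distinct hosts
    step emit
}
```

A pool is then pointed at the rule with `ceph osd pool set <pool> crush_ruleset 1`; the same root/rule pattern, with racks or rooms as the chooseleaf type, gives the availability zones mentioned above.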
Performance numbers (using fio to provide test load)
SSD-only pool with 12 machines, 2x12 CPU, 128 GB RAM, 2x480 GB SSD
[Benchmark charts, not reproduced here, covered three workloads:]
Random R/W for 8K blocks, 70/30 ratio
Sequential write for 128K blocks
Random read for 8K blocks
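The mixed random workload above can be described by an fio job file along these lines (a sketch: the pool/image names, queue depth, and runtime are assumptions, and the `rbd` ioengine requires fio built with librbd support):

```ini
[rbd-randrw-8k]
ioengine=rbd          ; drives the image through librbd, no mount needed
clientname=admin      ; assumed cephx user
pool=ssd_pool         ; hypothetical SSD-only pool
rbdname=bench_img     ; hypothetical pre-created test image
rw=randrw
rwmixread=70          ; the 70/30 read/write split above
bs=8k
iodepth=32
runtime=300
time_based=1
```

Swapping `rw=write` with `bs=128k`, or `rw=randread` with `bs=8k`, covers the other two workloads.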
Experiments
Pre-work: Hook up metrics, logs, and alerts to Salesforce Infrastructure
fio perf on a mounted client-side block device with XFS
Testing lots and lots of failure scenarios (think chaos monkey)
More focus on slow devices (network, host, disk)
Crushmap settings for heterogeneous environments (will build a tool to generate this automatically)
Set up a CI/CD pipeline
Running Ceph in a dockerized environment with Kubernetes
Ability to patch a deployed cluster (OS, Docker, Ceph)
Going over the code, line by line
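As a toy illustration of the kind of invariant those chaos-monkey failure runs check (the parameters and names below are made up, not Salesforce's harness): with 3-way replication across distinct hosts, no 2 simultaneous host failures can make an object unreadable.

```python
import random

def surviving_copies(replica_hosts, failed_hosts):
    """Hosts that still hold a readable copy of the object."""
    return [h for h in replica_hosts if h not in failed_hosts]

def chaos_trial(num_hosts=12, replication=3, failures=2, seed=None):
    """Place replicas on distinct hosts (CRUSH-style), fail random hosts,
    and report whether the object is still readable."""
    rng = random.Random(seed)
    hosts = range(num_hosts)
    replica_hosts = rng.sample(hosts, replication)   # distinct hosts per replica
    failed_hosts = set(rng.sample(hosts, failures))  # chaos monkey kills hosts
    return len(surviving_copies(replica_hosts, failed_hosts)) > 0

# Two host failures can never eliminate three replicas on distinct hosts.
assert all(chaos_trial(seed=s) for s in range(1000))
```

The real tests exercise much messier cases, slow rather than dead devices in particular, but the pass/fail criterion has this same shape: after injected faults, is every object still readable and durable?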
Future
Read from any replica (inconsistent reads should help with tail latency)
Can reads search the journal? (should also help with tail latency)
Pluggability is needed in RGW: there is a pre_exec() in rgw_op.cc; alternatively, extend the RGWHandler class, or use the pre_exec() call in the RGWOp class
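The replica-read idea above trades consistency for tail latency: fan the read out to every replica and take the first response, so one slow OSD doesn't stall the client (at the cost of possibly stale data and extra read load). A toy sketch of that effect, not Ceph code:

```python
import concurrent.futures
import time

def read_replica(replica_id, delay_s):
    """Simulated replica read: responds after delay_s seconds."""
    time.sleep(delay_s)
    return replica_id

def read_any_replica(replica_delays):
    """Fan the read out to every replica and return the first response."""
    with concurrent.futures.ThreadPoolExecutor(len(replica_delays)) as pool:
        futures = [pool.submit(read_replica, i, d)
                   for i, d in enumerate(replica_delays)]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

# One replica is slow (200 ms); the client still sees a ~10 ms read,
# served by whichever fast replica answers first.
fastest = read_any_replica([0.20, 0.01, 0.01])
```

In primary-only reads, the slow replica's 200 ms would be the client's latency whenever it is the primary; with fan-out, the slowest replica only costs extra bandwidth, not latency.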
Challenges of Storage Services at Salesforce
Scale brings problems all its own: more hardware to fail or act funny, regular capacity adds, hardware changes
Multiple dimensions of multi-tenancy
External customers (isolation, auth/encryption, security, perf, availability, durability, etc.)
Service supporting many use cases and internal platforms
Running a large number of clusters in a large number of data centers
Questions?
Sameer Tiwari - Principal Architect, Storage Cloud
● stiwari@salesforce.com
● @techsameer
● https://www.linkedin.com/in/sameer-tiwari-1961311/
