Building World Class Data Centers
Mellanox High Performance Networks for Ceph
Ceph Day, June 10th, 2014
© 2014 Mellanox Technologies 2
Leading Supplier of End-to-End Interconnect Solutions
[Diagram: Virtual Protocol Interconnect end to end. Servers/compute and switches/gateways attach over 56G InfiniBand & FCoIB or 10/40/56GbE & FCoE; storage front- and back-ends attach over 56G InfiniBand or 10/40/56GbE; Metro/WAN links extend the fabric]
Product portfolio: ICs, Adapter Cards, Switches/Gateways, Cables/Modules, Host/Fabric Software
Comprehensive End-to-End InfiniBand and Ethernet Portfolio
© 2014 Mellanox Technologies 3
The Future Depends on Fastest Interconnects
1Gb/s → 10Gb/s → 40/56Gb/s
© 2014 Mellanox Technologies 4
From Scale-Up to Scale-Out Architecture
 The only way to support storage capacity growth in a cost-effective manner
 We have seen this transition on the compute side in HPC in the early 2000s
 Scaling performance linearly requires “seamless connectivity” (i.e., lossless, high bandwidth, low latency, CPU offloads)
Interconnect Capabilities Determine Scale-Out Performance
© 2014 Mellanox Technologies 5
CEPH and Networks
 High performance networks enable maximum cluster availability
• Clients, OSDs, Monitors, and Metadata Servers communicate over multiple network layers
• Real-time requirements for heartbeat, replication, recovery and re-balancing
 Cluster (“backend”) network performance dictates cluster’s performance and scalability
• “Network load between Ceph OSD Daemons easily dwarfs the network load between Ceph Clients
and the Ceph Storage Cluster” (Ceph Documentation)
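The public/cluster split described above is set per cluster in ceph.conf. A minimal sketch with illustrative subnets; if no cluster network is defined, OSD replication traffic falls back to the public network.

    [global]
        # Client-facing ("public"/front-end) traffic
        public network  = 10.10.0.0/24
        # OSD replication, recovery and heartbeat ("cluster"/back-end) traffic
        cluster network = 10.10.1.0/24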
© 2014 Mellanox Technologies 6
How Customers Deploy CEPH with Mellanox Interconnect
 Building Scalable, High-Performance Storage Solutions
• Cluster network @ 40Gb Ethernet
• Clients @ 10G/40Gb Ethernet
 Directly connect over 500 Client Nodes
• Target Retail Cost: US$350/1TB
 Scale Out Customers Use SSDs
• For OSDs and Journals
8.5PB System Currently Being Deployed
© 2014 Mellanox Technologies 7
CEPH Deployment Using 10GbE and 40GbE
 Cluster (Private) Network @ 40GbE
• Smooth HA, unblocked heartbeats, efficient data balancing
 Throughput Clients @ 40GbE
• Guarantees line rate for high ingress/egress clients
 IOPs Clients @ 10GbE / 40GbE
• 100K+ IOPs/Client @4K blocks
20x Higher Throughput, 4x Higher IOPs with 40Gb Ethernet Clients!
(http://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Ceph_over_High_Performance_Networks.pdf)
Throughput testing results based on the fio benchmark: 8M blocks, 20GB file, 128 parallel jobs, RBD kernel driver with Linux kernel 3.13.3, RHEL 6.3, Ceph 0.72.2
IOPs testing results based on the fio benchmark: 4K blocks, 20GB file, 128 parallel jobs, RBD kernel driver with Linux kernel 3.13.3, RHEL 6.3, Ceph 0.72.2
[Diagram: Admin Node and Ceph Nodes (Monitors, OSDs, MDS) interconnected by a 40GbE cluster network; Client Nodes reach the Ceph Nodes over a 10GbE/40GbE public network]
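The footnoted test parameters above can be approximated with stock tools. A hedged sketch, assuming a pre-created RBD image mapped through the kernel driver; device, pool, and image names are placeholders, and the original runs may have used different fio options.

    # Map an RBD image through the kernel driver (pool/image are placeholders)
    sudo rbd map rbd/testimage

    # Throughput test: sequential writes, 8M blocks, 20GB per job, 128 jobs
    sudo fio --name=ceph-throughput --filename=/dev/rbd0 \
        --rw=write --bs=8M --size=20G --numjobs=128 \
        --ioengine=libaio --direct=1 --group_reporting

    # IOPs test: switch to 4K blocks and random I/O, e.g. --bs=4K --rw=randwrite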
© 2014 Mellanox Technologies 8
CEPH and Hadoop Co-Exist
 Increase Hadoop Cluster Performance
 Scale Compute and Storage solutions in Efficient Ways
 Mitigate Single Point of Failure Events in Hadoop Architecture
[Diagram: Hadoop Name Node/Job Tracker and Data Nodes deployed alongside Ceph Nodes and an Admin Node on the same fabric]
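One common way to realize the co-existence sketched above is to point Hadoop at CephFS as its filesystem. A minimal core-site.xml sketch, assuming the cephfs-hadoop bindings are installed; property names and the monitor address are illustrative and may vary with the plugin and Hadoop version.

    <configuration>
      <!-- Monitor address below is a placeholder -->
      <property>
        <name>fs.default.name</name>
        <value>ceph://mon-host:6789/</value>
      </property>
      <property>
        <name>fs.ceph.impl</name>
        <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
      </property>
      <property>
        <name>ceph.conf.file</name>
        <value>/etc/ceph/ceph.conf</value>
      </property>
    </configuration>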
© 2014 Mellanox Technologies 9
I/O Offload Frees Up CPU for Application Processing
Without RDMA: ~53% CPU efficiency (user space), ~47% CPU overhead/idle (system space)
With RDMA and Offload: ~88% CPU efficiency (user space), ~12% CPU overhead/idle (system space)
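The effect above can be observed directly by driving a TCP stream and an RDMA stream between two hosts while sampling CPU usage. A rough sketch, assuming iperf, the perftest package (ib_write_bw), and sysstat (mpstat) are installed; hostnames are placeholders.

    # On the server:
    #   iperf -s          (TCP baseline)
    #   ib_write_bw       (RDMA write test, from the perftest package)

    # On the client, sample CPU while each transfer runs:
    mpstat 2 &                # print CPU utilization every 2 seconds
    iperf -c server-host      # TCP: expect significant %sys and softirq time
    ib_write_bw server-host   # RDMA: CPU remains largely idle or in user space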
© 2014 Mellanox Technologies 10
Accelio, High-Performance Reliable Messaging and RPC Library
 Open source!
• https://github.com/accelio/accelio/ and www.accelio.org
 Faster RDMA integration into applications
 Asynchronous
 Maximizes message and CPU parallelism
 Enables > 10GB/s from a single node
 Enables < 10usec latency under load
 In the next-generation Ceph blueprint (Giant)
• http://wiki.ceph.com/Planning/Blueprints/Giant/Accelio_RDMA_Messenger
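To try Accelio itself, the library can be built from the repository linked above. A sketch assuming the usual autotools flow and an RDMA-capable NIC (or soft-RoCE); check the repository's README for the exact, current build steps.

    git clone https://github.com/accelio/accelio.git
    cd accelio
    ./autogen.sh && ./configure && make && sudo make install
    # Bundled client/server examples live under examples/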
© 2014 Mellanox Technologies 11
Summary
 CEPH cluster scalability and availability rely on high-performance networks
 End-to-end 40/56Gb/s transport with full CPU offloads is available and being deployed
• 100Gb/s around the corner
 Stay tuned for the afternoon session by CohortFS on RDMA for CEPH
Thank You
  • 11. © 2014 Mellanox Technologies 11 Summary  CEPH cluster scalability and availability rely on high performance networks  End to end 40/56 Gb/s transport with full CPU offloads available and being deployed • 100Gb/s around the corner  Stay tuned for the afternoon session by CohortFS on RDMA for CEPH