SSDs, IMDGs and All the Rest 
A short intro into how SSDs are 
powering the data revolution 
Uri Cohen 
Head of Product @ GigaSpaces 
@uri1803 
#jaxlondon 2014
The Data Processing Hierarchy
But Data Amounts Just Keep Growing
But We Have a Performance Gap
In Memory 
Computing 
to the 
Rescue? 
Not enough anymore… 
• Average GigaSpaces XAP 
cluster size grew 5-10 fold 
since 2008 
• We’re in the realm of 
terabytes, not gigabytes
SSD to Save 
the Day! 
https://www.mimoco.com
(It Actually 
Looks More 
Like This)
Some Numbers
Level           Access Time                Typical Size
Registers       instantaneous              under 1KB
Level 1 Cache   1-3 ns                     64KB per core
Level 2 Cache   3-10 ns                    256KB per core
Level 3 Cache   10-20 ns                   2-20 MB per chip
Main Memory     30-60 ns                   4-32 GB per system
Hard Disk       3,000,000-10,000,000 ns    over 1TB
Some Numbers
Level           Random Access Time         Typical Size
Registers       instantaneous              under 1KB
Level 1 Cache   1-3 ns                     64KB per core
Level 2 Cache   3-10 ns                    256KB per core
Level 3 Cache   10-20 ns                   2-20 MB per chip
Main Memory     30-60 ns                   4-32 GB per system
SSD             < 1,000,000 ns             128GB – 2TB
Hard Disk       3,000,000-10,000,000 ns    over 1TB
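To put the table in perspective, here is a small sketch that turns the access times into ratios. The numbers are the bounds quoted in the table; the helper name `slowdown` and the choice of comparing worst-case figures are assumptions made for illustration.

```python
# Access times in nanoseconds, taken from the table above.
RAM_WORST_NS = 60            # upper bound for main memory
SSD_WORST_NS = 1_000_000     # the table only gives an upper bound for SSD
HDD_BEST_NS = 3_000_000
HDD_WORST_NS = 10_000_000


def slowdown(slower_ns, faster_ns):
    """How many times slower one level is than another."""
    return slower_ns / faster_ns


# Worst-case SSD access compared with worst-case RAM access.
ssd_vs_ram = slowdown(SSD_WORST_NS, RAM_WORST_NS)

# Hard disk compared with the SSD upper bound.
hdd_vs_ssd_low = slowdown(HDD_BEST_NS, SSD_WORST_NS)
hdd_vs_ssd_high = slowdown(HDD_WORST_NS, SSD_WORST_NS)

print(f"SSD vs RAM: up to ~{ssd_vs_ram:,.0f}x slower")
print(f"HDD vs SSD: at least {hdd_vs_ssd_low:.0f}x to {hdd_vs_ssd_high:.0f}x slower")
```

Even at its upper bound the SSD sits several times closer to RAM than a hard disk does, but the gap to RAM is still about four orders of magnitude.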
Performance Is All the Rage 
http://arstechnica.com/information-technology/2012/06/inside-the-ssd-revolution-how-solid-state-disks-really-work/
Is It All Roses 
and Daisies?
Step Back – 
How SSDs 
Work
The Foundation - NAND Chips
NAND Traits 
Space-efficient 
(60% less than NOR)
→ Effectively only
NAND is used
commercially
NAND Traits 
Can only write and 
read whole pages, 
4096 or 8192 bytes 
at a time
→ Modern FSs work
this way anyway (but
keep that in mind for
later)
NAND Traits 
Limited life span 
(5K-10K write/erase 
cycles)
→ Need to evenly
distribute load across
all blocks
NAND Traits 
You cannot update 
a page “in place”
→ So why not delete
it and write a new one
instead?
Duh, you can 
only delete 
whole blocks
Typical 
Update Cycle 
• Updating 4096 
(or less) bytes of 
data can result in 
2MB of data 
moving around on 
the SSD 
• It’s called 
Write Amplification
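The arithmetic behind that claim can be sketched as follows. This assumes the 2MB figure is the size of one erase block and that a 4096-byte page is the unit being updated; the function name `write_amplification` is illustrative, not taken from any SSD vendor's tooling.

```python
PAGE_BYTES = 4096                 # page size from the NAND Traits slide
BLOCK_BYTES = 2 * 1024 * 1024     # assumption: one erase block is 2MB


def write_amplification(flash_bytes_written, host_bytes_written):
    """Ratio of bytes physically written to bytes the host asked to write."""
    return flash_bytes_written / host_bytes_written


# Number of pages that share a single erase block.
pages_per_block = BLOCK_BYTES // PAGE_BYTES

# Worst case: updating one page forces the whole block to be rewritten.
worst_case = write_amplification(BLOCK_BYTES, PAGE_BYTES)

print(f"{pages_per_block} pages per block, worst-case amplification {worst_case:.0f}x")
```

Under these assumptions a single-page update can cost 512 times the requested write. Real controllers rarely hit this worst case, which is what the next slides are about.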
Controllers 
to the 
Rescue
Write Caching
Garbage 
Collection 
(Grrrrrr….) 
Compacts 
fragmented disk 
blocks → but has a
performance cost 
• Modern SSDs try to do 
this in the 
background... 
• When no empty blocks 
are available, GC must 
be done before ANY 
write can go through
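A toy model makes the garbage collection cost visible. This is a sketch under loud assumptions: the class `ToyFlash` is hypothetical, it keeps relocated pages in controller memory while the victim block is erased, and it has no over-provisioning, which real SSDs do have.

```python
class ToyFlash:
    """Toy flash device: pages are written once, blocks are erased whole."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.pages_per_block = pages_per_block
        # Each block lists logical page ids; None marks a stale page.
        self.blocks = [[] for _ in range(num_blocks)]
        self.where = {}           # logical page id -> (block, slot)
        self.active = 0           # block currently receiving writes
        self.host_writes = 0
        self.flash_writes = 0
        self.erases = 0

    def _program(self, lpn):
        """Write one page into the active block (caller guarantees room)."""
        block = self.blocks[self.active]
        block.append(lpn)
        self.where[lpn] = (self.active, len(block) - 1)
        self.flash_writes += 1

    def _make_room(self):
        """Switch to an empty block, or garbage-collect one if none is left."""
        empty = [i for i, b in enumerate(self.blocks) if not b]
        if empty:
            self.active = empty[0]
            return
        # GC: the victim is the block with the fewest still-valid pages.
        victim = min(range(len(self.blocks)),
                     key=lambda i: sum(p is not None for p in self.blocks[i]))
        survivors = [p for p in self.blocks[victim] if p is not None]
        if len(survivors) == self.pages_per_block:
            raise RuntimeError("device full: nothing stale to reclaim")
        self.blocks[victim] = []          # erase the whole block
        self.erases += 1
        self.active = victim
        for lpn in survivors:             # relocation costs extra flash writes
            self._program(lpn)

    def write(self, lpn):
        self.host_writes += 1
        if lpn in self.where:             # no in-place update: old copy goes stale
            block, slot = self.where[lpn]
            self.blocks[block][slot] = None
        if len(self.blocks[self.active]) == self.pages_per_block:
            self._make_room()             # GC happens before the write proceeds
        self._program(lpn)


ssd = ToyFlash(num_blocks=2, pages_per_block=2)
for lpn in [0, 1, 2, 0, 0]:
    ssd.write(lpn)
amplification = ssd.flash_writes / ssd.host_writes
print(ssd.host_writes, ssd.flash_writes, ssd.erases, amplification)
```

In this run the last write finds no empty block, so one block is erased and its surviving page is rewritten before the host write goes through: five host writes become six flash writes.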
Striping
Wear 
Leveling 
A bag of techniques 
the controller uses 
to keep all of the 
flash cells at roughly 
the same level of 
use
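One of the simplest techniques in that bag can be sketched as: always hand out the free block that has been erased the fewest times. The class `WearLeveler` below is a hypothetical illustration, not how any particular controller is implemented.

```python
import heapq


class WearLeveler:
    """Hands out the least-worn free block first."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        # Min-heap of (erase count, block id) for all free blocks.
        self.free = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self):
        """Return the free block with the lowest erase count."""
        _, block = heapq.heappop(self.free)
        return block

    def release(self, block):
        """Erase a block and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))


leveler = WearLeveler(num_blocks=3)
for _ in range(9):
    block = leveler.allocate()
    leveler.release(block)
print(leveler.erase_counts)
```

Nine allocate/erase rounds over three blocks end with every block erased three times, instead of one block absorbing all nine erases.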
Dedupe & Compression
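The speaker notes for this slide describe the idea as "compress, check for dups, discard". A minimal sketch of that idea, assuming content hashing with SHA-256 and compression with zlib (both choices are illustrative, and the class `DedupeStore` is hypothetical):

```python
import hashlib
import zlib


class DedupeStore:
    """Stores each distinct page once, compressed, keyed by content hash."""

    def __init__(self):
        self.chunks = {}            # content digest -> compressed bytes
        self.pages = {}             # logical page id -> content digest
        self.physical_writes = 0

    def write(self, lpn, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.chunks:          # new content: compress and store
            self.chunks[digest] = zlib.compress(data)
            self.physical_writes += 1
        self.pages[lpn] = digest               # duplicate: only update the mapping

    def read(self, lpn):
        return zlib.decompress(self.chunks[self.pages[lpn]])


store = DedupeStore()
store.write(0, b"a" * 4096)
store.write(1, b"a" * 4096)     # same content as page 0, nothing new stored
store.write(2, b"b" * 4096)
stored_bytes = sum(len(c) for c in store.chunks.values())
print(store.physical_writes, stored_bytes)
```

Three logical writes result in two physical writes, and the highly repetitive pages shrink to a small fraction of 4096 bytes. The sketch never frees chunks that are no longer referenced, which a real implementation would need to do.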
Databases, 
Charge 
Ahead! 
http://cdn.pcworld.idg.com.au/article/images/740x500/dimg/larry-mario_500.jpg
The Naive - 
MySQL (or 
PostgreSQL, 
Oracle, 
Mongo, …) 
Let’s just use it! 
(and write data 
in place FTW)
• They all perform 
buffering of 
writes before 
flushing to disk 
• ... but flushes 
are still 
RANDOM writes
Source: Anandtech
Cassandra 
Already 
Optimized 
(But for 
what?)
Cassandra Write Path 
http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
C* 
Observations 
(for SSDs) 
• All disk writes are 
sequential and append 
only 
• Compaction is applied 
when merging SSTables 
• SSTables are immutable 
once written
→ No write
amplification
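The observations above can be sketched with a toy log-structured store. This is not Cassandra's code: `ToyLSM`, its method names, and the tiny memtable limit are all assumptions made to show the shape of the write path (append to a log, buffer in memory, flush to an immutable sorted table, merge tables during compaction).

```python
class ToyLSM:
    """Toy log-structured store: append-only log, memtable, immutable tables."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.commit_log = []      # append-only record of every write
        self.memtable = {}        # in-memory buffer of recent writes
        self.sstables = []        # immutable sorted tuples, oldest first

    def put(self, key, value):
        self.commit_log.append((key, value))     # sequential append
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one new immutable, sorted table."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        """Check the memtable, then tables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all tables into one; newer values win over older ones."""
        merged = {}
        for table in self.sstables:               # oldest first
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable full: first table flushed
db.put("a", 3)
db.put("c", 4)      # second table flushed, "a" now lives in two tables
tables_before = len(db.sstables)
value_a = db.get("a")
db.compact()
print(tables_before, value_a, db.sstables)
```

Nothing is ever updated in place, which suits flash. The sketch also hints at the costs raised on the next slide: a read may have to look through several tables, and compaction rewrites data that was already on disk.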
But Still… 
• Read path is 
complex 
• Compaction can 
cause performance 
variations
Why Do We
Treat SSDs
the Same as
HDDs?
Software 
Optimizations 
Direct access: 
• No kernel space 
overhead 
• TRIM 
• Multithreading 
• Caching in DRAM 
• On Disk and 
DRAM Indexing
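Two of the items above, caching in DRAM and keeping an index in DRAM for data that lives on flash, can be sketched as follows. This is an illustration only: `FlashBackedStore` is a hypothetical class, an ordinary file stands in for the SSD, and the kernel-bypass and TRIM items are not modeled at all.

```python
import os
import tempfile
from collections import OrderedDict


class FlashBackedStore:
    """Values live in an append-only file; index and hot values live in DRAM."""

    def __init__(self, path, cache_size=2):
        self.file = open(path, "w+b")
        self.index = {}                 # key -> (offset, length), kept in DRAM
        self.cache = OrderedDict()      # small LRU cache of hot values
        self.cache_size = cache_size
        self.disk_reads = 0

    def put(self, key, value):
        self.file.seek(0, os.SEEK_END)
        offset = self.file.tell()
        self.file.write(value)          # append only, never overwrite
        self.index[key] = (offset, len(value))
        self._remember(key, value)

    def get(self, key):
        if key in self.cache:           # DRAM hit: no device access
            self.cache.move_to_end(key)
            return self.cache[key]
        offset, length = self.index[key]
        self.file.seek(offset)          # index gives the exact location
        value = self.file.read(length)
        self.disk_reads += 1
        self._remember(key, value)
        return value

    def _remember(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict least recently used

    def close(self):
        self.file.close()


with tempfile.TemporaryDirectory() as tmp:
    store = FlashBackedStore(os.path.join(tmp, "data.log"), cache_size=1)
    store.put("a", b"alpha")
    store.put("b", b"beta")
    hot = store.get("b")        # served from the DRAM cache
    cold = store.get("a")       # cache miss: one read from the file
    again = store.get("a")      # now cached
    disk_reads = store.disk_reads
    store.close()
print(hot, cold, again, disk_reads)
```

Because the index is held in memory, a cache miss costs exactly one device read at a known offset rather than a search on disk.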
Flash 
Optimized 
APIs
How We Did It
Raw Performance Numbers
RAM only: ~1M read txns/sec
RAM + SSD: 242K read txns/sec
Looking at It from a Cost Perspective
Provides 2x – 3.6x Better TPS/$
While Reducing Servers by 50%
- 1KB object size and uniform distribution
- 2-socket 2.8GHz CPU with 24 cores total, CentOS 5.8, 2 FusionIO SLC PCIe cards in RAID
- YCSB measurements performed by SanDisk
Assumptions: 1TB Flash = $2K; 1TB RAM = $20K
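A back-of-the-envelope check of the TPS/$ claim, using only the numbers on these two slides. The big assumption here is that cost means the price of 1TB of storage media alone; server costs, and the RAM that the RAM + SSD setup still needs, are ignored, so this is a rough sanity check and not a reproduction of the SanDisk measurements.

```python
RAM_ONLY_TPS = 1_000_000       # "~1M read txns/sec" from the previous slide
RAM_SSD_TPS = 242_000          # "242K read txns/sec"
RAM_COST_PER_TB = 20_000       # dollars, from the slide's assumptions
FLASH_COST_PER_TB = 2_000      # dollars, from the slide's assumptions


def tps_per_dollar(tps, cost):
    """Transactions per second bought by each dollar of storage media."""
    return tps / cost


ram_value = tps_per_dollar(RAM_ONLY_TPS, RAM_COST_PER_TB)
flash_value = tps_per_dollar(RAM_SSD_TPS, FLASH_COST_PER_TB)
advantage = flash_value / ram_value

print(f"RAM: {ram_value:.0f} TPS/$, flash: {flash_value:.0f} TPS/$, "
      f"advantage: {advantage:.2f}x")
```

Under these simplified assumptions flash comes out at roughly 2.4x better TPS/$, which falls inside the 2x – 3.6x range quoted on the slide.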
Resources
• http://arstechnica.com/information-technology/2012/06/inside-the-ssd-revolution-how-solid-state-disks-really-work/
• http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
• http://www.sandisk.com/enterprise/zetascale/
• http://www.gigaspaces.com/xap-memoryxtend-flash-performance-big-data
Thank You!

Editor's Notes

  • #20, #21: Updating 4096 bytes of data can result in 2MB of data being removed and rewritten
  • #26: Increases write amplification
  • #27: Mention SandForce. Compress, check for dups, discard. Updates to a file cause far fewer writes. Can also span across files.
  • #44: Uri
  • #45: Uri