Cassandra -> Scylla
4 Key Use Cases Where Users will See Immediate Benefit
Greg Matza
Which C* Use Cases Will See Immediate Benefit With Scylla?
+ Is your Dataset > 10 TB?
+ Do you have > 40k read ops/sec?
+ Is your Application sensitive to Long-tail latency?
+ Do you have a Caching layer in front of Cassandra?
Scylla Supports Huge Datasets
- Amazon’s new i3en instances have up to 60 TB of NVMe storage
- Scylla can use all of this disk, with benchmarks showing:
  - 15 hours to add a 45 TB node (10 hrs ingestion + 5 hrs compaction)
  - 6 hours to stream to a new node, splitting one 45 TB node into two 22.5 TB nodes (4 hrs streaming + 2 hrs cleanup)
  - >1 million ops per second per node with an 80% cache miss rate and 99p latency stable at 2 ms
- Detailed benchmark data here and here
- Cassandra is typically limited to 1-2 TB per node.
Scylla Supports Huge Datasets
- Scenario:
- 40 TB raw data, RF=3, TWCS
- 2 Datacenters
- Total data stored is 40 TB * 3 replicas * 2 DCs = 240 TB
| | TB/node | Node calculation | # nodes | Node type | Annual cost per node | Total cost |
|---|---|---|---|---|---|---|
| Cassandra | 1.5 | 240 TB / 1.5 TB = 160 nodes | 160 | i3.2xl | $5k | $800k |
| Scylla | 45 | 240 TB / 45 TB = 5.3 nodes | 6 | i3en.24xl | $50k | $300k |
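The sizing arithmetic in this scenario can be reproduced with a short sketch. The function name `storage_sizing` and its parameters are hypothetical, introduced only for illustration; the inputs are the figures from the slide, and the node count is rounded up because a fractional node cannot be deployed.

```python
import math

def storage_sizing(raw_tb, rf, datacenters, tb_per_node, annual_cost_per_node):
    """Return (total TB stored, node count, total annual cost in dollars)."""
    # Every datacenter holds rf full replicas of the raw data.
    total_tb = raw_tb * rf * datacenters
    # Round up: 5.3 nodes means 6 nodes in practice.
    nodes = math.ceil(total_tb / tb_per_node)
    return total_tb, nodes, nodes * annual_cost_per_node

# Figures from the scenario: 40 TB raw, RF=3, 2 DCs.
cassandra = storage_sizing(40, 3, 2, tb_per_node=1.5, annual_cost_per_node=5_000)
scylla = storage_sizing(40, 3, 2, tb_per_node=45, annual_cost_per_node=50_000)

print(cassandra)  # (240, 160, 800000)
print(scylla)     # (240, 6, 300000)
```

The per-node capacities and costs are the slide's planning numbers, not measured limits, so changing them is the quickest way to test a different instance type.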
Scylla Supports Heavy Reads
- On i3 hardware, Scylla handles read throughput at approximately the same rate as write throughput
- Scylla can handle sustained read or write throughput of 10,000 ops/core (1 KB payload, NVMe disk)
- Throughput scales linearly with the number of cores
- Cassandra typically has per-node limitations on read throughput
- Cassandra can handle sustained read throughput of about 20,000 ops/node (1 KB payload, NVMe disk)
- Larger core counts, faster networking, or better I/O do not significantly increase throughput
Scylla Supports Heavy Reads
- Scenario:
- 80k reads/sec + 20k writes/sec, 1 TB raw data, RF=3
- Both Scylla and Cassandra are running on i3.4xlarge
- Given RF=3, each application-layer operation is counted as 3 ops against the cluster, as it will act on all 3 replicas
- Total Operations are (80k reads * 3 replicas) + (20k writes * 3 replicas) = 300k ops/sec (240k reads + 60k writes)
| | Limiting factor per node | Node calculation | # of nodes | Annual cost per node | Total cost |
|---|---|---|---|---|---|
| Cassandra | 20k reads | 240k / 20k = 12 | 12 | $5k | $60k |
| Scylla | 80k ops | 300k / 80k = 3.75 | 4 | $5k | $20k |
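A minimal sketch of the throughput sizing above, assuming (as the slide does) that every application-level operation touches all RF replicas. The function `nodes_for_throughput` and the `reads_only` flag are hypothetical names; the flag models the slide's choice to size Cassandra on replica reads alone and Scylla on total replica operations.

```python
import math

def nodes_for_throughput(app_reads, app_writes, rf, per_node_limit, reads_only):
    """Node count needed to absorb the replica-level load."""
    replica_reads = app_reads * rf
    replica_writes = app_writes * rf
    # Cassandra is sized on reads only; Scylla on reads plus writes.
    load = replica_reads if reads_only else replica_reads + replica_writes
    return math.ceil(load / per_node_limit)

COST_PER_NODE = 5_000  # annual cost of one i3.4xlarge, per the slide

cassandra_nodes = nodes_for_throughput(80_000, 20_000, 3, 20_000, reads_only=True)
scylla_nodes = nodes_for_throughput(80_000, 20_000, 3, 80_000, reads_only=False)

print(cassandra_nodes, cassandra_nodes * COST_PER_NODE)  # 12 60000
print(scylla_nodes, scylla_nodes * COST_PER_NODE)        # 4 20000
```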
Long-tail latency sensitive
- Due to garbage collection, compaction, repair, and other operations, Cassandra typically has tightly bounded average latency, but its 95p and 99p latencies show regular 5x to 20x spikes.
- Scylla has no garbage collection, includes its own built-in cache, and actively manages its own I/O and CPU scheduling. This, among other things, allows it to deliver tightly bounded 95p, 99p, or even max latency.
- The I/O and CPU schedulers prioritize tasks, so background tasks like compaction or repair are almost always(*) queued behind reads and writes.
- (*) Almost always, because we do have a backpressure mechanism: if you are in danger of losing your node due to OOM or running out of disk, we prioritize the tasks needed to save the node above queries.
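The prioritization idea can be illustrated with a toy strict-priority queue. This is not how Scylla implements its schedulers; the class `ToyScheduler`, the priority constants, and the `node_at_risk` flag are all hypothetical, chosen only to show foreground work running ahead of background work unless the node itself is in danger.

```python
import heapq
from itertools import count

# Hypothetical priority classes; a lower number runs first.
RESCUE = 0      # background work promoted because the node is at risk
QUERY = 1       # reads and writes
BACKGROUND = 2  # compaction, repair

class ToyScheduler:
    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within a class

    def submit(self, name, kind, node_at_risk=False):
        # Backpressure: promote background work when the node is in danger.
        promoted = kind == BACKGROUND and node_at_risk
        priority = RESCUE if promoted else kind
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def run_all(self):
        order = []
        while self._heap:
            _, _, name = heapq.heappop(self._heap)
            order.append(name)
        return order

sched = ToyScheduler()
sched.submit("compaction", BACKGROUND)
sched.submit("read-1", QUERY)
sched.submit("write-1", QUERY)
sched.submit("emergency-compaction", BACKGROUND, node_at_risk=True)
print(sched.run_all())
# ['emergency-compaction', 'read-1', 'write-1', 'compaction']
```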
Long-tail latency sensitive
- Scenario:
- “Customer 360” Use Case
- 3 nodes 8vCPU/64 GB RAM
- 1.4 TB dataset
- 20k reads/sec
- Test run by a long-time C* DBA as part of a Scylla vs. Cassandra POC
[Figure: Read Latencies, 99p. Panels: Cassandra’s latency, Scylla’s latency]
Top 3 North American Telecom
Scylla Does Not Require a Caching Layer
- Read-heavy or latency-sensitive use cases with Cassandra usually require a Redis, Memcached, or other caching layer to meet those requirements
- Scylla has a built-in caching layer, allowing for easier application-side logic and lower node counts
  - no cache invalidation issues
  - no cold cache issues
  - no try/catch application logic on cache misses
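To make the application-side difference concrete, here is a minimal sketch of the cache-aside pattern that an external cache typically forces on the application, next to a direct read. Plain Python dicts stand in for the cache and the database, and all function names are hypothetical; a real deployment would use a Redis or Memcached client and a CQL driver instead.

```python
def read_with_cache_aside(key, cache, db):
    """Read path when an external cache sits in front of the database."""
    try:
        return cache[key]
    except KeyError:
        # Cache miss: fall back to the database, then populate the cache.
        value = db[key]
        cache[key] = value
        return value

def write_with_cache_aside(key, value, cache, db):
    """Write path: the application must also invalidate the cached copy."""
    db[key] = value
    cache.pop(key, None)  # forgetting this step serves stale data

def read_direct(key, db):
    """Read path when the database handles caching internally."""
    return db[key]

db = {"user:1": "alice"}
cache = {}  # starts cold, so the first read always goes to the database

print(read_with_cache_aside("user:1", cache, db))  # alice
write_with_cache_aside("user:1", "bob", cache, db)
print(read_with_cache_aside("user:1", cache, db))  # bob
print(read_direct("user:1", db))                   # bob
```

The miss handler, the invalidation call, and the cold start are the three concerns the bullets above list; with a built-in cache they stay inside the database rather than in application code.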
Scylla Does Not Require a Caching Layer
- Scenario:
- Comcast needed <10 ms max latency at 200k ops/sec, with a balanced read/write mix
- The original implementation used 60 nodes of Varnish (cache) in front of 600 nodes of Cassandra
- Scylla replaced the entire infrastructure with only 60 nodes
- Case: https://www.scylladb.com/tech-talk/comcast-grow-small-get-big-experiences-with-scylla/
| Version | Apache Cassandra 2.1.8 | Scylla Enterprise 2018.1.11 |
|---|---|---|
| Data layer | 600 nodes i3.2xlarge | 60 nodes i3.2xlarge |
| Caching layer | 60 nodes Varnish m4.4xlarge | No caching |
| OpEx | $3.7 million/yr | $328k/yr |
