Cassandra to ScyllaDB
A technical comparison and path to success
Lewis Carr, Senior Director, Product Marketing
Paul Preuveneers, Solution Architecture, Customer Experience
■ Introduction
■ Comparing Cassandra and ScyllaDB
■ Migration Options and Tools
■ Case Studies and Best Practices
■ Conclusion and Extended Q&A
Presentation Agenda
■ Wide column NoSQL key-key-value DB
■ Token ring node structured clusters
■ Tunable consistency and virtual nodes
■ Cassandra Query Language (CQL)
■ Memtable (in-memory) & SSTables (disk)
■ Repair and compaction
■ …
Cassandra & ScyllaDB: Same starting line
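Tunable consistency, which both databases share, comes down to simple arithmetic on the replication factor. The following is a minimal sketch, not code from either project; the function names are hypothetical and chosen for illustration. It shows the usual quorum size and the R + W > RF overlap rule.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def reads_overlap_writes(read_replicas: int, write_replicas: int,
                         replication_factor: int) -> bool:
    """True when every read set intersects every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                               # 2
print(reads_overlap_writes(q, q, rf))  # True: QUORUM reads + QUORUM writes
print(reads_overlap_writes(1, 1, rf))  # False: ONE + ONE can miss a write
```

With a replication factor of 3, QUORUM reads and writes each touch two replicas, so at least one replica is common to both.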
The high throughput AND low latency that hyperscale applications need,
delivered predictably, without complexity or high cost
■ Comcast. 962 Cassandra nodes to only 78 Scylla nodes;
60% savings, latencies reduced by 95%
■ Discord. 250M users. Saved time, improved consistency, and
reduced downtime compared to C*.
■ Expedia. Moved from C* to ScyllaDB Cloud to avoid Java GC, burst traffic,
high infrastructure costs and infrequent release schedules
■ Fanatics. Went from 55 nodes to 6 and dramatically reduced their AWS EC2
bill by moving to ScyllaDB
So why are users switching to us?
■ European Ecommerce Platform
■ 14,000,000 monthly users
■ Maintenance operations in Cassandra
→ latency spikes
■ P99 comparison
■ 7X lower response times
with ScyllaDB
■ “Usable” P99 latencies
Consistent, Low Latencies
■ C* cannot maintain
low latencies except
at very low throughput
(≤30-40k ops)
■ ScyllaDB can maintain
low latencies for far
greater throughputs
(≤170-180k ops)
ScyllaDB vs. C*: Latency vs Throughput
Comparing
Cassandra (C*) and ScyllaDB
Design Tenets for Hyperscale Applications
■ Distributed operations and redundancy
  Cassandra (C*) implementation: Shared-nothing, leaderless node, token ring topology
  ScyllaDB implementation: Shard-aware tokens mapped to shards, in turn mapped to CPU cores
  Impact of difference in implementation: Improved granularity of resource allocation and use
■ Language implementation
  Cassandra (C*) implementation: Java
  ScyllaDB implementation: C++
  Impact of difference in implementation: No garbage collection, better real-time response
■ Vertical vs. horizontal scaling
  Cassandra (C*) implementation: Default scale out
  ScyllaDB implementation: Scale up then out
  Impact of difference in implementation: Larger, more performant nodes for higher throughput and lower latency; extracts more from the infrastructure investment
■ Cloud compute memory use
  Cassandra (C*) implementation: Integrated RAM and disk ops with compaction & repair
  ScyllaDB implementation: Unified cache
  Impact of difference in implementation: Provided you can allocate sufficient cache, you can eliminate the need for frontend caches like Redis and often avoid front-ends like Kafka
■ Scale elasticity
  Cassandra (C*) implementation: Serverless
  ScyllaDB implementation: Faster operations; Tablets (coming soon) will further accelerate new node spin-up
  Impact of difference in implementation: Speed of scaling up/down and in/out without the current cost of serverless
■ Shared-nothing asynchronous operations with pinned resources
■ Scale up before scaling out, take full advantage of the largest VM instances
■ Reduce node sprawl, operational complexity, and intranode latency
■ Completely use what you pay for
ScyllaDB Shard-Per-Core with Seastar delivers
an order of magnitude better performance
[Figure: a 21-node C* cluster compared with a 5-node ScyllaDB cluster. Partitions = 1, 2, 3, 4, 5. Node partition labels: (2, 4, 5), (1, 2, 5), (1, 3, 4), (1, 3, 4), (2, 3, 5).]
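The shard-per-core idea above (tokens mapped to shards, shards mapped to CPU cores) can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not ScyllaDB's actual sharding algorithm: `token_for` uses SHA-256 as a stand-in for the real partitioner, and `shard_for` and `core_for` are hypothetical helpers that split a 64-bit token space into equal contiguous ranges and pin each shard to one core.

```python
import hashlib

TOKEN_BITS = 64  # assumed width of the token space for this sketch


def token_for(partition_key: str) -> int:
    """Stand-in partitioner: hash the key to an unsigned 64-bit token."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(token: int, shard_count: int) -> int:
    """Split the token space into shard_count equal, contiguous ranges."""
    return (token * shard_count) >> TOKEN_BITS


def core_for(shard: int, first_core: int = 0) -> int:
    """Assumed one-to-one pinning: shard i runs on core first_core + i."""
    return first_core + shard


shard_count = 8  # e.g. one shard per core on an 8-core node
for key in ("user:1", "user:2", "cart:42"):
    shard = shard_for(token_for(key), shard_count)
    print(key, "-> shard", shard, "-> core", core_for(shard))
```

Because each key always lands on the same shard, one core owns that key's data and requests for it, which is the property the shared-nothing, pinned-resource design relies on.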
■ Close-to-the-metal architecture
Seastar performance boost alone is 3X
Performance Comparison, Up to 5X faster
ScyllaDB 2024.1 vs 2023.1 vs OSS 5.4 (2022)
Max Throughput, Higher is better
■ Tuning JVM for heap and GC
■ Optimal vNode count setting
■ Combating node sprawl
■ Compaction and repair
■ Identifying hot partitions early
and often
■ Provisioning to account for
lower price performance
C* tuning and optimization can be a challenge
■ No Java! So no JVM or GC
■ Automated memory, storage and IO
tuning
■ Workload prioritization
■ Automated hot partition avoidance
■ Unified memory cache (Memtable)
■ Automated repair and
compaction (ICS)
■ Automated scale with speed
and granularity
■ Reduced node sprawl reduces
admin and infrastructure costs
ScyllaDB reduces operational complexity, cost, and risk
Migrating from
Cassandra to ScyllaDB
Strategies
■ Offline / Cold Migration
■ Online / Hot Migration
■ (Data Migration Using Kafka)
Migration Strategies
How to migrate the data
From the migration guide we have
■ CQL COPY
■ Scylla’s SSTableloader
■ Mirror Loader
■ Spark Migrator
Tools and Techniques
Scylla Apache Spark Migration tool
https://github.com/scylladb/scylla-migrator
Scylla Migration Guide
https://enterprise.docs.scylladb.com/stable/operating-scylla/procedures/cassandra-to-scylla-migration-process.html
Failure handling
■ What should I do if SSTableloader fails?
■ What should I do if an Apache Cassandra node fails?
■ What should I do if a ScyllaDB node fails?
■ How to roll back and start from scratch?
Potential Technical Challenges
Cassandra and ScyllaDB Running in Parallel
■ Live (Hot) Setup
■ Use ScyllaDB Tools
Best Practices
How to perform Live Migration
■ Create the same schema from Apache Cassandra in ScyllaDB
■ Configure your application to perform dual writes (read only from Cassandra)
■ Snapshot the to-be-migrated data from Cassandra
■ Load the SSTable files to ScyllaDB (using the ScyllaDB sstableloader tool)
■ Verification
  ■ Dual writes and reads, ScyllaDB serves reads
  ■ Log mismatches, until a minimal data mismatch threshold is reached
■ Apache Cassandra End Of Life
  ■ ScyllaDB only for reads and writes
Best Practices
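The dual-write and verification steps above can be sketched in a few lines. This is a minimal illustration under loud assumptions: plain dictionaries stand in for the Cassandra and ScyllaDB clusters, `DualWriteMigrator` and its methods are hypothetical names, and the 0.1% threshold is an invented value (the source only says "minimal"). A real application would issue these operations through its database drivers.

```python
from dataclasses import dataclass, field


@dataclass
class DualWriteMigrator:
    """Hypothetical helper; dicts stand in for the two clusters."""
    cassandra: dict = field(default_factory=dict)
    scylla: dict = field(default_factory=dict)
    reads: int = 0
    mismatches: list = field(default_factory=list)

    def write(self, key, value):
        # Dual-write phase: every mutation goes to both databases.
        self.cassandra[key] = value
        self.scylla[key] = value

    def verified_read(self, key):
        # Verification phase: ScyllaDB serves the read and Cassandra
        # is consulted only to check the answer.
        served = self.scylla.get(key)
        expected = self.cassandra.get(key)
        self.reads += 1
        if served != expected:
            self.mismatches.append(key)  # log the mismatch for follow-up
        return served

    def mismatch_rate(self) -> float:
        return len(self.mismatches) / self.reads if self.reads else 0.0

    def ready_to_cut_over(self, threshold: float = 0.001) -> bool:
        # threshold is an assumed value, not one given in the source.
        return self.reads > 0 and self.mismatch_rate() <= threshold


m = DualWriteMigrator()
m.cassandra["old-row"] = "v1"   # data that predates dual writes
m.write("new-row", "v2")        # written to both sides
m.verified_read("new-row")
m.verified_read("old-row")      # not yet loaded into ScyllaDB
print(m.mismatches)             # ['old-row']
print(m.ready_to_cut_over())    # False
```

The mismatch on `old-row` is what the snapshot and sstableloader steps are meant to close: once historical data is loaded, the mismatch rate should fall toward the chosen threshold.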
■ Online Sports Apparel Powerhouse
■ 2015 move to Cassandra
■ JVM Garbage Collection issues
■ CPU spiking and timeouts
■ Huge costs and huge maintenance overhead
■ Remedy was ScyllaDB
■ Out of a total cluster of 55 Cassandra nodes, Fanatics replaced 43 Cassandra
nodes with 6 ScyllaDB nodes, dramatically reducing their EC2 bill.
■ “During the peak minute we saw close to 280,000 IOPS… and we had zero timeouts.”
Real World Examples
“Just moving one use case [cart mutations] to ScyllaDB
we got a huge benefit out of it.”
- Niraj Konathi
Director of Platform Engineering
■ C* Challenge: “Volatile Latencies”
■ Inconsistent performance
■ Instability
■ Maintenance overhead
■ 24 nodes of C* = 6 nodes of ScyllaDB.
■ Publish items 5x faster
■ 2.5x lower infrastructure costs
■ 4x node reduction
Real World Examples
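The node counts quoted in this deck translate into reduction factors with simple arithmetic. The sketch below only recomputes figures stated on the slides (962 to 78 nodes, and 24 to 6 nodes); the helper names are hypothetical.

```python
def reduction_factor(nodes_before: int, nodes_after: int) -> float:
    """How many times smaller the new cluster is."""
    return nodes_before / nodes_after


def percent_fewer_nodes(nodes_before: int, nodes_after: int) -> float:
    """Share of nodes removed, as a percentage."""
    return 100.0 * (1 - nodes_after / nodes_before)


print(round(reduction_factor(962, 78), 1))     # 12.3
print(round(percent_fewer_nodes(962, 78), 1))  # 91.9
print(reduction_factor(24, 6))                 # 4.0
print(percent_fewer_nodes(24, 6))              # 75.0
```

The 24-to-6 case matches the "4x node reduction" stated on the slide.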
Takeaways
■ ScyllaDB delivers predictable high performance and low latency
■ ScyllaDB and Cassandra share a large ecosystem of drivers and
connectors
■ ScyllaDB reduces operational complexity, cost and risk
■ The migration path is straightforward and low risk, and we're here to
help!
■ Customers met their performance challenges with ScyllaDB
Thanks
Lewis Carr
llewis.carr@scylladb.com
linkedin.com/in/lewiscarr
Paul Preuveneers
paul.preuveneers@scylladb.com
linkedin.com/in/paulpreuveneers
