Felipe Mendes, Solution Architect at ScyllaDB
Beyond Linear Scaling
A New Path for Performance
with ScyllaDB
+ For data-intensive applications that require high
throughput and predictable low latencies
+ Close-to-the-metal design takes full advantage of
modern infrastructure
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ Compatible with Apache Cassandra and Amazon
DynamoDB
+ DBaaS/Cloud, Enterprise and Open Source
solutions
The Database for Gamechangers
“ScyllaDB stands apart...It’s the rare product
that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the
power a customer will ever need, on workloads that other
databases can’t touch – and at a fraction of the cost of
an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor
+400 Gamechangers Leverage ScyllaDB
Seamless experiences
across content + devices
Digital experiences at
massive scale
Corporate fleet
management
Real-time analytics
2,000,000 SKU e-commerce
management
Video recommendation
management
Threat intelligence service
using JanusGraph
Real time fraud detection
across 6M transactions/day
Uber scale, mission critical
chat & messaging app
Network security threat
detection
Power ~50M X1 DVRs with
billions of reqs/day
Precision healthcare via
Edison AI
Inventory hub for retail
operations
Property listings and
updates
Unified ML feature store
across the business
Cryptocurrency exchange
app
Geography-based
recommendations
Global operations: Avon,
Body Shop + more
Predictable performance for
on sale surges
GPS-based exercise
tracking
Serving dynamic live
streams at scale
Powering India's top
social media platform
Personalized
advertising to players
Distribution of game
assets in Unreal Engine
Introductions
Felipe Mendes, Solution Architect at ScyllaDB
+ Published Author on Linux and Databases
+ Helps teams solve their most challenging problems
+ Years of experience with Linux and distributed systems
Agenda
+ (Near) Linear Scaling
+ Enter Real-life
+ ScyllaDB under Load
+ Crafting Your Success
+ Beyond Linear Scaling
(Near) Linear Scaling
Why is it important … And when you shouldn't care :-)
Linear Speedup
Main goal is to run programs faster
+ To a point…
+ Measured as speedup: S(p) = T(1) / T(p), the run time on one worker divided by the run time on p workers
+ Reasons for sub-linear speedup:
+ Laws! (Amdahl's, Gustafson-Barsis)
+ Task Management
+ Communication & Synchronization
[Figure: ideal, typical, and super-linear speedup curves. Source: 15.2 Performance in Practice]
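The two laws named on this slide can be made concrete with a short sketch. It assumes the usual textbook forms: speedup as T(1)/T(p), Amdahl's law for a fixed problem size, and Gustafson-Barsis for a problem that grows with the number of workers. The function names and the 5% serial fraction are illustrative choices, not taken from the slides.

```python
def speedup(t_one_worker: float, t_p_workers: float) -> float:
    """Observed speedup: run time on one worker over run time on p workers."""
    return t_one_worker / t_p_workers


def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Amdahl's law: fixed problem size, the serial part never shrinks."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def gustafson_speedup(serial_fraction: float, workers: int) -> float:
    """Gustafson-Barsis: the parallel part of the problem grows with the workers."""
    return workers - serial_fraction * (workers - 1)


if __name__ == "__main__":
    serial = 0.05  # assumed: 5% of the work cannot be parallelized
    for p in (1, 2, 4, 8, 16, 64):
        print(p, round(amdahl_speedup(serial, p), 2), round(gustafson_speedup(serial, p), 2))
```

Under Amdahl's law the speedup can never exceed 1 / serial_fraction, however many workers are added, which is one reason measured curves fall below the ideal line.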
Universal Scaling Law
A generalization of Amdahl’s Law discovered by Dr. Neil
Gunther. As the number of users (N) increases, the
system throughput (X) will:
+ Enjoy a period of near-linear scaling
+ Eventually saturate some resource, such that
increasing N doesn’t increase X. This defines
maxX
+ Possibly encounter a coordination cost that
drives X down as N increases further
[Figure: throughput (X) versus number of users (N), with the linear, saturation, and retrograde regions and maxX marked]
How Optimizely (Safely) Maximizes Database Concurrency
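The three regions can be reproduced numerically. The sketch below assumes the commonly published form of the Universal Scaling Law, X(N) = λN / (1 + σ(N − 1) + κN(N − 1)), where λ is the throughput of a single user, σ the contention coefficient, and κ the coordination (coherency) coefficient. The parameter values are made up for illustration and are not measurements from the talk.

```python
import math


def usl_throughput(n: int, lam: float, sigma: float, kappa: float) -> float:
    """Throughput X at concurrency N under the Universal Scaling Law."""
    return lam * n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma: float, kappa: float) -> float:
    """Concurrency at which X peaks (where dX/dN = 0); requires kappa > 0."""
    return math.sqrt((1.0 - sigma) / kappa)


if __name__ == "__main__":
    lam, sigma, kappa = 1000.0, 0.02, 0.0001  # hypothetical coefficients
    print("peak near N =", round(usl_peak_concurrency(sigma, kappa)))
    for n in (1, 10, 50, 99, 200, 400):
        print(n, round(usl_throughput(n, lam, sigma, kappa)))
```

With κ = 0 the curve only saturates (it approaches λ/σ). Any κ > 0 introduces the retrograde region, where adding users lowers throughput.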
Linear Scaling – Good
Relevant for parallel programming, useful for measuring:
+ Database efficiency
+ Price-performance
+ Scalability
NoSQL Benchmark: MongoDB vs ScyllaDB
Linear Scaling – Bad
Doesn't account for:
+ Improvements Over Time
+ Application Semantics
+ Hotspots
+ Scaling Clients
+ Consistent Hashing Uneven Distribution
+ Communication Overhead
[Figure: gossip propagation]
More on propagating state (and image credits): Gnutella: an Intro to Gossip
Enter Real-life
Overlooked considerations no one dares to tell you ;-)
Application Semantics
1,000,000 sensors, representing homes in an area
IoT
Social
DynamoDB: When to Move Out?
Consistent Hashing
Exercise: How much more traffic and
load does this node receive?
Alexys Jacob – Leveraging consistent hashing in your python applications
thelastpickle – The Impacts of Changing the Number of VNodes in Apache Cassandra
Avi Kivity's shard simulator
Bad
Better, but not perfect
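One way to approach the exercise is to measure ring ownership directly. The sketch below builds a toy consistent-hash ring and reports how much of the ring the busiest node owns relative to a perfectly even split. It assumes MD5-derived 64-bit tokens and made-up node labels; it is not the partitioner used by Cassandra or ScyllaDB, and it measures token-range ownership only, not real traffic.

```python
import hashlib

RING_SIZE = 2 ** 64  # tokens live in [0, 2**64)


def token(label: str) -> int:
    """Derive a pseudo-random 64-bit token from a label (illustrative choice)."""
    digest = hashlib.md5(label.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def ownership(nodes: int, vnodes: int) -> dict:
    """Fraction of the ring owned by each node, given `vnodes` tokens per node."""
    ring = sorted(
        (token(f"node-{n}-vnode-{v}"), n)
        for n in range(nodes)
        for v in range(vnodes)
    )
    owned = {n: 0 for n in range(nodes)}
    previous = ring[-1][0] - RING_SIZE  # wrap around from the last token
    for tok, node in ring:
        owned[node] += tok - previous  # each token owns the range (previous, tok]
        previous = tok
    return {n: size / RING_SIZE for n, size in owned.items()}


def imbalance(nodes: int, vnodes: int) -> float:
    """Largest share divided by the ideal share (1.0 means perfectly even)."""
    shares = ownership(nodes, vnodes)
    return max(shares.values()) * nodes


if __name__ == "__main__":
    for vnodes in (1, 16, 256):
        print(vnodes, "tokens per node -> busiest node owns",
              round(imbalance(8, vnodes), 2), "x the ideal share")
```

More tokens per node smooth the distribution without making it exact, which matches the "Bad" versus "Better, but not perfect" labels on the slide.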
Hotspots
Adding more nodes won't help
Scaling Clients
Challenges:
+ For a system serving X static clients, what's the max
effective concurrency to set on a single client?
+ When scaling clients, how to coordinate them to
avoid overwhelming a group of replicas?
[Figures: Discord consistent hash-based routing (DB calls); Netflix adaptive concurrency]
Sources: How Discord Stores Trillions of Messages; Performance Under Load – Adaptive Concurrency Limits
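Both challenges can be explored with a small sketch. The first function applies Little's Law (requests in flight = throughput × latency) to split a concurrency budget evenly across a fixed number of clients. The class is a generic additive-increase/multiplicative-decrease (AIMD) limiter for a single client. The names, defaults, and thresholds are hypothetical, and this is not the algorithm from the Netflix or Discord articles cited on the slide.

```python
def per_client_concurrency(target_ops_per_sec: int, latency_ms: int, clients: int) -> int:
    """Little's Law: total in-flight requests = throughput * latency.

    Splits that budget evenly across clients, never going below 1.
    """
    total_in_flight = target_ops_per_sec * latency_ms // 1000
    return max(1, total_in_flight // clients)


class AimdLimit:
    """Additive-increase / multiplicative-decrease concurrency limit (sketch)."""

    def __init__(self, initial: int = 8, minimum: int = 1,
                 maximum: int = 256, backoff: float = 0.5) -> None:
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum
        self.backoff = backoff

    def on_response(self, latency_ms: float, target_ms: float,
                    timed_out: bool = False) -> int:
        """Grow the limit by 1 on a healthy response, shrink it on a slow one."""
        if timed_out or latency_ms > target_ms:
            self.limit = max(self.minimum, int(self.limit * self.backoff))
        else:
            self.limit = min(self.maximum, self.limit + 1)
        return self.limit


if __name__ == "__main__":
    # Assumed numbers: 100k ops/s sustained at 2 ms, shared by 10 clients.
    print(per_client_concurrency(100_000, 2, 10))
    limiter = AimdLimit()
    for observed_ms in (1.0, 1.2, 9.0, 1.1):
        print(limiter.on_response(observed_ms, target_ms=5.0))
```

A static split only holds while the number of clients stays fixed. A feedback-driven limit lets each client back off on its own when replicas slow down, without central coordination.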
ScyllaDB Under Load
Quantifying the Performance Impact of a Shard-per-Core Architecture
ScyllaDB Architecture
Dor Laor on P99 CONF: Quantifying the Impact of Shard Per Core Architecture
Linear Scale Ingestion
Constant Time Ingestion
[Chart: five successive 2X steps]
“Nodes must be small, in case they
fail”
No, they don’t! {Replace, add, remove} a node in constant time
Compaction Scale
[Chart: five successive 2X steps]
Crafting Your Success
Do's and Don'ts
In a Nutshell...
Database Performance At Scale
Run Real Tests
Benchmark tools prove you can get there, but:
+ Application semantics are unique
+ Access patterns are unique
+ Real-life tooling is also unique
+ Addressing all corner-cases is time-consuming or even impossible
+ Don't blindly assume that 2x the resources will handle 2x the load
Eliminate Noise
Avoid large deployments of small nodes
+ Go Big or Go Home!
+ Considerably reduces the overhead associated with
communication & synchronization
+ Less resource overcommitment
+ BUT, keep balance:
+ Account for inevitable failures
+ Leave room for unpredictability
Tune the client side
Understand your data flows:
+ Can multiple clients spam a single key?
+ What happens when scaling the number of
clients?
+ How is load balancing achieved?
Power of Two Choices load balancing
P99 CONF – Conquering Load Balancing: Experiences from ScyllaDB Drivers
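The effect of Power of Two Choices is easy to see in a simulation. The sketch below sends requests to nodes either by picking one node at random or by sampling two and sending to the less loaded one. It is a generic balls-into-bins model with hypothetical parameters and a cumulative request count as the load signal; it is not the load-balancing code of any ScyllaDB driver.

```python
import random


def simulate(nodes: int, requests: int, choices: int, seed: int = 42) -> list:
    """Assign each request to the least-loaded of `choices` randomly sampled nodes.

    choices=1 is plain random routing; choices=2 is Power of Two Choices.
    """
    rng = random.Random(seed)
    load = [0] * nodes
    for _ in range(requests):
        candidates = [rng.randrange(nodes) for _ in range(choices)]
        target = min(candidates, key=lambda n: load[n])
        load[target] += 1
    return load


if __name__ == "__main__":
    for choices in (1, 2):
        load = simulate(nodes=100, requests=10_000, choices=choices)
        print(choices, "choice(s): busiest node", max(load), "idlest node", min(load))
```

Sampling just one extra candidate narrows the gap between the busiest and idlest node considerably, and it only needs load information for the two sampled nodes.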
Beyond Linear Scaling
Unveiling Performance Insights
ScyllaDB in 2018
Ingestion time – Lower is better
ScyllaDB in 2023
Ingestion time – Lower is better
Linear Scale Ingestion (2023 vs 2018)
Constant Time Ingestion
[Chart: five successive 2X steps]
Getting Even Faster
Time to execute, lower is better
Going Beyond – Tablets
[Diagram: a tablet and its tablet replicas, with replication metadata kept per table]
Why ScyllaDB is Moving to a New Replication Algorithm: Tablets
Going Beyond – ScyllaDB Enterprise
Throughput – Higher is better
Summary
There's much more to performance than linear scale:
+ The good and the bad of linear scaling
+ Real-life situations impacting linear scalability
+ ScyllaDB Shard-Per-Core Architecture
+ Run Realistic Workloads
+ How ScyllaDB drives the meaning of 'performance'
Q&A
ScyllaDB Cloud
Start free trial
scylladb.com/cloud
Feb 14-15 | VIRTUAL EVENT
scylladb.com/summit
Virtual Workshop
January 25, 2024
scylladb.com/events
Thank you
for joining us today.
@scylladb scylladb/
slack.scylladb.com
@scylladb company/scylladb/
scylladb/
