Brought to you by
Avoiding Data Hotspots
At Scale
Konstantin Osipov
Engineering at
Konstantin Osipov
Director of Engineering
■ Worked on lightweight transactions in Scylla
■ Rarely happy with the status quo (AKA the stubborn one)
■ A very happy father
■ Career and public speaking coach
RUM conjecture and scalability
What this talk is not about
● Replication
● Re-sharding and re-balancing data
● Distributed queries & jobs
The talk focuses only on the principles of data distribution.
Ways to shard
Define sharding
Sharding is the horizontal partitioning of data across multiple servers. It can be used to
scale the capacity and (possibly) the throughput of the database. Three key challenges:
● Choosing a way to split data across nodes
● Re-balancing data and maintaining location information
● Routing queries to the data
Hash based sharding
[Figure: hash ring and hashed keys; panels labeled "Consistent hash" and "Ketama hash"]
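A hash ring can be sketched in a few lines. The version below is a toy, loosely in the style of rings that place many points per node; it is not the libketama algorithm, and the node names, the `HashRing` class, and the `points_per_node` value are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit point on the ring (MD5 is used only for spreading)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with several points per node (hypothetical class)."""

    def __init__(self, nodes, points_per_node=100):
        # Each node contributes many points so that load evens out.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Pick the first ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(10_000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

# Count the keys whose owner changes when a fourth node joins.
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
```

Adding a node only inserts new points on the ring, so every key that changes owner moves to the new node and the rest stay where they were.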
Sharding: hash + virtual buckets in Couchbase
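The general idea of hashing into a fixed set of virtual buckets and then mapping buckets to nodes can be sketched as follows. This is an illustration of the technique only, not Couchbase's actual implementation: the bucket count, the CRC32 choice, the round-robin placement, and all function names are assumptions.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed bucket count for this sketch


def vbucket_for(key: str) -> int:
    """Hash the key to a fixed virtual bucket; this mapping never changes."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(nodes):
    """Assign buckets to nodes round-robin (a hypothetical placement policy)."""
    return [nodes[b % len(nodes)] for b in range(NUM_VBUCKETS)]


def node_for(key: str, bucket_map) -> str:
    """Route a key: key -> virtual bucket -> node."""
    return bucket_map[vbucket_for(key)]


bucket_map = build_map(["node-a", "node-b", "node-c"])
owner_before = node_for("user:42", bucket_map)

# Rebalancing moves whole buckets by editing the map; keys are not rehashed.
moved_bucket = vbucket_for("user:42")
bucket_map[moved_bucket] = "node-d"
owner_after = node_for("user:42", bucket_map)
```

The extra level of indirection means the location information to maintain is one small table of bucket owners rather than a per-key directory.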
Sharding: chunk splits and migrations in MongoDB
Hotspots
Range based sharding
Sharding: ranges in CockroachDB
MongoDB
For queries that don’t include the shard key, mongos must query all shards, wait
for their response and then return the result to the application. These
“scatter/gather” queries can be long running operations.
However, range based partitioning can result in an uneven distribution of data,
which may negate some of the benefits of sharding. For example, if the shard key
is a linearly increasing field, such as time, then all requests for a given time range
will map to the same chunk, and thus the same shard. In this situation, a small set
of shards may receive the majority of requests and the system would not scale
very well.
Cloud Spanner
One cause of hotspots is having a column whose value monotonically increases
as the first key part, because this results in all inserts occurring at the end of your
key space. This pattern is undesirable because Cloud Spanner divides data among
servers by key ranges, which means all your inserts will be directed at a single
server that will end up doing all the work.
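The effect described in both quotes can be reproduced with a small simulation. The sketch below assumes a key space of 0..999 split into four ranges at fixed boundaries and compares it with hashing the same keys; the split points, shard count, and function names are made up for illustration.

```python
import bisect
import hashlib
from collections import Counter

NUM_SHARDS = 4
# Range shards split a key space of 0..999 at fixed boundaries (assumed layout).
SPLIT_POINTS = [250, 500, 750]


def range_shard(ts: int) -> int:
    """Range sharding: the shard is the key range that contains the timestamp."""
    return bisect.bisect_right(SPLIT_POINTS, ts)


def hash_shard(ts: int) -> int:
    """Hash sharding: the shard is derived from a hash of the timestamp."""
    digest = hashlib.sha256(str(ts).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# New rows always carry the latest timestamps, so keys only grow.
new_rows = range(900, 1000)
range_load = Counter(range_shard(ts) for ts in new_rows)
hash_load = Counter(hash_shard(ts) for ts in new_rows)
```

With range sharding all 100 inserts land on the last shard, while hashing spreads them over every shard at the cost of losing key order.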
Avoiding hotspots
Bit-reversing the partition key
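Bit reversal turns a sequence of consecutive integers into keys that are far apart in the key space, so sequential inserts no longer pile up at the end of one range. A minimal sketch, assuming non-negative integer keys of a fixed width (the `reverse_bits` helper is hypothetical, not a database API):

```python
def reverse_bits(value: int, width: int = 64) -> int:
    """Reverse the low `width` bits of a non-negative integer."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    result = 0
    for _ in range(width):
        # Move the lowest bit of value onto the low end of result.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# Consecutive ids land far apart in an 8-bit key space.
sequential = [1, 2, 3, 4]
scattered = [reverse_bits(v, width=8) for v in sequential]
# scattered == [128, 64, 192, 32]
```

The mapping is its own inverse, so the original id can be recovered by reversing the bits again.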
Descending order for timestamp-based keys
CREATE TABLE UserAccessLog (
  UserId INT64 NOT NULL,
  LastAccess TIMESTAMP NOT NULL,
  ...
) PRIMARY KEY (UserId, LastAccess DESC);
Replicating dimension tables everywhere
VoltDB
To further optimize performance, VoltDB allows selected tables to be replicated
on all partitions of the cluster. This strategy minimizes cross-partition join
operations. For example, a retail merchandising database that uses product codes
as the primary key may have one table that simply correlates the product code
with the product's category and full name. Since this table is relatively small and
does not change frequently (unlike inventory and orders) it can be replicated to all
partitions. This way stored procedures can retrieve and return user-friendly
product information when searching by product code without impacting the
performance of order and inventory updates and searches.
Good and bad shard keys
■ Good: user session, shopping order
■ Maybe: user_id (if the data per user isn’t too large)
■ Better: (user_id, post_id)
■ Bad: inventory item, order date
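The difference between user_id and (user_id, post_id) as a hashed shard key can be seen with one heavy user. The sketch below uses a hypothetical `shard_of` helper, an assumed shard count of 8, and made-up data.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8


def shard_of(*key_parts) -> int:
    """Hash the given key parts to a shard number."""
    raw = "\x00".join(str(part) for part in key_parts).encode("utf-8")
    return int.from_bytes(hashlib.sha256(raw).digest()[:8], "big") % NUM_SHARDS


# One hypothetical heavy user with 1,000 posts.
posts = [("user-1", post_id) for post_id in range(1000)]

# Sharding by user_id alone keeps every post of that user on one shard.
by_user = Counter(shard_of(user_id) for user_id, _ in posts)

# Sharding by (user_id, post_id) spreads the same posts across shards.
by_user_and_post = Counter(shard_of(user_id, post_id) for user_id, post_id in posts)
```

The compound key removes the write hotspot, but reading all posts of one user now has to visit every shard, which is the partial key read case in the summary table.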
Special cases
Scaling a message queue
Scaling in a data warehouse
■ Data warehouses usually don’t check unique constraints
■ Data is sorted multiple times, according to multiple dimensions
■ Sharding can be done according to a hash of multiple fields
Let’s recap
Summary: design choices
Workload                           Hash            Range
Write-heavy/monotonic/time series  Linear scaling  Hotspots
Primary key read                   Linear scaling  Linear scaling
Partial key read                   Hotspots        Linear scaling
Indexed range read                 Hotspots        Linear scaling
Non-indexed read                   Hotspots        Hotspots
Konstantin Osipov
kostja@scylladb.com
@kostja_osipov
