What We Need to Unlearn about Persistent Storage

Brought to you by
Avoiding Data Hotspots
At Scale
Konstantin Osipov
Engineering at

Konstantin Osipov
Director of Engineering
■ Worked on lightweight transactions in Scylla
■ Rarely happy with the status quo (AKA the stubborn one)
■ A very happy father
■ Career and public speaking coach

RUM conjecture and scalability

What this talk is not
● replication
● Re-sharding and re-balancing data
● distributed queries & jobs
will focus on principles data distribution only

Deﬁne sharding
Sharding - horizontal partitioning of data across multiple servers. Can be used to
scale capacity and (possibly) throughput of the database. 3 key challenges:
● Choosing a way to split data across nodes
● Re-balancing data and maintaining location information
● Routing queries to the data

Hash based sharding
Hash
ring
Hashed keys
Consistent hash Ketama hash

Sharding: hash + virtual buckets in Couchbase

Sharding: chunk splits and migrations in
MongoDB

Sharding: ranges in CockroachDB

mongodb
For queries that don’t include the shard key, mongos must query all shards, wait
for their response and then return the result to the application. These
“scatter/gather” queries can be long running operations.
However, range based partitioning can result in an uneven distribution of data,
which may negate some of the beneﬁts of sharding. For example, if the shard key
is a linearly increasing ﬁeld, such as time, then all requests for a given time range
will map to the same chunk, and thus the same shard. In this situation, a small set
of shards may receive the majority of requests and the system would not scale
very well.

spanner
One cause of hotspots is having a column whose value monotonically increases
as the ﬁrst key part, because this results in all inserts occurring at the end of your
key space. This pattern is undesirable because Cloud Spanner divides data among
servers by key ranges, which means all your inserts will be directed at a single
server that will end up doing all the work.

Bit-reversing the partition key

Descending order for timestamp-based keys
CREATE TABLE UserAccessLog (
UserId INT64 NOT NULL,
LastAccess TIMESTAMP NOT NULL,
...
) PRIMARY KEY (UserId, LastAccess DESC);

Replicating dimension tables everywhere

voltdb
To further optimize performance, VoltDB allows selected tables to be replicated
on all partitions of the cluster. This strategy minimizes cross-partition join
operations. For example, a retail merchandising database that uses product codes
as the primary key may have one table that simply correlates the product code
with the product's category and full name. Since this table is relatively small and
does not change frequently (unlike inventory and orders) it can be replicated to all
partitions. This way stored procedures can retrieve and return user-friendly
product information when searching by product code without impacting the
performance of order and inventory updates and searches.

Good and bad shard keys
■ good: user session, shopping order
■ maybe: user_id (if user data isn’t too thick)
■ Better: (user_id, post_id)
■ bad: inventory item, order date

Scaling in a data warehouse
■ Data warehouses usually don’t check unique constraints
■ Data is sorted multiple times, according to multiple dimensions
■ Sharding can be done according to a hash of multiple ﬁelds

Summary: design choices
Hash Range
Write heavy/monotonic//time
series
Linear scaling Hotspots
Primary key read Linear scaling Linear scaling
Partial key read Hotspots Linear scaling
Indexed range read Hotspots Linear scaling
Non-indexed read Hotspots Hotspots

Brought to you by
Konstantin Osipov
kostja@scylladb.com
@kostja_osipov

What We Need to Unlearn about Persistent Storage

More Related Content

What's hot (20)

Similar to What We Need to Unlearn about Persistent Storage (20)

More from ScyllaDB (20)

Recently uploaded (20)

What We Need to Unlearn about Persistent Storage