The Power of the Log
LSM & Append Only Data Structures
Ben Stopford
Confluent Inc
@benstopford
The Log Connectors Connectors
Producer Consumer
Streaming Engine
Kafka: a Streaming Platform
KAFKA’s Distributed Log
Linear Scans Append Only
Messaging is a Log-Shaped Problem
Linear Scans Append Only
Not all problems are Log-Shaped
Many problems benefit from being
addressed in a “log-shaped” way
Supporting Lookups
Lookups in a log
Head Tail
Trees provide Selectivity
bob dave fred hary mike steve vince
Index
But the overarching structure implies Dispersed Writes
bob dave fred hary mike steve vince
Random IO
Log Structured Merge Trees
1996
Used in a range of modern databases
•  BigTable
•  HBase
•  LevelDB
•  SQLite4
•  RocksDB
•  MongoDB
•  WiredTiger
•  Cassandra
•  MySQL
•  InfluxDB ...
If a system has a natural grain, it
is one formed of sequential
operations which favour locality
Caching & Prefetching
L3 cache
L2 cache
L1 cache
Pre-fetch is your
friend
CPU Caches
Page Cache
Application-level
caching
Disk Controller
Write efficiency comes from
amortising writes into sequential
operations
Taken from ACM Queue: The Pathologies of Big Data
So if we go against the grain of
the system, RAM can actually be
slower than disk
Going against the grain means dispersed
operations that break locality
Poor Locality Good Locality
The beauty of the log lies in its
sequentiality
Linear Scans Append Only
LSM is about re-imagining search
as a “log-shaped” problem
Arrange writes to be Append Only
Append Only
Journal
(Sequential IO)
Update in Place
Ordered File
(Random IO)
Bob = Carpenter
Bob = Carpenter
Bob = Cabinet Maker
Bob = Cabinet Maker
Avoid dispersed writes
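The contrast between the journal and the ordered file can be sketched in a few lines. This is a minimal illustration only: `AppendOnlyJournal` is a hypothetical name, and a Python list stands in for a file opened in append mode.

```python
class AppendOnlyJournal:
    """Hypothetical journal: every write is appended, nothing is overwritten."""

    def __init__(self):
        self.entries = []  # stands in for a file opened in append mode

    def put(self, key, value):
        # Sequential write: always at the tail, whatever the key is.
        self.entries.append((key, value))

    def get(self, key):
        # The newest entry for a key wins, so scan from the tail backwards.
        for k, v in reversed(self.entries):
            if k == key:
                return v
        return None


journal = AppendOnlyJournal()
journal.put("Bob", "Carpenter")
journal.put("Bob", "Cabinet Maker")
print(journal.get("Bob"))    # the later entry shadows the earlier one
print(len(journal.entries))  # both entries are still in the journal
```

The write is cheap because it never seeks; the cost moves to the read, which is the problem the following slides address.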
Simple LSM
Writes are collected in memory
Writes
sort
write to disk
older
files
small
index file
RAM
When enough have buffered, sort.
Writes
write to disk
older
files
small
index file
Batched
sorted
RAM
Write the sorted file to disk
Writes
write to disk
older
files
Small, sorted
immutable file
Batched
sorted
Repeat...
Writes
write to disk
Older files New files
Batched
sorted
Batching -> Fast Sequential IO
Writes
write to disk
Older files New files
Batched
Sorted
memtable
That’s the core write path
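A rough sketch of that write path, assuming a dict for the memtable and in-memory tuples standing in for the sorted files on disk. `SimpleLSM` and `memtable_limit` are illustrative names, not taken from any real engine.

```python
class SimpleLSM:
    """Toy write path: buffer writes in RAM, sort them, emit an immutable run."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}                 # the only mutable structure
        self.memtable_limit = memtable_limit
        self.files = []                    # oldest first; each is a sorted tuple

    def put(self, key, value):
        self.memtable[key] = value         # collected in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # Sort the batch once, then write it out in one sequential pass.
        run = tuple(sorted(self.memtable.items()))
        self.files.append(run)             # stands in for "write to disk"
        self.memtable = {}


lsm = SimpleLSM(memtable_limit=3)
for name in ["mike", "bob", "steve", "dave"]:
    lsm.put(name, len(name))
print(lsm.files)     # one sorted, immutable run holding bob, mike, steve
print(lsm.memtable)  # "dave" is still buffered in RAM
```

A real engine would also append each write to a write-ahead log before acknowledging it, so the memtable can be rebuilt after a crash; that step is omitted here.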
What about reads?
Search reverse-chronologically
older
files
newer
files
(1) Is “bob” here?
(2) Is “bob” here?
(3) Is “bob” here?
(4) Is “bob” here?
Worst Case
We consult every file
We might have a lot of files!
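The read path can be sketched as follows, assuming each file is a sorted list of (key, value) pairs held oldest first. The names `search_file` and `lookup` are made up for this illustration.

```python
from bisect import bisect_left


def search_file(sorted_file, key):
    """Binary search one sorted, immutable file of (key, value) pairs."""
    # A real file would consult its small index instead of rebuilding this list.
    keys = [k for k, _ in sorted_file]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return True, sorted_file[i][1]
    return False, None


def lookup(memtable, files, key):
    """Return (value, files_consulted). `files` is ordered oldest first."""
    if key in memtable:
        return memtable[key], 0
    consulted = 0
    for sorted_file in reversed(files):  # newest file first
        consulted += 1
        found, value = search_file(sorted_file, key)
        if found:
            return value, consulted
    return None, consulted


files = [
    [("bob", "Carpenter"), ("dave", "Plumber")],      # oldest
    [("fred", "Baker"), ("mike", "Driver")],
    [("bob", "Cabinet Maker"), ("steve", "Welder")],  # newest
]
memtable = {"vince": "Painter"}

print(lookup(memtable, files, "bob"))   # found in the newest file
print(lookup(memtable, files, "dave"))  # only in the oldest file
print(lookup(memtable, files, "zed"))   # absent: every file is consulted
```

The absent key is the worst case from the slide: nothing short-circuits the search, so the cost grows with the number of files.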
LSM naturally optimises for writes,
over reads
This is a reasonable tradeoff to make
Optimising reads is easier than
optimising writes
Optimisation 1
Bound the number of files
Create levels
Level-0
Level-1
Separate thread merges old files, de-duplicating them.
Level-0
Level-1
Merging process is reminiscent of
merge sort
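The merge step can be sketched with the standard library's `heapq.merge`, which performs the k-way merge phase of merge sort over already-sorted inputs. `compact` is a hypothetical helper, and the files are assumed to be sorted lists of (key, value) pairs ordered oldest first.

```python
import heapq


def compact(files):
    """Merge sorted files (oldest first) into one sorted, de-duplicated file."""
    runs = []
    for position, f in enumerate(files):
        rank = len(files) - 1 - position  # 0 for the newest file
        # Tagging with rank makes the newest entry sort first among equal keys.
        runs.append([(key, rank, value) for key, value in f])

    merged = []
    for key, _rank, value in heapq.merge(*runs):
        if merged and merged[-1][0] == key:
            continue  # an entry from a newer file was already kept
        merged.append((key, value))
    return merged


old = [("bob", "Carpenter"), ("dave", "Plumber")]
new = [("bob", "Cabinet Maker"), ("fred", "Baker")]
print(compact([old, new]))
```

Both inputs are read front to back and the output is written front to back, so the whole compaction stays sequential.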
Take this further with levels
Level-0
Level-1
Level-2
Level-3
Memtable
But single reads still require many
individual lookups:
•  Number of searches:
–  1 per file in the base level (Level-0)
–  1 per level above
Optimisation 2
Caching & Friends
Add Memory
i.e. More Caching / Pre-fetch
Read Ahead & Prefetch
L3 cache
L2 cache
L1 cache
Pre-fetch is your
friend
Page Cache
Disk Controller
If only there was a more efficient
way to avoid searching each file!
Elven Magic?
Bloom Filters
Answers the question:
Do I need to look in this file to
find the value for this key?
Size -> probability of false positive
Key
Hash Function
Bit Set
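The link between size and false positives can be made concrete with the standard approximation for a Bloom filter. The symbols are introduced here for illustration and do not appear on the slides: m is the number of bits, k the number of hash functions, and n the number of keys inserted.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\,\ln 2
```

With m fixed, p rises as n grows, which is the trade-off the slide describes: a bigger bit set buys a lower false-positive rate.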
Bloom Filters
•  Space efficient, probabilistic
data structure
•  As keyspace grows:
–  p(collision) increases
–  Index size is fixed
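A minimal Bloom filter sketch, assuming SHA-256 with a seed prefix as a stand-in for k independent hash functions. The class and its parameter names are illustrative; production filters use faster hashes and pack the bits tightly.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely not in this file": skip the file entirely.
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
for name in ["bob", "dave"]:
    bloom.add(name)
print(bloom.might_contain("bob"))  # True: added keys are always reported
```

One filter would be kept in RAM per file, so a read only touches the files whose filter answers "maybe".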
Many more degrees of freedom for
optimising reads
RAM
Disk
file metadata
& bloom filter
Log Structured Merge Trees
•  A collection of small, immutable indexes
•  All sequential operations, de-duplicate by merging files
•  Index/Bloom in RAM to increase read performance
Subtleties
•  Writes are 1 x IO (blind writes), rather than 2 x IOs
(read + modify)
•  Batching writes decreases write amplification. In trees
leaf pages must be updated.
Immutability => Simpler locking semantics
Only
memtable
is mutable
Does it work?
Lots of real world examples
Measurable in the real world
•  InnoDB vs MyRocks results, taken from Mark Callaghan’s blog: http://bit.ly/2mhWT7p
•  There are many subtleties. Take all benchmarks with a pinch of salt.
Elements of Beauty
•  Reframing the problem to be Log-Centric. To go with
the grain of the system.
•  Optimise for the harder problem
•  Compartmentalises writes (coordination) to a single
point. Reads -> immutable structures.
Applies in many other areas
•  Sequentiality
–  Databases: write ahead logs
–  Columnar databases: Merge Joins
–  Kafka
•  Immutability
–  Snapshot isolation over explicit locking.
–  Replication (state machine replication)
Log-Centric Approaches Work in
Applications too
Event Sourcing
•  Journaling of
state changes
•  No “update in
place”
Object
Journal
+ 10.36
- 12.12
+ 23.70
+ 13.33
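The journal on this slide can be replayed to derive the object's current state. The sketch below assumes the four entries are signed amounts against a single balance; `current_state` is a hypothetical helper name.

```python
from decimal import Decimal

# The journal from the slide: state changes are recorded, never overwritten.
journal = [
    Decimal("10.36"),
    Decimal("-12.12"),
    Decimal("23.70"),
    Decimal("13.33"),
]


def current_state(entries):
    """Derive state by folding over the journal from the beginning."""
    return sum(entries, Decimal("0"))


print(current_state(journal))      # 35.27
print(current_state(journal[:2]))  # -1.76, the state as of the second event
```

Because nothing is updated in place, any earlier state can be recovered by replaying a prefix of the journal.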
CQRS
Client
Command Query
Write
Optimised
Read
Optimised
log
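A small sketch of the split shown in the diagram, assuming a shared Python list as the log and amounts held in integer cents. `CommandSide`, `QuerySide`, and their methods are names invented for this example.

```python
class CommandSide:
    """Write-optimised side: validate the command, then append an event."""

    def __init__(self, log):
        self.log = log

    def deposit(self, account, cents):
        if cents <= 0:
            raise ValueError("deposit must be positive")
        self.log.append(("deposited", account, cents))


class QuerySide:
    """Read-optimised side: a view built by replaying the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0     # how far into the log this view has read
        self.balances = {}

    def catch_up(self):
        for _event, account, cents in self.log[self.offset:]:
            self.balances[account] = self.balances.get(account, 0) + cents
        self.offset = len(self.log)

    def balance(self, account):
        return self.balances.get(account, 0)


log = []
commands = CommandSide(log)
queries = QuerySide(log)
commands.deposit("bob", 1036)
commands.deposit("bob", 2370)
queries.catch_up()
print(queries.balance("bob"))  # 3406
```

The view lags the log until `catch_up` runs, which is the usual price of separating the two sides.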
How Applications or Services
share state
Log-Centric Services
Writer
Read-Replica
Read-Replica
Read-Replica
Writes are localised
to a single service
Log-Centric Services
Writer
Read-Replica
Read-Replica
Read-Replica
Immutable log
Log-Centric Services
Writer
Read-Replica
Read-Replica
Read-Replica
Many, independent
read replicas
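The writer and replica arrangement can be sketched as below, assuming a shared list as the immutable log and each replica tracking its own offset. `ReadReplica`, `latest_value`, and `write_count` are illustrative names.

```python
class ReadReplica:
    """Hypothetical replica: builds its own view from the shared log."""

    def __init__(self, apply):
        self.offset = 0     # each replica tracks its own position
        self.view = {}
        self.apply = apply  # function(view, record) that updates the view

    def poll(self, log):
        for record in log[self.offset:]:
            self.apply(self.view, record)
        self.offset = len(log)


def latest_value(view, record):
    key, value = record
    view[key] = value


def write_count(view, record):
    key, _value = record
    view[key] = view.get(key, 0) + 1


log = []                              # only the writer appends to this
by_key = ReadReplica(latest_value)

log.append(("bob", "Carpenter"))
by_key.poll(log)
log.append(("bob", "Cabinet Maker"))
by_key.poll(log)

# A replica added later replays from offset 0 and shapes the data its own way.
counts = ReadReplica(write_count)
counts.poll(log)

print(by_key.view)  # {'bob': 'Cabinet Maker'}
print(counts.view)  # {'bob': 2}
```

Replicas never coordinate with each other or with the writer; they only read records that will not change.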
Elements of Beauty
•  Reframing the problem to be Log-Centric. To go with
the grain of the system.
•  Optimise for the harder problem
•  Compartmentalises writes (coordination) to a single
point. Reads -> immutable structures.
Decentralised Design
In both database design as well as in
application development
The Log is the central building block
Pushes us towards the natural grain of
the system
The Log
A single unifying abstraction
References
LSM:
•  benstopford.com/2015/02/14/log-structured-merge-trees/
•  smalldatum.blogspot.co.uk/2017/02/using-modern-sysbench-to-compare.html
•  www.quora.com/How-does-the-Log-Structured-Merge-Tree-work
•  bLSM paper: http://bit.ly/2mT7Vje
Other
•  Pat Helland (Immutability) cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
•  Peter Bailis (Coordination Avoidance): http://bit.ly/2m7XxnI
•  Jay Kreps: I Heart Logs (O’Reilly 2014)
•  The Data Dichotomy: http://bit.ly/2hk9c2K
Thank you
@benstopford
http://benstopford.com
ben@confluent.io
