Mijin An (안미진)
RocksDB Compaction
Embedded Key-Value Store for Flash and RAM
Contents
1. RocksDB Architecture
2. Level Style Compaction
3. Universal Style Compaction
4. RocksDB Compaction
Overview
RocksDB Architecture
[Figure: RocksDB architecture. Memory: Active Memtable, Read-Only Memtable. Persistent Storage: Log files and SST files (LSM Files). Arrows: Write Request, Read Request, Switch, Flush, Compaction.]
RocksDB Architecture
[Figure: RocksDB memory and disk layout. Memory: Active Memtable (4MB), Immutable Memtable. Disk: Log, Level 0 (4 SST files), Level 1 (10MB), Level 2 (100MB), ..., SST file (2MB), Info Log, MANIFEST, CURRENT. Arrows: Write, Compaction.]
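The write path in the architecture figures can be illustrated with a toy model. This is a minimal Python sketch, not RocksDB code: the class `ToyDB` and its methods are hypothetical, the 4MB switch threshold is taken from the figure, and memtable size is approximated as key length plus value length (overwrites are not subtracted). A write is appended to the log and applied to the active memtable; a full memtable is switched to read-only, and a flush turns each read-only memtable into one sorted L0 file.

```python
from dataclasses import dataclass, field

MEMTABLE_LIMIT = 4 * 1024 * 1024  # 4MB, the Active Memtable size shown in the figure


@dataclass
class ToyDB:
    """Hypothetical toy model of the RocksDB write/read path (not the real API)."""
    limit: int = MEMTABLE_LIMIT
    active: dict = field(default_factory=dict)     # mutable (active) memtable
    active_bytes: int = 0                          # approximate size of the active memtable
    immutable: list = field(default_factory=list)  # read-only memtables waiting for flush
    level0: list = field(default_factory=list)     # flushed SST files, oldest first
    log: list = field(default_factory=list)        # write-ahead log records

    def put(self, key: bytes, value: bytes) -> None:
        self.log.append((key, value))  # append to the log first for durability
        self.active[key] = value       # then update the in-memory table
        self.active_bytes += len(key) + len(value)
        if self.active_bytes >= self.limit:
            self.switch()

    def switch(self) -> None:
        # Switch: freeze the active memtable and start a fresh one.
        self.immutable.append(self.active)
        self.active, self.active_bytes = {}, 0

    def flush(self) -> None:
        # Flush: each read-only memtable becomes one sorted L0 file.
        for mem in self.immutable:
            self.level0.append(sorted(mem.items()))
        self.immutable.clear()

    def get(self, key: bytes):
        # Reads go from newest to oldest: active, read-only memtables, then L0 newest first.
        if key in self.active:
            return self.active[key]
        for mem in reversed(self.immutable):
            if key in mem:
                return mem[key]
        for sst in reversed(self.level0):
            for k, v in sst:
                if k == key:
                    return v
        return None
```

Because L0 files can contain the same key, the read path must check them newest first; the first hit is the current value.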
RocksDB Compaction
Multi-threaded compactions
• Background threads
→ periodically perform compaction
→ compactions on different parts of the database can run in parallel
• Merge SST files into a bigger SST file
• Remove multiple copies of the same key
– Duplicate or overwritten keys
• Process deletions of keys
• Supports two different styles of compaction
– Tunable compaction to trade-off
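The three jobs listed above (merge, drop old copies, process deletions) can be shown as a k-way merge of sorted runs. This is a minimal sketch under stated assumptions: each SST file is modeled as a list of `(key, value)` pairs sorted by key with unique keys, `runs[0]` is the newest file, and a value of `None` stands for a deletion marker. The function name `compact` is hypothetical, and dropping deletions outright is only safe when no older data for the key exists further down the tree.

```python
import heapq
from typing import List, Optional, Tuple

Entry = Tuple[bytes, Optional[bytes]]  # value None = deletion marker (tombstone)


def compact(runs: List[List[Entry]], drop_tombstones: bool = True) -> List[Entry]:
    """Merge sorted runs (runs[0] newest) into one sorted run.

    For each key only the newest version survives; if that version is a
    deletion marker, the key is removed entirely.
    """
    # Tag entries with their run's age so that, for equal keys, the newer one sorts first.
    tagged = ([(key, age, value) for key, value in run] for age, run in enumerate(runs))
    out: List[Entry] = []
    last_key = None
    for key, _age, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # older copy of a key already handled: duplicate or overwritten
        last_key = key
        if value is None and drop_tombstones:
            continue  # process the deletion: emit nothing for this key
        out.append((key, value))
    return out
```

The `(key, age)` prefix is unique, so `heapq.merge` never has to compare values, which keeps `None` and `bytes` from being compared.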
Level Style Compaction
• level0_file_num_compaction_trigger
- Number of L0 files that triggers L0 compaction
- Default : 4
Ex) if the candidate files' total size is at most about 1% smaller than the next file's size
→ include the next file in the candidate set
• Level0_file_
- The minimum number of files in a single compaction
- Default : 2
• max_merge_width
- The maximum number of files in a single compaction
- Default : UINT_MAX
Compaction options
1. Level Style Compaction
• RocksDB default compaction style
• Stores data in multiple levels in the database
• Most recent data → L0
Oldest data → Lmax
• Files in L0
- overlapping key ranges, sorted by flush time
• Files in L1 and higher
- non-overlapping key ranges, sorted by key
• Each level is 10 times larger than the previous one
Inherited from LevelDB
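The "10 times larger" rule can be made concrete with a little arithmetic. This sketch assumes the sizes from the architecture figure (Level 1 = 10MB) and a multiplier of 10; in RocksDB these roughly correspond to the `max_bytes_for_level_base` and `max_bytes_for_level_multiplier` options. The helper `level_targets` is hypothetical, and L0 is left out because it is limited by file count rather than bytes.

```python
MB = 1024 * 1024


def level_targets(num_levels: int, l1_bytes: int = 10 * MB, multiplier: int = 10) -> dict:
    """Target size in bytes for levels 1..num_levels-1 (L0 is limited by file count)."""
    return {level: l1_bytes * multiplier ** (level - 1) for level in range(1, num_levels)}


for level, size in level_targets(5).items():
    print(f"L{level}: {size // MB} MB")  # L1: 10 MB, L2: 100 MB, L3: 1000 MB, L4: 10000 MB
```

With a multiplier of 10, about 90% of the data ends up in the last level, which is what keeps the space overhead of leveled compaction low.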
Level Style Compaction
Compaction process
[Figure: cache, log, level0, level1, level2, level3]
① Pick one file from level N
② Compact it with all its overlapping files from level N+1
③ Replace them with new files in level N+1
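The three steps of leveled compaction can be sketched on in-memory lists. This is an illustration under assumptions, not RocksDB's implementation: an SST file is modeled as a non-empty sorted list of `(key, value)` pairs, level N+1 is sorted and non-overlapping, overlap is found with a simple linear scan, deletion handling is left out for brevity, and `file_limit` (entries per output file) is a hypothetical stand-in for the maximum file size. All function names are hypothetical.

```python
from typing import List, Tuple

SST = List[Tuple[str, str]]  # sorted (key, value) pairs standing in for one SST file


def key_range(sst: SST) -> Tuple[str, str]:
    return sst[0][0], sst[-1][0]


def overlaps(a: SST, b: SST) -> bool:
    (a_lo, a_hi), (b_lo, b_hi) = key_range(a), key_range(b)
    return a_lo <= b_hi and b_lo <= a_hi


def compact_one(level_n: List[SST], level_n1: List[SST], pick: int, file_limit: int = 2) -> None:
    """Compact level_n[pick] into level N+1, modifying both lists in place."""
    chosen = level_n.pop(pick)                             # (1) pick one file from level N
    inputs = [f for f in level_n1 if overlaps(chosen, f)]  # (2) all overlapping N+1 files
    merged = dict(kv for f in inputs for kv in f)          # older level N+1 data first...
    merged.update(chosen)                                  # ...newer level N data wins
    items = sorted(merged.items())
    outputs = [items[i:i + file_limit] for i in range(0, len(items), file_limit)]
    kept = [f for f in level_n1 if not any(f is g for g in inputs)]
    level_n1[:] = sorted(kept + outputs, key=lambda f: f[0][0])  # (3) replace in level N+1
```

Only the files whose key ranges intersect the picked file are rewritten; the rest of level N+1 is untouched, which is what lets compactions on different key ranges run in parallel.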
Level 0 → Level 1 Compaction
• L0 files have overlapping key ranges
• So an L0 → L1 compaction typically includes all files from L1
• While all L1 files are being compacted with L0, L1 → L2 compaction cannot start until the L0 → L1 compaction completes
• L0 → L1 compaction is single-threaded → poor throughput
• Solution: make the size of L0 similar to the size of L1
Tricky Compaction
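Why L0 → L1 compaction pulls in most of L1 can be seen from key ranges alone. This sketch assumes all L0 files are picked together (as they usually must be, since their ranges overlap) and models each file only by its `(smallest, largest)` integer key range; the function `l0_to_l1_inputs` is hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (smallest key, largest key) of one SST file


def l0_to_l1_inputs(l0: List[Range], l1: List[Range]) -> List[Range]:
    """Return the L1 files an L0 -> L1 compaction must include.

    Because L0 files overlap one another, the compaction's key range is the
    union of all picked L0 ranges: from the smallest key to the largest key.
    """
    lo = min(smallest for smallest, _ in l0)
    hi = max(largest for _, largest in l0)
    return [f for f in l1 if f[0] <= hi and lo <= f[1]]
```

For example, three L0 files covering (0, 50), (40, 99) and (10, 70) span keys 0 to 99, so every L1 file in that range becomes an input, and no L1 → L2 compaction can use those files until the L0 → L1 job finishes.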
Level Style Compaction
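· Level score = $\dfrac{\text{current level size}}{\text{max level size}}$
· Max file size for level $L \geq 1$ = $\text{target\_file\_size\_base} \times \text{target\_file\_size\_multiplier}^{\,L-1}$
(target_file_size_base default = 2MB; target_file_size_multiplier default = 1)
· Overlapping range search: binary search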
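The level score and the binary search for overlapping files can be sketched as follows. Assumptions: levels 1 and up are scored as current size / max size, as on the slide; this sketch additionally assumes, following RocksDB's documented behavior, that L0 is scored by file count divided by `level0_file_num_compaction_trigger` and that the level with the highest score is compacted once that score reaches 1. Files are `(smallest, largest)` key pairs in a sorted, non-overlapping level. All function names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple


def level_score(level: int, level_bytes: int, max_level_bytes: int,
                l0_files: int = 0, l0_trigger: int = 4) -> float:
    """Score = current size / max size; L0 uses file count / trigger (assumption)."""
    if level == 0:
        return l0_files / l0_trigger
    return level_bytes / max_level_bytes


def pick_level(scores: Dict[int, float]) -> Optional[int]:
    """Compact the level with the highest score, if that score is at least 1."""
    level, score = max(scores.items(), key=lambda kv: kv[1])
    return level if score >= 1 else None


def overlapping(files: List[Tuple[str, str]], lo: str, hi: str) -> List[Tuple[str, str]]:
    """Files whose [smallest, largest] range intersects [lo, hi], via two binary searches."""
    # Both key lists are sorted because the level is sorted and non-overlapping.
    # They are built here for brevity; a real index would keep them precomputed.
    largest = [f[1] for f in files]
    smallest = [f[0] for f in files]
    start = bisect.bisect_left(largest, lo)   # first file whose largest key >= lo
    end = bisect.bisect_right(smallest, hi)   # files before end have smallest key <= hi
    return files[start:end]
```

Because files in L1 and higher do not overlap, both key lists are sorted, so finding the overlapping range costs O(log n) comparisons instead of a scan over the whole level.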
Level Style Compaction Flowchart
• Read : 128KB / Write : 512KB
2. Universal Style Compaction
• For write-heavy workloads
→ Level Style Compaction may be bottlenecked on disk throughput
• Stores all files in L0
• All files are arranged in time order
• May temporarily increase size amplification by a factor of two
• Intended to decrease write amplification
• But increases space amplification
Universal Style Compaction
① Pick a few files that are chronologically adjacent to one another
② Merge them
③ Replace them with a new file in level 0
Compaction process
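The universal compaction steps can be sketched on a time-ordered list of L0 files. This minimal sketch assumes each file is modeled as a key-to-value dict and that the list is ordered newest first; `merge_adjacent` is a hypothetical name. Picking only adjacent files keeps the list in time order, so "newer value wins" still holds after the merged file takes their place.

```python
from typing import Dict, List

Run = Dict[str, str]  # one L0 file modeled as a key -> value map


def merge_adjacent(files: List[Run], start: int, count: int) -> List[Run]:
    """files is ordered newest first; merge files[start:start+count] into one file."""
    picked = files[start:start + count]  # (1) chronologically adjacent files
    merged: Run = {}
    for run in reversed(picked):         # (2) apply oldest first so newer values overwrite
        merged.update(run)
    return files[:start] + [merged] + files[start + count:]  # (3) new file takes their place
```

Merging non-adjacent files would move older data ahead of a newer file that was skipped, which could make a stale value shadow a fresh one.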
Universal Style Compaction Flowchart
• Read : 128KB / Write : 512KB
Universal Style Compaction
• size_ratio
- Percentage flexibility while comparing file size
- Default : 1
Ex) if the candidate set's total size is at most about 1% smaller than the next file's size
→ include the next file in the candidate set
• min_merge_width
- The minimum number of files in a single compaction
- Default : 2
• max_merge_width
- The maximum number of files in a single compaction
- Default : UINT_MAX
Compaction options
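The interplay of `size_ratio`, `min_merge_width` and `max_merge_width` can be sketched as a candidate-picking loop. This is a simplified paraphrase of the rule the options describe, not RocksDB's code: it assumes the default `kCompactionStopStyleTotalSize`, file sizes listed newest first, and models `UINT_MAX` as 2**32 - 1. The function `pick_candidates` is hypothetical.

```python
from typing import List, Optional


def pick_candidates(sizes: List[int], size_ratio: int = 1,
                    min_merge_width: int = 2,
                    max_merge_width: int = 2**32 - 1) -> Optional[List[int]]:
    """Return indices of files to merge, or None if no set is wide enough.

    sizes[i] is the size of the i-th file, newest first. Starting from each
    file, keep adding the next (older) file while the candidate set's total
    size, stretched by size_ratio percent, is at least the next file's size.
    """
    for start in range(len(sizes)):
        total = sizes[start]
        end = start + 1
        while end < len(sizes) and end - start < max_merge_width:
            if total * (100 + size_ratio) / 100 < sizes[end]:
                break  # next file is clearly bigger than the whole set: stop growing
            total += sizes[end]
            end += 1
        if end - start >= min_merge_width:
            return list(range(start, end))
    return None
```

For example, sizes `[100, 101, 500]` give `[0, 1]`: 101 is within 1% of 100, but the combined 201 is far below 500, so the oldest file is left alone.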
Universal Style Compaction
• max_size_amplification_percent
- Size amplification: the amount (in percent) of additional storage needed to store a single byte of data in the database
- Controls the amount of space amplification in the database
- Does not determine when calls to Put & Delete are stalled
- Determines when compaction is done
- Default : 200 (a 100-byte database may use up to 300 bytes of storage)
Compaction options
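How `max_size_amplification_percent` decides on a compaction can be shown with a short calculation. This sketch assumes the heuristic from RocksDB's documentation: every file except the earliest (oldest) one counts as extra storage, and the earliest file stands for the real data. Sizes are listed newest first, and both function names are hypothetical.

```python
from typing import List


def size_amp_percent(sizes: List[int]) -> float:
    """sizes are newest first; the last (earliest) file is treated as the real data."""
    earliest = sizes[-1]
    others = sum(sizes[:-1])
    return others * 100 / earliest


def needs_full_compaction(sizes: List[int], max_size_amplification_percent: int = 200) -> bool:
    # Merge everything once the extra storage reaches the configured percentage.
    return len(sizes) > 1 and size_amp_percent(sizes) >= max_size_amplification_percent
```

With the default of 200, files of 150 and 50 bytes on top of a 100-byte earliest file give 200% amplification (300 bytes stored for 100 bytes of data), which triggers a compaction of all files.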
Universal Style Compaction
• stop_style
- The algorithm used to stop picking files into a single
compaction run
- kCompactionStopStyleSimilarSize
→ Pick files of similar size
- kCompactionStopStyleTotalSize
→ Pick files while the total size of the picked files exceeds the size of the next file
- Default : kCompactionStopStyleTotalSize
Compaction options
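The two stop styles differ only in what the next file is compared against. This sketch is a simplified reading of RocksDB's universal picker, stated as an assumption: with SimilarSize the next file must be within `size_ratio` percent of the last picked file (checked in both directions), while with TotalSize it is compared with the whole candidate set. The function `include_next` is hypothetical; the two style names are RocksDB's enum values, used here as plain strings.

```python
def include_next(stop_style: str, candidate_total: int, last_picked: int,
                 next_size: int, size_ratio: int = 1) -> bool:
    """Whether the next (older) file joins the candidate set under each stop style."""
    slack = (100 + size_ratio) / 100
    if stop_style == "kCompactionStopStyleSimilarSize":
        # Neighbours must be about the same size: neither may exceed the other by > size_ratio%.
        return last_picked * slack >= next_size and next_size * slack >= last_picked
    if stop_style == "kCompactionStopStyleTotalSize":
        # The whole set picked so far is compared with the next file.
        return candidate_total * slack >= next_size
    raise ValueError(f"unknown stop style: {stop_style}")
```

TotalSize lets a run of small files absorb a larger neighbour once their combined size catches up, which reduces the number of files faster; SimilarSize keeps each merge among files of comparable size.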
Editor's Notes

  • #9: The MANIFEST file is formatted as a log: every change that causes a state change (a file being added or deleted) is appended to it. A MANIFEST file lists the set of sorted tables that make up each level. Informational messages are printed to files named LOG and LOG.old. CURRENT is a text file that contains the name of the latest MANIFEST file.