Performance Tuning RocksDB
for Kafka Streams’ State Store
Dhruba Borthakur (Rockset), Bruno Cadonna (Confluent)
About the Presenters
Dhruba Borthakur, CTO & Co-founder, Rockset (rockset.com)
Bruno Cadonna, Contributor to Apache Kafka & Software Engineer at Confluent (confluent.io)
Agenda
• Kafka Streams and State Stores
• Introduction to RocksDB
• Compaction Styles in RocksDB
• Possible Operational Issues
• Tuning RocksDB
• RocksDB Command Line Utilities
• Takeaways
Kafka Streams and State Stores
Kafka Streams
● Stateless and stateful processors
● Stateful processors use state stores
● The topology is instantiated once per input partition; each instance is called a task
State Stores in Kafka Streams
• A stateful processor may use one or more state stores
• Each partition has its own state store
[Diagram labels: Metrics & De-/Serialization, Caching, Changelogging, Restoration]
• State stores are layered:
  • the metrics and de-/serialization layer collects metrics and de-/serializes records
  • the caching layer caches records
  • the changelogging layer writes records to the changelog
  • the innermost layer writes records to the local state store
• State stores are restored from changelog topics
• Restoration is byte-based and bypasses the wrapping layers
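The layering can be pictured as a chain of wrappers around the local store. The following is a toy sketch only: every class name is hypothetical (these are not the Kafka Streams classes), keys are plain strings, and the changelog topic is stood in for by an in-memory list. It assumes the cache buffers writes and forwards only the latest value per key when flushed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a layered state store; all names are hypothetical.
class LayeredStoreSketch {

    interface BytesStore {
        void put(String key, byte[] value);
        byte[] get(String key);
        void flush();
    }

    // Innermost layer: stands in for the local (RocksDB) state store.
    static class LocalStore implements BytesStore {
        final Map<String, byte[]> data = new TreeMap<>();
        public void put(String key, byte[] value) { data.put(key, value); }
        public byte[] get(String key) { return data.get(key); }
        public void flush() { }
    }

    // Records every write in a list that stands in for the changelog topic.
    static class ChangeLoggingStore implements BytesStore {
        final BytesStore inner;
        final List<String> changelog = new ArrayList<>();
        ChangeLoggingStore(BytesStore inner) { this.inner = inner; }
        public void put(String key, byte[] value) {
            inner.put(key, value);
            changelog.add(key);
        }
        public byte[] get(String key) { return inner.get(key); }
        public void flush() { inner.flush(); }
    }

    // Buffers writes and forwards only the latest value per key on flush.
    static class CachingStore implements BytesStore {
        final BytesStore inner;
        final Map<String, byte[]> dirty = new LinkedHashMap<>();
        CachingStore(BytesStore inner) { this.inner = inner; }
        public void put(String key, byte[] value) { dirty.put(key, value); }
        public byte[] get(String key) {
            byte[] cached = dirty.get(key);
            return cached != null ? cached : inner.get(key);
        }
        public void flush() {
            dirty.forEach(inner::put);
            dirty.clear();
            inner.flush();
        }
    }

    // Outermost layer: counts puts and converts between String and bytes.
    static class MeteredStore {
        final BytesStore inner;
        long putCount = 0;
        MeteredStore(BytesStore inner) { this.inner = inner; }
        void put(String key, String value) {
            putCount++;
            inner.put(key, value.getBytes(StandardCharsets.UTF_8));
        }
        String get(String key) {
            byte[] bytes = inner.get(key);
            return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
        }
        void flush() { inner.flush(); }
    }

    public static void main(String[] args) {
        LocalStore local = new LocalStore();
        ChangeLoggingStore logging = new ChangeLoggingStore(local);
        MeteredStore store = new MeteredStore(new CachingStore(logging));
        store.put("a", "1");
        store.put("a", "2"); // overwrites the cached value before it reaches the changelog
        store.flush();
        System.out.println(store.get("a") + " puts=" + store.putCount
                + " changelog entries=" + logging.changelog.size());
    }
}
```

In this sketch two puts to the same key reach the changelog and the local store only once, which illustrates why a cache in front of the changelogging layer can reduce downstream writes.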
RocksDB is the Default State Store
• Kafka Streams needed a write-optimized state store
• Kafka Streams 2.6 uses RocksDB 5.18.4
• Kafka Streams provides metrics to monitor RocksDB state stores
• RocksDB can be configured by passing a class that implements the interface RocksDBConfigSetter to the configuration rocksdb.config.setter
Example: Configuring RocksDB in Kafka Streams
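import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.CompactionStyle;
import org.rocksdb.Options;

public class MyRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        // e.g. set compaction style
        options.setCompactionStyle(CompactionStyle.LEVEL);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // nothing to release: setConfig creates no RocksDB objects
    }
}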
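To take effect, the config setter class has to be named in the application's configuration under rocksdb.config.setter. A minimal sketch using only java.util.Properties follows; the package name com.example and the values for application.id and bootstrap.servers are placeholders assumed for illustration.

```java
import java.util.Properties;

class StreamsConfigSketch {
    // Builds the properties that would be handed to the Kafka Streams application.
    static Properties buildConfig(String configSetterClassName) {
        Properties props = new Properties();
        props.put("application.id", "my-app");            // placeholder value
        props.put("bootstrap.servers", "localhost:9092"); // placeholder value
        // Configuration key named on the slide; the value is the class to use.
        props.put("rocksdb.config.setter", configSetterClassName);
        return props;
    }

    public static void main(String[] args) {
        // com.example is a hypothetical package for the class shown above.
        Properties props = buildConfig("com.example.MyRocksDBConfig");
        System.out.println(props.getProperty("rocksdb.config.setter"));
    }
}
```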
Introduction to RocksDB
What is RocksDB?
• Key-value persistent store
• Embedded C++ & Java library
• Server workloads
What is it not?
• Not distributed
• No failover
• Not highly available: if the machine dies, you lose your data
• Focus on performance
Kafka Streams makes it fault-tolerant
RocksDB API
• Keys and values are byte arrays
• Data are stored sorted by key
• Update Operations: Put/Delete/Merge
• Queries: Get/Iterator
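The shape of this API can be imitated with a sorted in-memory map. The sketch below is a toy model, not the RocksDB Java API: the class and method names are hypothetical, Merge is left out, and keys are ordered as unsigned bytes, which is assumed here to mirror a bytewise key ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Toy key-value store: byte-array keys and values, kept sorted by key.
class SortedByteStore {
    // Lexicographic comparison of keys as unsigned bytes.
    private static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;
    private final TreeMap<byte[], byte[]> data = new TreeMap<>(BYTEWISE);

    void put(byte[] key, byte[] value) {
        data.put(key.clone(), value.clone());
    }

    byte[] get(byte[] key) {
        byte[] value = data.get(key);
        return value == null ? null : value.clone();
    }

    void delete(byte[] key) {
        data.remove(key);
    }

    // Iterates over all keys in sorted order, decoding them as UTF-8 for display.
    List<String> scanKeys() {
        List<String> keys = new ArrayList<>();
        for (byte[] key : data.keySet()) {
            keys.add(new String(key, StandardCharsets.UTF_8));
        }
        return keys;
    }

    public static void main(String[] args) {
        SortedByteStore store = new SortedByteStore();
        store.put("b".getBytes(StandardCharsets.UTF_8), "2".getBytes(StandardCharsets.UTF_8));
        store.put("a".getBytes(StandardCharsets.UTF_8), "1".getBytes(StandardCharsets.UTF_8));
        store.put("c".getBytes(StandardCharsets.UTF_8), "3".getBytes(StandardCharsets.UTF_8));
        store.delete("b".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.scanKeys()); // keys come back sorted, regardless of insertion order
    }
}
```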
Log Structured Merge Architecture
[Figure: LSM architecture. Labels: write request from application, read-write data in RAM, transaction log, periodic compaction, read-only data on SSD or disk, scan request from application]
RocksDB Write Path
[Figure: RocksDB write path. Labels: write request, active MemTable with log, switch, read-only MemTable with log, flush, sst files, compaction]
RocksDB Reads
• Data can be in memory or on disk
• Consult multiple files to find the latest instance of the key
• Use bloom filters to reduce I/O
  • Every sst file has a bloom filter
  • Bloom filters are cached in memory
  • Default config: eliminates about 99% of reads of files that do not contain the key
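A bloom filter answers "definitely not in this file" or "possibly in this file", so a read can skip most files that do not hold the key. The following is a minimal sketch of the idea, not RocksDB's implementation: the class name, the hash functions, and the sizing in main (1000 bits for 100 keys, 7 probes) are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Toy bloom filter: may report false positives, never false negatives.
class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derives the i-th bit position from two hashes of the key (double hashing).
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = (h1 * 0x9E3779B9) | 1; // odd, so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false: the key was never added; true: the key may have been added.
    boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter(1000, 7); // about 10 bits per key
        for (int i = 0; i < 100; i++) {
            filter.add(("key-" + i).getBytes(StandardCharsets.UTF_8));
        }
        int falsePositives = 0;
        for (int i = 100; i < 1100; i++) {
            if (filter.mightContain(("key-" + i).getBytes(StandardCharsets.UTF_8))) {
                falsePositives++;
            }
        }
        System.out.println("false positives among 1000 absent keys: " + falsePositives);
    }
}
```

When mightContain returns false for a file's filter, that file does not need to be read for the key, which is where the I/O saving comes from.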
RocksDB Read Path
[Figure: RocksDB read path. Labels: read request Get(k), active MemTable, read-only MemTable, logs, blooms, sst files, flush, compaction; memory vs. persistent storage]
RocksDB Architecture
[Figure: RocksDB architecture. Labels: write request, read request, active MemTable, read-only MemTable, block cache, logs, switch, flush, sst files, compaction; memory vs. persistent storage]
RocksDB Open & Pluggable
[Figure: RocksDB is open and pluggable. Labels: pluggable compaction, pluggable sst data format on storage, pluggable Memtable format in RAM, transaction log, customizable WAL, blooms, get or scan request from application, write request from application]
Compaction Styles in RocksDB
What is Compaction
• Multi-threaded
  • Parallel compactions on different parts of the database
• Deletes overwritten keys
• Two compaction styles
  • level compaction
  • universal compaction
Level compaction
• RocksDB's default compaction style is level compaction (for read-heavy workloads)
• Stores data in multiple levels
  • More recent data is stored in L0
  • Older data is stored in Lmax
• Files in L0
  • overlapping keys, sorted by flush time
• Files in L1 to Lmax
  • non-overlapping keys, sorted by key
• Max space amplification = 10%
https://github.com/facebook/rocksdb/wiki/Leveled-Compaction
Universal Compaction
• For write-heavy workloads
  • needed if level-style compaction is bottlenecked by disk throughput
• Stores all files in L0
• All files are arranged in time order
• Decreases write amplification but increases space amplification
• Picks files that are chronologically adjacent to one another
  • merges them
  • replaces them with a new file in L0
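The merge step can be sketched as follows. This is a toy model with hypothetical names, not RocksDB code: each "file" is a map from key to value, an empty Optional stands for a delete marker, and the sketch assumes the merge covers all files holding a key, so delete markers can be dropped from the output.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class CompactionSketch {
    // Merges chronologically adjacent files into one sorted file.
    // Files are passed oldest first, so later files overwrite earlier ones.
    static TreeMap<String, String> compact(List<Map<String, Optional<String>>> filesOldestFirst) {
        TreeMap<String, Optional<String>> latest = new TreeMap<>();
        for (Map<String, Optional<String>> file : filesOldestFirst) {
            latest.putAll(file);
        }
        // Keep only live values; overwritten versions and deleted keys disappear.
        TreeMap<String, String> merged = new TreeMap<>();
        latest.forEach((key, value) -> value.ifPresent(v -> merged.put(key, v)));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Optional<String>> older = Map.of("a", Optional.of("1"), "b", Optional.of("2"));
        Map<String, Optional<String>> newer = Map.of("a", Optional.of("3"), "b", Optional.empty());
        System.out.println(compact(List.of(older, newer))); // prints {a=3}
    }
}
```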
Possible Operational Issues
Operational Issues
• High memory usage
  • Application gets slower or even crashes
  • Operating system shows high memory usage
  • Kafka Streams metrics for monitoring memory usage of RocksDB (KIP-607, planned for 2.7) show high values
• High disk usage
  • Application crashes with I/O errors
  • Operating system shows high disk usage
• High disk I/O
  • Operating system shows high disk I/O
  • Kafka Streams metrics with high values
    • memtable-bytes-flushed-[rate | total]
    • bytes-[read | written]-compaction-rate
  • Kafka Streams metrics with low values
    • memtable-hit-ratio
    • block-cache-[data | index | filter]-hit-ratio
• Write stalls
  • Processing latency of the application increases
  • Kafka Streams client gets kicked out of the consumer group
  • Kafka Streams metric write-stall-duration-[avg | total] shows high values
• Too many open files
  • Application crashes with I/O errors
  • Kafka Streams metric number-open-files shows high values
• Kafka Streams client gets kicked out of the consumer group during restoration
  • Before 2.6, Kafka Streams used RocksDB’s bulk loading feature (Options#prepareForBulkLoad()) to restore the state store faster
  • Bulk loading basically consists of:
    • disabling automatic compaction,
    • writing all data to level 0, and
    • triggering a manual compaction
  • Manual compaction is a blocking call that may take longer than max.poll.interval.ms, in which case the client is removed from the consumer group
  • Bulk loading was removed in 2.6
  • Currently evaluating alternatives to increase the performance of state store restoration by using other features of RocksDB, e.g., ingesting SST files directly
Tuning RocksDB
Debug Kafka Streams OOM
• Memory consumption
  • memtable (for writes)
    • memtable size, number of memtables
  • block cache (for reads)
    • configure one block cache to be shared among all partitions of the state store
    • Kafka Streams keeps index blocks in the block cache
  • rocksdb-java bugs (https://github.com/facebook/rocksdb/issues/6247)
• High disk usage
  • Use level compaction instead of universal compaction
  • Provision more disk space
https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html
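A back-of-the-envelope estimate helps when reasoning about the memtable and block cache settings above. The sketch below is a rough model under stated assumptions: every store instance has its own memtables and its own block cache (no sharing), other memory consumers are ignored, and the sizes used in main are illustrative values rather than confirmed defaults. All names are hypothetical.

```java
class RocksDbMemoryEstimate {
    static final long MIB = 1024L * 1024L;

    // Memtables plus block cache for a single store instance.
    static long perStoreBytes(long writeBufferSize, int maxWriteBufferNumber, long blockCacheSize) {
        return writeBufferSize * maxWriteBufferNumber + blockCacheSize;
    }

    // One store instance exists per state store and per task (input partition).
    static long totalBytes(int storesPerTask, int tasks, long perStoreBytes) {
        return (long) storesPerTask * tasks * perStoreBytes;
    }

    public static void main(String[] args) {
        // Illustrative values: 16 MiB memtables, up to 3 of them, 50 MiB block cache.
        long perStore = perStoreBytes(16 * MIB, 3, 50 * MIB);
        // Illustrative topology: 2 state stores, 10 tasks on this instance.
        long total = totalBytes(2, 10, perStore);
        System.out.println("per store: " + perStore / MIB + " MiB, total: " + total / MIB + " MiB");
    }
}
```

The total grows linearly with the number of partitions, which is one reason a block cache shared across store instances is worth considering.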
Debug write stalls
• Debug write stalls in RocksDB
  • Is disk I/O utilization at 100%?
    • add more storage spindles
    • use universal compaction
  • Check the number of background compaction threads
    • Kafka Streams uses Max(2, number of available processors) by default
  • Check the memtable configuration
    • AdvancedColumnFamilyOptions.max_write_buffer_number
    • ColumnFamilyOptions.write_buffer_size
Debugging file descriptor issues
• Too many open files
  • DBOptions.max_open_files = -1 (default)
    • opens all sst files at DB open time
    • good for performance but can run out of file descriptors
  • Increase the operating system limit on open file descriptors
  • Set DBOptions.max_open_files = 10000
    • opens at most 10000 files concurrently
  • Decrease the number of files by making each file larger
    • AdvancedColumnFamilyOptions.target_file_size_base = 128 MB (default is 64 MB)
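The effect of a larger target file size on the number of sst files can be estimated with simple arithmetic. This is a rough sketch with hypothetical names; it assumes every file reaches exactly the target size, whereas real file sizes vary, and the 10 GiB store size is an illustrative value.

```java
class SstFileCountEstimate {
    static final long MIB = 1024L * 1024L;

    // Number of files needed if each file holds targetFileSizeBytes (rounded up).
    static long estimatedFiles(long dataBytes, long targetFileSizeBytes) {
        return (dataBytes + targetFileSizeBytes - 1) / targetFileSizeBytes;
    }

    public static void main(String[] args) {
        long dataBytes = 10L * 1024 * MIB; // illustrative: 10 GiB of data in one store
        System.out.println("64 MiB files:  " + estimatedFiles(dataBytes, 64 * MIB));
        System.out.println("128 MiB files: " + estimatedFiles(dataBytes, 128 * MIB));
    }
}
```

Multiplying the per-store estimate by the number of store instances on a machine gives a figure to compare against the operating system's file descriptor limit.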
RocksDB Command Line Utilities
Build rocksdb command line utilities
git clone git@github.com:facebook/rocksdb.git
cd rocksdb
make ldb sst_dump
cp ldb /usr/local/bin
cp sst_dump /usr/local/bin
Useful RocksDB command line tools: https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool
# change these values accordingly
APP_ID=my-app
STATE_STORE=my-counts
STATE_STORE_DIR=/tmp/kafka-streams
TASKS=$(ls $STATE_STORE_DIR/$APP_ID)
Useful commands
# View all keys
for i in $TASKS; do
  ldb --db="$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE" scan 2>/dev/null
done
# Show table properties
for i in $TASKS; do
  TABLE_PROPERTIES=$(sst_dump --file="$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE" --show_properties)
  echo -e "Table properties for task: $i\n$TABLE_PROPERTIES\n\n"
done
Useful commands: example output
# example output
Table properties for task: 1_9
from [] to []
Process /tmp/kafka-streams/my-app/1_9/rocksdb/my-counts/000006.sst
Sst file format: block-based
Table Properties:
------------------------------
# data blocks: 1
# entries: 2
raw key size: 18
raw average key size: 9.000000
raw value size: 88
raw average value size: 44.000000
data block size: 125
index block size: 35
filter block size: 0
(estimated) table size: 160
Takeaways
• RocksDB is the default state store in Kafka Streams
• Kafka Streams provides functionality to configure and monitor RocksDB
• RocksDB uses a log-structured merge (LSM) architecture with different compaction styles
• You might run into operational issues, but you can solve them by debugging and tuning RocksDB
• RocksDB offers command line utilities for analysing state stores
Thank you!
dhruba@rockset.com
bruno@confluent.io
cnfl.io/meetups cnfl.io/slack cnfl.io/blog
Learn how Rockset uses RocksDB
https://rockset.com/blog/how-we-use-rocksdb-at-rockset/

Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur, Rockset, Bruno Cadonna, Confluent) Kafka Summit 2020

  • 1. Performance Tuning RocksDB for Kafka Streams’ State Store Dhruba Borthakur (Rockset), Bruno Cadonna (Confluent)
  • 2. About the Presenters Dhruba Borthakur CTO & Co-founder Rockset rockset.com 2 Bruno Cadonna Contributor to Apache Kafka & Software Engineer at Confluent confluent.io
  • 3. Agenda • Kafka Streams and State Stores • Introduction to RocksDB • Compaction Styles in RocksDB • Possible Operational Issues • Tuning RocksDB • RocksDB Command Line Utilities • Takeaways 3
  • 4. Kafka Streams and State Stores
  • 5. Kafka Streams 5 ● Stateless and stateful processors ● Stateful processors use state stores
  • 6. Kafka Streams 6 ● Stateless and stateful processors ● Stateful processors use state stores
  • 7. Kafka Streams 7 ● Stateless and stateful processors ● Stateful processors use state stores
  • 8. Kafka Streams 8 ● Stateless and stateful processors ● Stateful processors use state stores
  • 9. Kafka Streams 9 ● Stateless and stateful processors ● Stateful processors use state stores
  • 10. Kafka Streams 10 ● Stateless and stateful processors ● Stateful processors use state stores
  • 11. Kafka Streams 11 ● Stateless and stateful processors ● Stateful processors use state stores ● Create one topology per input partition, i.e., task
  • 12. State Stores in Kafka Streams 12 • Stateful processor may use one or more state stores • Each partition has its own state store Metrics & De-/Serialization Caching Changelogging Restoration
  • 13. State Stores in Kafka Streams 13 • Stateful processor may use one or more state stores • Each partition has its own state store • State stores are layered: Metrics & De-/Serialization Caching Changelogging Restoration
  • 14. State Stores in Kafka Streams 14 Metrics & De-/Serialization Caching Changelogging Restoration • Stateful processor may use one or more state stores • Each partition has its own state store • State stores are layered: • collects metrics and de-/serializes records
  • 15. State Stores in Kafka Streams 15 01 10 Metrics & De-/Serialization Caching Changelogging Restoration • Stateful processor may use one or more state stores • Each partition has its own state store • State stores are layered: • collects metrics and de-/serializes records
  • 16. State Stores in Kafka Streams 16 01 10 Metrics & De-/Serialization Caching Changelogging Restoration • Stateful processor may use one or more state stores • Each partition has its own state store • State stores are layered: • collects metrics and de-/serializes records • caches records
  • 17. 01 10 State Stores in Kafka Streams 17 01 10 Metrics & De-/Serialization Caching Changelogging Restoration • Stateful processor may use one or more state stores • Each partition has its own state store • State stores are layered: • collects metrics and de-/serializes records • caches records • writes records to changelog
  • 18. 01 10 State Stores in Kafka Streams 18 01 10 01 10 Metrics & De-/Serialization Caching Changelogging Restoration • Stateful processor may use one or more state stores • Each partition has its own state store • State stores are layered: • collects metrics and de-/serializes records • caches records • writes records to changelog
  • 19. 01 10 State Stores in Kafka Streams 19 01 10 01 10 Metrics & De-/Serialization Caching Changelogging Restoration • Stateful processor may use one or more state stores • Each partition has its own state store • State stores are layered: • collects metrics and de-/serializes records • caches records • writes records to changelog • writes records to local state store
  • 20. 01 10 State Stores in Kafka Streams 20 01 10 01 10 Metrics & De-/Serialization Caching Changelogging Restoration • Stateful processor may use one or more state stores • Each partition has its own state store • State stores are layered: • collects metrics and de-/serializes records • caches records • writes records to changelog • writes records to local state store • State stores are restored from changelog topics • Restoration is byte-based and by-passes wrapping layers
  • 21. RocksDB is the Default State Store • Kafka Streams needed a write optimized state store • Kafka Streams 2.6 uses RocksDB 5.18.4 • Kafka Streams provides metrics to monitor RocksDB state stores • RocksDB can be configured by passing a class that implements interface RocksDBConfigSetter to configuration rocksdb.config.setter 21
  • 22. Example: Configuring RocksDB in Kafka Streams 22 public static class MyRocksDBConfig implements RocksDBConfigSetter { @Override public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) { // e.g. set compaction style options.setCompactionStyle(CompactionStyle.LEVEL); } @Override public void close(final String storeName, final Options options) {} }
  • 24. What is RocksDB? • Key-value persistent store • Embedded C++ & Java library • Server workloads 24
  • 25. What is it not? • Not distributed • No failover • Not highly-available. If the machine dies, you lose your data • Focus on performance Kafka Streams makes it fault-tolerant 25
  • 26. RocksDB API • Keys and values are byte arrays • Data are stored sorted by key • Update Operations: Put/Delete/Merge • Queries: Get/Iterator 26
  • 27. Log Structured Merge Architecture 27 Periodic compaction Read only data in SSD or disk Read write data in RAM Transaction log Scan request from application Write request from application
  • 28. RocksDB Write Path 28 Write request Read only MemTable Log Log sst sst sst sst sst sst LS Compaction Flush SwitchSwitch Active MemTable Log
  • 29. RocksDB Reads • Data can be in memory or disk • Consult multiple files to find the latest instance of the key • Use bloom filters to reduce IO • Every sst file has a bloom filter • bloom filters are cached in memory • default config: eliminates 99% of reads 29
  • 30. RocksDB Read Path 30 Read only MemTable Log Log sst sst sst LS Compaction Flush Active MemTable Log sst sst sst Memory Persistent Storage Read request Get(k) Blooms
  • 31. RocksDB Architecture 31 Read only MemTable Log Log sst sst sst LS Compaction Flush Active MemTable Log sst sst Memory Persistent Storage sst Switch Switch Write request Read only BlockCache Read request
  • 32. RocksDB Open & Pluggable 32 Pluggable compaction Pluggable sst data format on storage Pluggable Memtable format in RAM Transaction log Blooms Customizable WAL Get or scan request from application Write request from application
  • 34. What is Compaction • Multi-threaded • Parallel compactions on different parts of the database • Deletes overwritten keys • Two types of compactions • level compactions • universal compaction 34
  • 35. Level compaction • RocksDB default compaction is Level Compaction (for read heavy workloads) • Stores data in multiple levels • More recent data stored in L0 • Older data stored in Lmax • Files in L0 • overlapping keys, sorted by flush time • Files in L1 to Lmax • non overlapping keys, sorted by key • Max space amplification = 10% https://github.com/facebook/rocksdb/wiki/Leveled-Compaction 35
  • 36. Universal Compaction • For write heavy workloads • needed if Level style compaction is bottlenecked by disk throughout • Stores all files in L0 • All files are arranged in time order • Decreases write amplification but increases space amplification • Pick up files that are chronologically adjacent to one another • merge them • replace them with a new file in L0 36
  • 38. Operational Issues • High memory usage • Application gets slower or even crashes • Operating system shows high memory usage • Kafka Streams metrics for monitoring memory usage of RocksDB (KIP-607, planned for 2.7) show high values 38
  • 39. Operational Issues • High memory usage • Application gets slower or even crashes • Operating system shows high memory usage • Kafka Streams metrics for monitoring memory usage of RocksDB (KIP-607, planned for 2.7) show high values • High disk usage • Application crashes with I/O errors • Operating system shows high disk usage 39
  • 40. Operational Issues • High disk I/O • Operating system shows high disk I/O • Kafka Streams metrics with high values • memtable-bytes-flushed-[rate | total] • bytes-[read | written]-compaction-rate • Kafka Streams metrics with low values • memtable-hit-ratio • block-cache-[data | index | filter]-hit-ratio 40
  • 41. Operational Issues • High disk I/O • Operating system shows high disk I/O • Kafka Streams metrics with high values • memtable-bytes-flushed-[rate | total] • bytes-[read | written]-compaction-rate • Kafka Streams metrics with low values • memtable-hit-ratio • block-cache-[data | index | filter]-hit-ratio • Write stalls • Processing latency of the application increases • Kafka Streams client gets kicked out of the group • Kafka Streams metric write-stall-duration-[avg | total] shows high values 41
  • 42. Operational Issues • Too many open files • Application crashes with I/O errors • Kafka Streams metric number-open-files shows high values
  • 45. Operational Issues • Kafka Streams client gets kicked out of the consumer group during restoration • Before 2.6, Kafka Streams used RocksDB's bulk loading feature (Options#prepareForBulkLoad()) to restore the state store faster. • Bulk loading basically consists of: • disabling automatic compaction, • writing all data to level 0, and • triggering a manual compaction • Manual compaction is a blocking call that may take longer than max.poll.interval.ms • Bulk loading was removed in 2.6 • Currently evaluating alternatives to increase the performance of state store restoration by using other features of RocksDB, e.g., ingesting SST files directly.
  • 47. Debug Kafka Streams OOM • Memory consumption • memtable (for writes) • memtable size, number of memtables • block cache (for reads) • configure it to be shared among all the partitions of the Kafka Streams state store • Kafka Streams keeps index blocks in the block cache • rocksdb-java bugs (https://github.com/facebook/rocksdb/issues/6247) • High disk usage • Use level compaction instead of universal compaction • Provision more disk space https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html
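A rough sketch of the arithmetic behind the memory items above, for the case where the block cache is not shared: each state store instance can hold up to its full set of memtables plus its own block cache. All numbers used here (10 store instances, 16 MB write buffer, 3 write buffers, 50 MB block cache) are assumptions for illustration, not values from the slides, and `estimate_rocksdb_mem_mb` is a hypothetical helper name. The estimate covers memtables and block cache only.

```shell
#!/bin/sh
# Upper-bound estimate, in MB, of memtable plus block cache memory.
# Arguments: number of state store instances, write buffer (memtable) size in MB,
#            maximum number of write buffers, block cache size in MB per instance.
estimate_rocksdb_mem_mb() {
  stores=$1
  write_buffer_mb=$2
  max_write_buffers=$3
  block_cache_mb=$4
  echo $((stores * (write_buffer_mb * max_write_buffers + block_cache_mb)))
}

# Assumed example: 10 instances, 16 MB x 3 memtables, 50 MB unshared block cache each.
estimate_rocksdb_mem_mb 10 16 3 50
```

Because the result scales linearly with the number of store instances, sharing one block cache across them bounds the cache part of the estimate by a single value instead of one value per instance.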
  • 48. Debug write stalls • Debug write stalls in RocksDB • Is disk I/O utilization at 100%? • add more storage spindles • use universal compaction • Check the number of background compaction threads • Kafka Streams uses Max(2, number of available processors) by default • Check memtable configuration • AdvancedColumnFamilyOptions.max_write_buffer_number • ColumnFamilyOptions.write_buffer_size
  • 49. Debugging file descriptor issues • Too many open files • DBOptions.max_open_files = -1 (default) • opens all SST files at DB open time • good for performance but can run out of file descriptors • Increase the operating system limit on open file descriptors • Set DBOptions.max_open_files = 10000 • will open a maximum of 10000 files concurrently • Decrease the number of files by making each file larger • AdvancedColumnFamilyOptions.target_file_size_base = 128 MB (default is 64 MB)
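One way to get a first impression of the file descriptor pressure described above is to compare the number of SST files on disk with the open-file limit. The following is a sketch under assumptions: `count_sst_files` is a hypothetical helper, `/tmp/kafka-streams` is an assumed state directory, and `ulimit -n` reports the soft limit of the current shell, which may differ from the limit of the running application.

```shell
#!/bin/sh
# Count the SST files below a directory, e.g. a Kafka Streams state directory.
count_sst_files() {
  find "$1" -type f -name '*.sst' 2>/dev/null | wc -l | tr -d ' '
}

# Assumed location of the state directory; adjust to your setup.
STATE_DIR=${STATE_DIR:-/tmp/kafka-streams}

sst_files=$(count_sst_files "$STATE_DIR")
fd_limit=$(ulimit -n)   # soft limit of this shell, not necessarily of the application

echo "SST files: $sst_files, open-file limit: $fd_limit"
```

If the file count approaches the limit while max_open_files is -1, raising the limit, capping max_open_files, or increasing target_file_size_base are the options listed on the slide.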
  • 50. RocksDB Command Line Utilities
  • 51. Build rocksdb command line utilities • Useful RocksDB command line tools: https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool
    Build:
        git clone git@github.com:facebook/rocksdb.git
        cd rocksdb
        make ldb sst_dump
        cp ldb /usr/local/bin
        cp sst_dump /usr/local/bin
    Change these values:
        # change these values accordingly
        APP_ID=my-app
        STATE_STORE=my-counts
        STATE_STORE_DIR=/tmp/kafka-streams
        TASKS=$(ls $STATE_STORE_DIR/$APP_ID)
  • 52. Useful commands
        # View all keys
        for i in $TASKS; do
          ldb --db=$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE scan 2>/dev/null
        done

        # Show table properties
        for i in $TASKS; do
          TABLE_PROPERTIES=$(sst_dump --file=$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE --show_properties)
          echo -e "Table properties for task: $i\n$TABLE_PROPERTIES\n\n"
        done
  • 53. Useful commands - Example output
        # example output
        Table properties for task: 1_9
        from [] to []
        Process /tmp/kafka-streams/my-app/1_9/rocksdb/my-counts/000006.sst
        Sst file format: block-based
        Table Properties:
        ------------------------------
          # data blocks: 1
          # entries: 2
          raw key size: 18
          raw average key size: 9.000000
          raw value size: 88
          raw average value size: 44.000000
          data block size: 125
          index block size: 35
          filter block size: 0
          (estimated) table size: 160
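The table properties shown in the example output can be post-processed with standard tools. As a sketch, the hypothetical helper `sum_entries` below adds up the `# entries` values of all SST files in a properties dump read from standard input. It assumes the output has the shape shown in the example. The same key may appear in several SST files until compaction merges them, so the sum is best read as an upper bound rather than an exact key count.

```shell
#!/bin/sh
# Sum the "# entries" values found in sst_dump --show_properties output on stdin.
sum_entries() {
  awk -F': ' '/^[[:space:]]*# entries:/ { total += $2 } END { print total + 0 }'
}

# Self-contained demo on sample text shaped like the example output.
# With real data, pipe the sst_dump output of a state store into sum_entries instead.
printf '  # data blocks: 1\n  # entries: 2\n  raw key size: 18\n' | sum_entries
```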
  • 55. Takeaways • RocksDB is the default state store in Kafka Streams • Kafka Streams provides functionality to configure and monitor RocksDB • RocksDB uses a log-structured merge (LSM) architecture with different compaction styles • You might run into operational issues, but you can solve them by debugging and tuning RocksDB • RocksDB offers command line utilities for analysing state stores
  • 56. Thank you! dhruba@rockset.com bruno@confluent.io cnfl.io/meetups cnfl.io/slack cnfl.io/blog Learn how Rockset uses RocksDB https://rockset.com/blog/how-we-use-rocksdb-at-rockset/