Mitigating the Impact of State Management
in Cloud Stream Processing Systems
Yingjun Wu
Founder at RisingWave Labs
Yingjun Wu (he/him/his)
Founder at RisingWave Labs
■ Chief Everything Officer
■ Ex-AWS Redshift
■ Ex-IBM Research Almaden
■ PhD in database systems and stream processing
What is RisingWave?
■ A distributed SQL streaming database
■ Open sourced in April 2022 under Apache 2.0 License
■ >5K GitHub stars, >130 GitHub contributors
■ >1K Slack members, >100K K8s deployments
Background
■ Stream processing systems continuously process large volumes of data
● They need to maintain internal state for stateful operators such as joins and aggregations
State Management
■ Consider joining two data streams
● Impression stream
● Click stream
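To make the join example concrete, here is a minimal sketch of why a streaming join needs internal state. It assumes both streams are keyed by an ad identifier; the class name StreamJoin, the ad_id key, and the event values are hypothetical and do not come from the slides or from RisingWave.

```python
class StreamJoin:
    """Symmetric hash join over an impression stream and a click stream.

    Every event is remembered so that it can be matched against events
    that arrive later on the other stream. That memory is the operator state.
    """

    def __init__(self):
        # Operator state: all events seen so far on each side, grouped by key.
        self.impressions = {}
        self.clicks = {}

    def on_impression(self, ad_id, impression):
        self.impressions.setdefault(ad_id, []).append(impression)
        # Probe the state of the other stream for matching clicks.
        return [(impression, click) for click in self.clicks.get(ad_id, [])]

    def on_click(self, ad_id, click):
        self.clicks.setdefault(ad_id, []).append(click)
        # Probe the state of the other stream for matching impressions.
        return [(imp, click) for imp in self.impressions.get(ad_id, [])]

    def state_size(self):
        """Number of events currently held in state."""
        sides = (self.impressions, self.clicks)
        return sum(len(events) for side in sides for events in side.values())
```

Without expiry, state_size() grows with every event on either stream, which is the pressure on state management that the rest of the deck addresses.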
State Management: Existing Solutions
■ MapReduce style
■ Compute-storage coupled
State Management: the Cloud Era
■ Cloud-native style
■ Compute-storage decoupled
State Management in the Cloud
■ Consider joining two data streams
● Impression stream
● Click stream
How to mitigate the issue?
Tiered Storage
■ EC2 local storage: the “cloud cache”
● Super fast!
● Data is lost if the machine crashes
■ EBS: the “cloud disk”
● Fast
● 99.999% durability (5 nines)
■ S3: the persistent storage
● Slow
● 99.999999999% durability (11 nines)
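As a rough worked example of what the two durability figures mean, the sketch below treats "N nines" as an independent annual loss probability of 10^-N per stored unit (a volume for EBS, an object for S3). That reading is a simplifying assumption, and the function name and the count of 10 million units are illustrative only.

```python
def expected_annual_losses(num_units, nines):
    """Expected number of units lost per year at a durability of `nines` nines.

    Assumes each unit is lost independently with probability 10**-nines
    per year, which is a simplification of how providers define durability.
    """
    return num_units * 10.0 ** -nines


five_nines = expected_annual_losses(10_000_000, 5)     # the EBS figure on the slide
eleven_nines = expected_annual_losses(10_000_000, 11)  # the S3 figure on the slide
```

Under that assumption, 10 million units at 5 nines lose about 100 units per year, while at 11 nines the expectation is about 0.0001 per year, or roughly one loss every 10,000 years.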
Tiered Storage
■ Should we really leverage EBS?
● No, at least for now…
■ EC2 local storage vs EBS
● EBS is more expensive than EC2 local storage
● EBS is slower than EC2 local storage
● EBS is persistent while EC2 local storage is not
Tiered Storage
■ How to manage data at different layers?
Log Structured Merge Tree
■ Use a log-structured merge tree (LSM tree)!
● Recently accessed data is cached on the local machine
● Upper-level runs are periodically compacted into lower levels
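The two bullets above can be sketched as a toy LSM tree with a local cache in front of remote storage. This is a minimal illustration under stated assumptions, not RisingWave's storage engine: the class name TieredLSM, the memtable_limit parameter, and the use of plain dictionaries as stand-ins for local disk and S3 are all hypothetical.

```python
class TieredLSM:
    """Toy LSM tree: in-memory memtable, local cache, and remote sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}        # newest writes, held in memory
        self.local_cache = {}     # recently read data, stand-in for local storage
        self.remote_runs = []     # sorted runs, newest first, stand-in for S3
        self.memtable_limit = memtable_limit
        self.remote_reads = 0     # counts lookups that had to go to remote storage

    def put(self, key, value):
        self.memtable[key] = value
        self.local_cache.pop(key, None)  # drop any stale cached copy
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new sorted run and start a fresh one."""
        self.remote_runs.insert(0, dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        if key in self.local_cache:
            return self.local_cache[key]
        for run in self.remote_runs:  # newest run first, so the latest value wins
            self.remote_reads += 1
            if key in run:
                self.local_cache[key] = run[key]  # cache for later reads
                return run[key]
        return None

    def compact(self):
        """Merge all runs into one; newer values overwrite older ones."""
        merged = {}
        for run in reversed(self.remote_runs):  # oldest first
            merged.update(run)
        self.remote_runs = [dict(sorted(merged.items()))]
```

A repeated get for the same key is served from local_cache and leaves remote_reads unchanged, and compact() reduces the number of runs a cold read may have to visit.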
Compaction
■ Compactions in LSM trees can cause huge latency spikes!
● Move the compaction to remote machines
Build SST
Upload SST
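A minimal sketch of the remote compaction flow follows, assuming an SST is a list of (key, value) pairs sorted by key and that a dictionary stands in for object storage. The function names merge_ssts and remote_compact and the SST identifiers are hypothetical; the slides only name the "Build SST" and "Upload SST" steps.

```python
import heapq


def merge_ssts(ssts):
    """Merge sorted SSTs given newest first; the newest value wins per key."""
    # Tag each entry with its SST's age so equal keys sort newest first.
    tagged = (
        [(key, age, value) for key, value in sst]
        for age, sst in enumerate(ssts)
    )
    merged = []
    for key, _age, value in heapq.merge(*tagged):
        if not merged or merged[-1][0] != key:  # skip older duplicates
            merged.append((key, value))
    return merged


def remote_compact(object_store, input_ids, output_id):
    """Intended to run on a separate compactor node, not the streaming node."""
    inputs = [object_store[sst_id] for sst_id in input_ids]  # download inputs
    new_sst = merge_ssts(inputs)                             # build SST
    object_store[output_id] = new_sst                        # upload SST
    return output_id
```

In this sketch the streaming node only has to hand over the input identifiers and later switch its metadata to the returned identifier, so the CPU and I/O cost of the merge stays off its processing path.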
Compaction
■ Accessing S3 is always expensive!
● Each bucket has a performance limit!
● Using too many buckets can cause fragmentation issues!
■ We set the number of buckets to a magic number and scale it based on the workload.
S3 buckets
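The slides do not say how objects are assigned to buckets. One plausible approach, shown here purely as an assumption, is to hash each SST identifier onto a fixed number of buckets so that requests spread evenly. NUM_BUCKETS, the state-store-N naming scheme, and the function bucket_for are hypothetical.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical value; the talk only calls it "a magic number"


def bucket_for(sst_id, num_buckets=NUM_BUCKETS):
    """Map an SST identifier to one of `num_buckets` bucket names."""
    digest = hashlib.sha256(sst_id.encode("utf-8")).digest()
    # A stable hash keeps the mapping identical across processes and restarts.
    index = int.from_bytes(digest[:8], "big") % num_buckets
    return f"state-store-{index}"
```

A cryptographic hash is used instead of Python's built-in hash() because the latter is randomized per process for strings, which would make the mapping unstable. Note that changing num_buckets remaps most identifiers, so scaling the bucket count in a real system would need a migration or consistent-hashing scheme that this sketch does not cover.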
Conclusion
■ State management is a challenging problem in stream processing systems.
■ A decoupled compute-storage architecture helps achieve infinite and independent scalability.
■ Tiered storage helps optimize performance.
■ Remote compaction and streaming compaction help optimize network usage.
■ Set the number of buckets wisely to reduce the S3 access bottleneck.
Yingjun Wu
@YingjunWu
risingwave.com/github
risingwave.com/slack
Thank you! Let’s connect.
