Apple logo is a trademark of Apple Inc.
Kristine Guo
Liang-Chi Hsieh
THIS IS NOT A CONTRIBUTION
Structured Streaming Use-cases at Apple
Who am I
Liang-Chi Hsieh
Apache Spark Committer
Software Engineer @ Apple
https://github.com/viirya
https://www.linkedin.com/in/liang-chi-hsieh-a7904568/
Who am I
Kristine Guo
Software Engineer @ Apple
Focus on cloud platform technologies
Currently working on developing high-scale backend systems
https://www.linkedin.com/in/kristineguo/
Agenda
Revive Previous Structured Streaming Efforts
New Enhancement to Structured Streaming
Use-case at Apple
Features in Structured Streaming that matter to us
New built-in StateStore
Session Window
Stateful task scheduling enhancement
Checkpoint enhancement
Revive Previous Structured Streaming Efforts
StateStore: Current Status
What is StateStore?
A component for state management for stateful operators such as streaming aggregates, joins, etc.
[Diagram: stateful operators get/put key/value pairs in the StateStore; the StateStore checkpoints to and restores from the FileSystem. Other label: Project.]
Built-in StateStore
HDFSBackedStateStore
• Stores states in an in-memory map
• Checkpoints to an HDFS-compatible file system
Disadvantages?
• Limited by executor memory, which is an issue for large-state use cases
• Impacts other memory usage on the executors
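The two properties listed above (states held in an in-memory map, checkpoints written to a file system) can be illustrated with a small sketch. This is not the HDFSBackedStateStore implementation: the class name, the JSON file layout, and the choice to write a full snapshot per version are all assumptions made for illustration, and a local directory stands in for an HDFS-compatible file system.

```python
import json
import os
import tempfile


class InMemoryStateStore:
    """Toy state store: a dict in memory, one snapshot file per committed version."""

    def __init__(self, checkpoint_dir):
        self.checkpoint_dir = checkpoint_dir
        self.state = {}      # all state lives in process memory
        self.version = 0

    def get(self, key):
        return self.state.get(key)

    def put(self, key, value):
        self.state[key] = value

    def commit(self):
        """Write the whole map to the checkpoint directory and bump the version."""
        self.version += 1
        path = os.path.join(self.checkpoint_dir, f"{self.version}.json")
        with open(path, "w") as f:
            json.dump(self.state, f)
        return self.version

    @classmethod
    def restore(cls, checkpoint_dir, version):
        """Rebuild the in-memory map from a previously committed version."""
        store = cls(checkpoint_dir)
        with open(os.path.join(checkpoint_dir, f"{version}.json")) as f:
            store.state = json.load(f)
        store.version = version
        return store


# Usage: commit a version, keep mutating, then restore the committed version.
with tempfile.TemporaryDirectory() as directory:
    store = InMemoryStateStore(directory)
    store.put("user-1", 3)
    committed = store.commit()
    store.put("user-1", 99)  # uncommitted change
    restored = InMemoryStateStore.restore(directory, committed)
    print(restored.get("user-1"))
```

Because the whole map stays in memory, the state size in this sketch is bounded by the memory of the process holding it, which is the limitation the slide points out.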
We need a new built-in StateStore
Reviving RocksDB StateStore as a built-in StateStore in Structured Streaming
Why?
• More and more streaming applications require large state
• Widely used in the industry
• SPARK-34198: Add RocksDB StateStore as external module
• Received all positive responses from the community
Visit Current RocksDB StateStore
OSS implementations
• https://github.com/chermenin/spark-states
• SPARK-28120: RocksDB state storage
• https://github.com/qubole/spark-state-store
OSS in the future
• RocksDB StateStore from Databricks
RocksDB StateStore Benchmark
put/get significant key/value pairs
Implementation   Best Time (ms)   Avg Time (ms)   Relative
Chermenin        32725            34507           1.0X
Qubole           25493            25636           1.3X
Databricks       TBD              TBD             TBD
Window Operations
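[Figure: two timelines contrasting an event-time window with a session window. First timeline: a 10 min window over times 12:01, 12:02, 12:12, 12:15, … Second timeline: times 12:01, 12:02, 12:20, 12:25, … with the session gap and the resulting session window marked.]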
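The difference between the two window types can be sketched in plain Python. This is an illustration, not Spark's implementation: the function names are hypothetical, and the 10-minute session gap is an assumption (the slide only states a 10 min size for the fixed window).

```python
from datetime import datetime, timedelta


def tumbling_window(ts, size):
    """Return the fixed-size [start, end) window that contains ts."""
    epoch = datetime(1970, 1, 1)
    start = ts - (ts - epoch) % size   # align the start to a multiple of size
    return (start, start + size)


def session_windows(timestamps, gap):
    """Group timestamps into sessions; a session closes `gap` after its last event."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # The event arrives before the open session expires: extend it.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            # The gap has elapsed: start a new session.
            sessions.append((ts, ts + gap))
    return sessions


ten_minutes = timedelta(minutes=10)
events = [datetime(2021, 1, 1, 12, m) for m in (1, 2, 20, 25)]
for start, end in session_windows(events, ten_minutes):
    print(start.time(), "to", end.time())
```

A fixed window depends only on the timestamp and the window size, while a session window depends on the other events for the same key, which is why sessions need per-key state.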
Reviving session window as a built-in window feature
Session Window
• SPARK-10816: EventTime based sessionization
• Inactive for the last few years
• Available in other streaming engines but missing in Structured Streaming
Session Window Internals
Session manipulation
• Session initialization, restoring, merging, saving
Internal StateStore format
• Efficiently retrieve all session states for a specific session key
• Partially update the start time and duration of the affected windows
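The initialize, restore, merge, save cycle listed above can be sketched for a single micro-batch as follows. This is a simplified illustration under stated assumptions: a plain dict stands in for the state store, times are integers (minutes), and the function names are hypothetical rather than Spark internals.

```python
def merge_sessions(stored, incoming):
    """Merge two lists of (start, end) sessions, combining any that overlap."""
    merged = []
    for start, end in sorted(stored + incoming):
        if merged and start < merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def process_batch(state, key, event_times, gap):
    """Run one restore / initialize / merge / save cycle for a session key."""
    stored = state.get(key, [])                       # restore
    incoming = [(t, t + gap) for t in event_times]    # initialize
    state[key] = merge_sessions(stored, incoming)     # merge, then save
    return state[key]


state = {}
process_batch(state, "user", [1, 2], gap=10)
process_batch(state, "user", [20, 25], gap=10)
# A late event at minute 11 bridges the two stored sessions into one.
print(process_batch(state, "user", [11], gap=10))
```

The last call shows why merging can touch more than one stored session: a single new event may change the start time and duration of several existing windows.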
Internal StateStore format
• Simple session window list as the value
• Double list approach
• Start times of session windows as key
* Special thanks to Yuanjian Li for the state store format design doc
Session Window List Approach
• Easy to implement
• Memory issue if too many sessions per value
• Does not support partial update
[Diagram: a single state store maps each session key to a list of rows. Row structure: Row(a1, a2, a3…, session_window(start_time, end_time))]
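A minimal sketch of this layout, assuming a dict as the single state store and dicts with hypothetical "start" and "end" fields as rows, shows why a partial update is not possible: changing one session means reading and rewriting the whole list stored under the session key.

```python
def upsert_session(store, session_key, row):
    """Insert or replace one session row; the whole list value is rewritten."""
    rows = [r for r in store.get(session_key, []) if r["start"] != row["start"]]
    rows.append(row)
    rows.sort(key=lambda r: r["start"])
    store[session_key] = rows   # full rewrite of the value, no partial update
    return len(rows)            # number of rows written back


store = {}
upsert_session(store, "user", {"start": 20, "end": 35})
upsert_session(store, "user", {"start": 1, "end": 12})
# Extending the first session still writes both rows back.
print(upsert_session(store, "user", {"start": 1, "end": 15}))
```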
Double List Approach
• Order of session windows is kept, efficient to traverse
• No complex structure
• Harder to maintain
[Diagram: Session key → first start time. (key, start time N) → Row(…, session_window(s, e)) for N = 1, 2, 3. (key, start time N) → (previous start time, next start time): start time 1 → (None, start time 2); start time 2 → (start time 1, start time 3); start time 3 → (start time 2, None).]
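The layout in the diagram amounts to a doubly linked list spread over key/value entries. The sketch below is one possible reading of it, not the design from the referenced doc: three dicts stand in for the stores, and the names head, rows, and links are hypothetical.

```python
def insert_session(head, rows, links, key, start, row):
    """Insert a session row while keeping the per-key list ordered by start time."""
    rows[(key, start)] = row
    prev, cur = None, head.get(key)
    while cur is not None and cur < start:    # walk to the insertion point
        prev, cur = cur, links[(key, cur)][1]
    if cur == start:
        return                                # existing session: row replaced, links unchanged
    links[(key, start)] = (prev, cur)
    if prev is None:
        head[key] = start                     # new first session for this key
    else:
        links[(key, prev)] = (links[(key, prev)][0], start)
    if cur is not None:
        links[(key, cur)] = (start, links[(key, cur)][1])


def traverse(head, rows, links, key):
    """Return the rows of a session key in start-time order."""
    out, cur = [], head.get(key)
    while cur is not None:
        out.append(rows[(key, cur)])
        cur = links[(key, cur)][1]
    return out


head, rows, links = {}, {}, {}
for start in (20, 10, 15):
    insert_session(head, rows, links, "user", start, f"row@{start}")
print(traverse(head, rows, links, "user"))
```

Each insert touches only the new entry and its neighbours, but three structures must stay consistent with each other, which matches the "harder to maintain" point on the slide.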
Start times as Key Approach
• Keep the order in the list of start times
• Store a list of session start times
[Diagram: key 1, key 2, key 3 each map to a list of start times (start time 1, start time 2…). (key1, start time1), (key1, start time2), (key2, start time1) each map to Row(…, session_window(s, e)).]
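A rough sketch of this layout, assuming two dicts as the stores (the names start_times and rows are hypothetical), keeps a sorted list of start times per session key and stores each row under a (key, start time) pair.

```python
import bisect


def upsert_session(start_times, rows, key, start, row):
    """Store one row under (key, start) and keep the key's start-time list sorted."""
    starts = start_times.setdefault(key, [])
    i = bisect.bisect_left(starts, start)
    if i == len(starts) or starts[i] != start:
        starts.insert(i, start)       # new session: record its start time in order
    rows[(key, start)] = row          # only this one row is written


def sessions_for(start_times, rows, key):
    """Return all rows of a session key in start-time order."""
    return [rows[(key, s)] for s in start_times.get(key, [])]


start_times, rows = {}, {}
upsert_session(start_times, rows, "key1", 20, "second")
upsert_session(start_times, rows, "key1", 1, "first")
upsert_session(start_times, rows, "key2", 5, "other")
print(sessions_for(start_times, rows, "key1"))
```

Updating an existing session rewrites a single row, and the start-time list only changes when a session is added.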
Agenda
Revive Previous Structured Streaming Efforts
New Enhancement to Structured Streaming
Use-case at Apple
Stateful Task Scheduling Enhancement
• Spark task scheduling is not designed for stateful tasks
• State store location is assigned arbitrarily
- A change of state store location causes frequent reloading from the remote file system
- An obstacle to the checkpoint enhancements planned in our future work
Stateful Task Scheduling Enhancement
Possible approaches: Ongoing works
• Leveraging existing data locality preferences as a simple workaround
- SPARK-33814: Provide preferred locations for stateful operations
• Customizing Spark task scheduling behavior
- SPARK-35022: Task Scheduling Plugin in Spark
[Diagram: Spark Task Scheduler and Task Scheduling Plugin. Labels: New resource offers, Candidate tasks for scheduling, Scheduling preferences.]
How Task Scheduling Plugin helps
• Distribute stateful tasks across available executors on a best-effort basis
• Keep stateful task location stable across batches
[Diagram: Task Scheduling Plugin. Batch N shows Exec 1 to Exec 4 and State1 to State4. Batch N + 1 shows State1 on Exec 1, State3 on Exec 3, State4 on Exec 4.]
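The two goals above (spread stateful tasks out, keep each one where its state already lives) can be sketched as a small assignment function. This is only an illustration of the idea: it is not the plugin interface proposed in SPARK-35022, the function and parameter names are hypothetical, and it assumes at least one executor is available.

```python
def assign_stateful_tasks(state_ids, executors, previous):
    """Assign each stateful task to an executor, preferring its previous location."""
    load = {executor: 0 for executor in executors}
    assignment = {}
    # Pass 1: keep a task on the executor that already holds its state, if still available.
    for state_id in state_ids:
        prev = previous.get(state_id)
        if prev in load:
            assignment[state_id] = prev
            load[prev] += 1
    # Pass 2: place the remaining tasks on the least-loaded executors.
    for state_id in state_ids:
        if state_id not in assignment:
            target = min(executors, key=lambda executor: load[executor])
            assignment[state_id] = target
            load[target] += 1
    return assignment


states = ["State1", "State2", "State3", "State4"]
batch_n = assign_stateful_tasks(states, ["Exec 1", "Exec 2", "Exec 3", "Exec 4"], {})
# In the next batch "Exec 2" is gone: only State2 moves and must reload its state.
batch_n1 = assign_stateful_tasks(states, ["Exec 1", "Exec 3", "Exec 4"], batch_n)
print(batch_n1)
```

Only the task whose executor disappeared is moved, so only that task pays the cost of reloading state from the remote file system.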
Agenda
Revive Previous Structured Streaming Efforts
New Enhancement to Structured Streaming
Use-case at Apple
Use Case
• Two parallel data streams
• Each stream performs aggregation over the same data source
• Stream aggregation operates on dynamically-sized app-defined batches
• Stream-stream join between the two data streams
- Must account for potential lag between streams
Performance Requirements
• Throughput: PBs/day
• RPS: High, O(10k)
• Data size: Varying (1 KB to 1 MB)
• State store: Stream-stream join exerts high memory pressure
Solutions
• Accounting for potential lag: watermarking
• Dynamic batch aggregation: session windows
• Stream-stream join pressure: RocksDB-based State Store
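How watermarking bounds the state of a stream-stream join can be sketched with a toy model. This is a heavily simplified illustration, not Structured Streaming's behavior: the class and its names are hypothetical, event times are integers, each side keeps one buffered row per key, and the watermark is advanced per event rather than per micro-batch.

```python
class WatermarkedJoin:
    """Toy inner join of two keyed streams with watermark-based state eviction."""

    def __init__(self, max_lag):
        self.max_lag = max_lag                     # tolerated lag between streams
        self.buffers = {"left": {}, "right": {}}   # side -> key -> (event_time, value)
        self.max_event_time = 0

    @property
    def watermark(self):
        return self.max_event_time - self.max_lag

    def on_event(self, side, key, event_time, value):
        if event_time < self.watermark:
            return None                            # too late: dropped
        self.max_event_time = max(self.max_event_time, event_time)
        self._evict()                              # drop buffered rows behind the watermark
        self.buffers[side][key] = (event_time, value)
        other = "right" if side == "left" else "left"
        match = self.buffers[other].get(key)
        if match is None:
            return None                            # buffered, waiting for the other stream
        if side == "left":
            return (key, value, match[1])
        return (key, match[1], value)

    def _evict(self):
        for buffer in self.buffers.values():
            for key in [k for k, (t, _) in buffer.items() if t < self.watermark]:
                del buffer[key]


join = WatermarkedJoin(max_lag=5)
join.on_event("left", "a", 10, "L1")            # buffered
print(join.on_event("right", "a", 12, "R1"))    # joins with the buffered left row
join.on_event("left", "b", 20, "L2")            # watermark moves to 15, key "a" is evicted
print(join.on_event("right", "b", 3, "late"))   # behind the watermark, dropped
```

Without eviction both buffers would grow without bound, which is the memory pressure the slide attributes to the stream-stream join.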
Thank you!
• Your feedback is important to us
• Don’t forget to rate and review the sessions
TM and © 2021 Apple Inc. All rights reserved.
