How Spotify scales Apache Storm Pipelines
Figure: Log Events (Apache Kafka) feed the Real-time Personalization Pipeline (Apache Storm), which works with a User Profile Store and an Entity Metadata Store (both Apache Cassandra).
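Figure: upgrading a topology on the Storm cluster (timeline t1 to t8). While v1 is running (t1, t2), v2 is built. Then v1 is deactivated and v2 is submitted, so both are on the cluster (t3, t4). After checking the v2 graphs, v1 is killed and only v2 keeps running (t5 to t8).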
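The upgrade sequence in the figure maps onto standard Storm CLI commands (`storm deactivate`, `storm jar`, `storm kill -w`). The sketch below is a minimal example under several assumptions. The topology names, jar path and main class are hypothetical placeholders. The main class is also assumed to take the topology name as its first argument. Storm will not accept a second live topology with the same name, so v1 and v2 need distinct names. The "check v2 graphs" step stays manual.

```python
import subprocess
from typing import Callable, List, Optional

# Hypothetical helper: names, jar path and main class are placeholders, and the
# main class is assumed to accept the topology name as its first argument.
Step = Optional[List[str]]  # None marks the manual "check v2 graphs" step


def upgrade_plan(old_name: str, new_name: str, jar: str, main_class: str,
                 kill_wait_secs: int = 30) -> List[Step]:
    """Storm CLI calls for replacing topology v1 with v2 side by side."""
    if old_name == new_name:
        # Storm rejects a submission whose name matches a live topology.
        raise ValueError("v1 and v2 need distinct topology names")
    return [
        ["storm", "deactivate", old_name],            # v1 spouts stop emitting
        ["storm", "jar", jar, main_class, new_name],  # submit v2
        None,                                         # check v2 graphs
        ["storm", "kill", old_name, "-w", str(kill_wait_secs)],  # remove v1
    ]


def run_plan(plan: List[Step], dry_run: bool = True,
             confirm: Callable[[str], object] = input) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for step in plan:
        if step is None:
            confirm("Check the v2 graphs, then press Enter to kill v1: ")
            continue
        print(" ".join(step))
        if not dry_run:
            subprocess.run(step, check=True)  # stop on the first failing command
```

For example, `run_plan(upgrade_plan("recs-v1", "recs-v2", "recs-topology.jar", "com.example.RecsTopology"))` prints the commands without running them. Because v1 is only deactivated and not killed before the graphs are checked, rolling back amounts to `storm kill` on v2 followed by `storm activate` on v1.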
PagerDuty / In-house Solution
● Use separate tables for data with different TTLs, and set gc_grace_seconds=0. Read repairs are disabled.
● Use DateTieredCompactionStrategy for short-lived data.
● Control the number of open connections from the Storm topology to Cassandra.
● Configure the snitch so that requests are routed to the right nodes.
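The sketch below shows one possible CQL definition for a per-TTL table, generated from Python so that each TTL class gets its own table. The keyspace, table and column names and the TTL values are hypothetical. The options themselves are standard CQL table options in Cassandra 2.x/3.x, and the slide's gc_grace setting corresponds to the `gc_grace_seconds` option. Later releases deprecate DateTieredCompactionStrategy in favour of TimeWindowCompactionStrategy, and Cassandra 4.0 removes the read_repair_chance options.

```python
from typing import Dict

# Hypothetical TTL classes for illustration only.
TTL_CLASSES: Dict[str, int] = {
    "1h": 60 * 60,
    "7d": 7 * 24 * 60 * 60,
}


def table_ddl(keyspace: str, base: str, suffix: str, ttl_secs: int) -> str:
    """CQL for one (hypothetical) table whose rows all share the same TTL."""
    options = [
        f"default_time_to_live = {ttl_secs}",
        # Assumes rows are only ever removed by TTL expiry, never by DELETE.
        "gc_grace_seconds = 0",
        "read_repair_chance = 0.0",           # read repairs disabled
        "dclocal_read_repair_chance = 0.0",
        "compaction = {'class': 'DateTieredCompactionStrategy'}",
    ]
    return (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{base}_{suffix} (\n"
        "    user_id text,\n"
        "    feature text,\n"
        "    value blob,\n"
        "    PRIMARY KEY (user_id, feature)\n"
        ") WITH " + "\n  AND ".join(options) + ";"
    )


if __name__ == "__main__":
    for suffix, ttl in TTL_CLASSES.items():
        print(table_ddl("personalization", "user_features", suffix, ttl))
```

Putting only one TTL in each table means every SSTable in that table ages out at a predictable rate. This is what time-based compaction relies on. Setting gc_grace_seconds to 0 is only safe as long as nothing in the table is ever explicitly deleted.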
● 1 worker per node per topology
● 1 executor per core for CPU-bound tasks
● 1-10 executors per core for IO-bound tasks
● Compute the total parallelism possible and distribute it among slow and fast tasks: high parallelism for slow tasks, low for fast tasks.
* Parallelism tuning inspired by P. Taylor Goetz's Strata 2014 talk
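The budget arithmetic above can be sketched roughly as follows. Per-tuple latency is assumed to be the measure of "slow" versus "fast", for example a bolt's execute latency from the Storm UI. The budget is assumed to be split in proportion to that latency, with at least one executor per component. `parallelism_budget` and `distribute` are hypothetical helpers, not Storm APIs.

```python
import math
from typing import Dict


def parallelism_budget(nodes: int, cores_per_node: int, executors_per_core: int) -> int:
    """Total executors for one topology (1 per core CPU-bound, 1-10 per core IO-bound)."""
    if not 1 <= executors_per_core <= 10:
        raise ValueError("executors_per_core outside the 1-10 rule of thumb")
    return nodes * cores_per_node * executors_per_core


def distribute(budget: int, latency_ms: Dict[str, float]) -> Dict[str, int]:
    """Split `budget` executors across components in proportion to latency.

    Slow (high-latency) components get high parallelism, fast ones get low.
    Each component gets at least 1 executor; largest-remainder rounding keeps
    the total equal to `budget`.
    """
    n = len(latency_ms)
    total = sum(latency_ms.values())
    if budget < n or total <= 0:
        raise ValueError("need budget >= #components and positive latencies")
    spare = budget - n  # executors left after giving every component one
    shares = {c: spare * lat / total for c, lat in latency_ms.items()}
    alloc = {c: 1 + math.floor(s) for c, s in shares.items()}
    leftover = budget - sum(alloc.values())
    # Give the remaining executors to the largest fractional remainders.
    by_remainder = sorted(shares, key=lambda c: shares[c] - math.floor(shares[c]),
                          reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc
```

For example, 4 nodes with 8 cores each running CPU-bound work give `parallelism_budget(4, 8, 1) == 32`. Then `distribute(32, {"kafka-spout": 1.0, "enrich-bolt": 6.0, "cassandra-bolt": 9.0})` yields 3, 12 and 17 executors respectively. These numbers would become the parallelism hints passed to `TopologyBuilder.setSpout`/`setBolt`, with `Config.setNumWorkers` set to the node count to aim for one worker per node.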
● Think about the constraints of external vs. in-process caching
○ External Caching
■ Network IO
■ Latency
■ Another point of failure
○ In-process
■ Limited memory
■ No persistence
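Here is a minimal sketch of the in-process option. It assumes a read-through cache in front of an external store such as Cassandra. The entry count is capped to respect limited memory, entries expire after a TTL, and the contents vanish when the worker restarts (no persistence). `BoundedTTLCache` and the `loader` callback are hypothetical names.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Tuple


class BoundedTTLCache:
    """Hypothetical read-through LRU cache living in the worker's own memory.

    Capped at max_entries (limited memory); lost on restart (no persistence).
    """

    def __init__(self, max_entries: int, ttl_secs: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_secs
        self._clock = clock
        self._data: "OrderedDict[Hashable, Tuple[Any, float]]" = OrderedDict()

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        hit = self._data.get(key)
        if hit is not None and hit[1] > now:
            self._data.move_to_end(key)      # mark as most recently used
            return hit[0]
        value = loader(key)                   # miss or expired: go to the external store
        self._data[key] = (value, now + self._ttl)
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            self._data.popitem(last=False)    # evict the least recently used entry
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._data

    def __len__(self) -> int:
        return len(self._data)
```

In Storm each executor runs its own bolt instance, so a cache created per bolt instance is duplicated once per executor. A fields grouping on the cache key would send the same key to the same task and keep hit rates up. Sharing one cache per worker JVM is the alternative, at the cost of synchronization.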