March 25, 2014
Neville Li
neville@spotify.com
@sinisa_lyh
Storm at Spotify
•@Spotify since 2011
•Recommendation Team
•Data & Backend
•Storm, Scalding, Spark, Scala…
About Me
Spotify in numbers
Started in 2006, available in 55 markets
20+ million songs, 20,000 added per day
24+ million active users, 6+ million subscribers
1.5 billion playlists
Big Data
@spotify
600 node cluster
Every day
•400GB service logs
•4.5TB user data
•5,000 Hadoop jobs
•61TB generated
What is Storm?
In data-layman’s terms
• Real time stream processing
• Like Hadoop without HDFS
• Like Map/Reduce with many reducer steps
• Fault tolerant & guaranteed message processing
Photo © Blaine Courts http://www.flickr.com/photos/blainecourts/8417266909/
Storm @spotify
•storm-0.8.0
•22 node cluster
•15+ topologies
•200,000+ tuples per second
•recommendation, ads, monitoring, analytics, etc.
“Never Gonna Give You Up”
Rick Astley Map
First Storm Application @Spotify
RT Market Launch Stats
Other Uses
•Trending tracks
•Email campaign
•App performance tracking
•UX tracking
Anatomy of a Storm Topology
From play to recommendation
Social Listening
Take 1
•PUB/SUB
•Almost real-time
•Spammy
•Hard to scale
All characters appearing in this work are fictitious. Any resemblance to real persons, living or dead, is purely coincidental.
this guy again!
Social Listening
Take 2
•Hadoop daily batch
•High latency
•M/R aggregation
•Easier to scale
Social Listening
Revamped
•Kafka → Storm → Backend
•Soft real-time
•Aggregate & trigger bolt
•Easy to scale
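The aggregate-and-trigger step can be sketched without Storm: count plays per key and emit an event once a threshold is reached. This is a minimal Python sketch under stated assumptions; the class name, the (username, artistId) key, and the threshold are hypothetical and are not taken from the talk.

```python
from collections import defaultdict


class ListeningTrigger:
    """Hypothetical aggregate-and-trigger step (not Spotify's actual bolt)."""

    def __init__(self, threshold=3):
        self.threshold = threshold          # assumed value, for illustration
        self.counts = defaultdict(int)      # (username, artist_id) -> plays seen

    def process(self, username, artist_id):
        """Count one play; return an event exactly when the threshold is hit."""
        key = (username, artist_id)
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            return {"username": username, "artistId": artist_id,
                    "plays": self.counts[key]}
        return None


trigger = ListeningTrigger(threshold=3)
plays = [("alice", "a1"), ("alice", "a1"), ("bob", "a1"), ("alice", "a1")]
events = [e for e in (trigger.process(u, a) for u, a in plays) if e]
print(events)
```

Firing only when the count equals the threshold keeps the output from becoming spammy: later plays of the same key update the count but emit nothing.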
Getting Data
[Diagram labels: accesspoint, playlist, search, storage, social, kafka]
What are we transferring?
•TSV logs with version & type (moving to Avro)
•Centralized Schema Repository
•Parsers in Python & Java
•Log parsing & splitting by topic in Kafka
EndSong 21 username:Str timestamp:Int trackId:Str msPlayed:Int reasonStart:Str reasonEnd:Str …
ClientEvent 15 username:Str platform:Str timestamp:Int jsonData:Str …
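A schema-driven parser for such logs might look like the following Python sketch. It assumes each TSV line starts with the log type and version, followed by the fields in schema order; that layout, the in-memory `SCHEMAS` table, and the sample values are assumptions for illustration. Only the EndSong fields shown on the slide are included, and the elided ones are left out.

```python
# Hypothetical in-memory schema registry keyed by (type, version).
# Only the fields listed on the slide are included; the real schema is longer.
SCHEMAS = {
    ("EndSong", 21): [("username", str), ("timestamp", int), ("trackId", str),
                      ("msPlayed", int), ("reasonStart", str), ("reasonEnd", str)],
}


def parse_line(line):
    """Parse one TSV line assumed to be laid out as: type, version, fields..."""
    parts = line.rstrip("\n").split("\t")
    log_type, version = parts[0], int(parts[1])
    schema = SCHEMAS[(log_type, version)]
    values = parts[2:2 + len(schema)]
    if len(values) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(values)}")
    # Cast every raw string with the type named in the schema.
    record = {name: cast(value) for (name, cast), value in zip(schema, values)}
    record["type"] = log_type
    return record


# Sample line with made-up values.
line = "EndSong\t21\talice\t1395705600\ttrack123\t215000\tclickrow\ttrackdone"
record = parse_line(line)
print(record["trackId"], record["msPlayed"])
```

Keying the lookup on both type and version lets old and new versions of a log coexist while producers are being upgraded.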
Getting Data Across the Globe
Photo © Riley Kaminer http://www.flickr.com/photos/rwkphotography/3282192071/
[Map labels: Ashburn, London, Stockholm, San Jose; Hadoop, Storm; big kafka consumer (LONDON)]
Topology
[Diagram labels: kafka spout, EndSong filter, metadata decorator, listening trigger, privacy filter, ZMTP publisher; services: metadata, prefs, feed; edge labels: EndSong, GET, GET, SUB]
Filter Bolt
•Discard some tuples
–Skipped
–Too short
•Keep some fields
–Context
–Reasons
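The two jobs of the filter bolt, dropping unwanted tuples and trimming the rest, can be sketched in plain Python with tuples modeled as dicts. The cutoff, the `reasonEnd` value that marks a skip, and the `context` field name are hypothetical; the talk does not give them.

```python
MIN_MS_PLAYED = 30_000            # assumed cutoff for "too short"
SKIP_REASONS = {"skip"}           # hypothetical reasonEnd value for a skipped track
KEEP_FIELDS = ("username", "trackId", "context", "reasonStart", "reasonEnd")


def filter_end_song(tup):
    """Return a trimmed tuple, or None when the play should be discarded."""
    if tup["reasonEnd"] in SKIP_REASONS:
        return None
    if tup["msPlayed"] < MIN_MS_PLAYED:
        return None
    # Keep only the fields that downstream bolts need.
    return {field: tup[field] for field in KEEP_FIELDS if field in tup}


played = {"username": "alice", "trackId": "t1", "msPlayed": 215000,
          "context": "playlist", "reasonStart": "clickrow",
          "reasonEnd": "trackdone", "timestamp": 1395705600}
skipped = dict(played, reasonEnd="skip")
short = dict(played, msPlayed=5000)

print(filter_end_song(played))
print(filter_end_song(skipped), filter_end_song(short))
```

Filtering this early means every later bolt sees fewer and smaller tuples.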
Metadata Decoration Bolt
•tuple.getStringByField("trackId")
•Append fields in output tuple
•[input fields…, "artistId", "albumId"]
•Input fields as constructor argument
•Reuse!
monadic!
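The decoration pattern, where input fields are passed to the constructor and the output is the same fields plus `artistId` and `albumId`, can be sketched as follows. This is plain Python rather than the Storm API; the `MetadataDecorator` class and the `lookup` callable standing in for the metadata service are assumptions.

```python
class MetadataDecorator:
    """Hypothetical decorator: passes input fields through, appends metadata."""

    def __init__(self, input_fields, lookup):
        self.input_fields = list(input_fields)
        # Output schema = input fields followed by the appended ones.
        self.output_fields = self.input_fields + ["artistId", "albumId"]
        self.lookup = lookup   # callable: trackId -> (artistId, albumId)

    def decorate(self, tup):
        artist_id, album_id = self.lookup(tup["trackId"])
        values = [tup[field] for field in self.input_fields]
        return dict(zip(self.output_fields, values + [artist_id, album_id]))


# Stand-in for the metadata service, with made-up ids.
FAKE_METADATA = {"t1": ("artist1", "album1")}

decorator = MetadataDecorator(["username", "trackId"], FAKE_METADATA.__getitem__)
out = decorator.decorate({"username": "alice", "trackId": "t1"})
print(decorator.output_fields)
print(out)
```

Because the input fields are a constructor argument, the same class can be reused in any topology whose tuples carry a `trackId`, whatever the other fields are.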
Async & Batch RPC
[Diagram labels: upstream, queue, tuple batch, bolt, REQ/REP, metadata, callback, emit, ack]
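One way to read the async, batched RPC pattern: the bolt queues incoming tuples, sends one request per batch, and emits and acks each tuple from the reply callback. The Python sketch below imitates that flow with a single worker thread; the class, the batch size, and the `batch_lookup` callable are hypothetical, and the `emitted` and `acked` lists stand in for Storm's emit and ack calls.

```python
from concurrent.futures import ThreadPoolExecutor


class BatchingDecorator:
    """Hypothetical sketch: buffer tuples, resolve them with one batched call."""

    def __init__(self, batch_lookup, batch_size=2):
        self.batch_lookup = batch_lookup    # callable: [trackId] -> {trackId: artistId}
        self.batch_size = batch_size        # assumed value, for illustration
        self.queue = []                     # tuples waiting for the next batch
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.pending = []                   # in-flight batch futures
        self.emitted, self.acked = [], []   # stand-ins for emit() and ack()

    def execute(self, tup):
        """Called once per upstream tuple; returns without waiting on the RPC."""
        self.queue.append(tup)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.queue:
            return
        batch, self.queue = self.queue, []
        self.pending.append(self.pool.submit(self._request, batch))

    def _request(self, batch):
        # One request/reply round trip for the whole batch, then the callback.
        reply = self.batch_lookup([t["trackId"] for t in batch])
        self._callback(batch, reply)

    def _callback(self, batch, reply):
        for tup in batch:
            self.emitted.append({**tup, "artistId": reply[tup["trackId"]]})
            self.acked.append(tup)          # ack only after the emit

    def close(self):
        self.flush()                        # send any partial batch
        for future in self.pending:
            future.result()                 # surface errors from the worker
        self.pool.shutdown()


def fake_batch_lookup(track_ids):
    """Stand-in for the metadata service."""
    return {track_id: "artist-of-" + track_id for track_id in track_ids}


bolt = BatchingDecorator(fake_batch_lookup, batch_size=2)
for track_id in ["t1", "t2", "t3"]:
    bolt.execute({"trackId": track_id})
bolt.close()
print([t["artistId"] for t in bolt.emitted])
```

Acking from the callback rather than from `execute` means a tuple counts as processed only after its metadata has arrived, so a failed lookup can still lead to a replay.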
