The Evolution of Big Data at Spotify
Josh Baer (jbx@spotify.com)
Who Am I?
‣ Technical Product Owner at Spotify, responsible for Hadoop
@l_phant
Overview
• Creating Music Charts in three parts:
• Playing Music
• Collecting Data
• Processing Data
• The Future
Building Music Charts
Play Music | Collect Data | Process
Playing Music
What is Spotify?
• Music Streaming Service
• Browse and discover millions of songs, artists and albums
• By the end of 2014
• 60 Million Monthly Users
• 15 Million Paid Subscribers
What is Spotify?
• Data Infrastructure
• 1300 Hadoop Nodes
• 42 PB Storage
• 30TB data ingested via Kafka/day
• 400TB generated by Hadoop/day
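As a rough back-of-the-envelope reading of these figures, the sketch below derives two ratios. It assumes decimal units (1 PB = 1000 TB) and storage spread evenly across nodes; the slides state neither, so treat the results as orders of magnitude only.

```python
# Figures taken from the slide above.
hadoop_nodes = 1300
storage_pb = 42
ingested_tb_per_day = 30      # via Kafka
generated_tb_per_day = 400    # by Hadoop jobs

# Assumption (not in the slides): decimal units, storage spread evenly.
TB_PER_PB = 1000

storage_tb_per_node = storage_pb * TB_PER_PB / hadoop_nodes
generated_per_ingested = generated_tb_per_day / ingested_tb_per_day

print(f"Storage per node: ~{storage_tb_per_node:.1f} TB")
print(f"TB generated per TB ingested: ~{generated_per_ingested:.1f}")
```

Under those assumptions, each node holds roughly 32 TB, and Hadoop jobs write about 13 TB of derived data for every TB ingested.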
Powered by Data
• Running App
• Matches music to running tempo
• Personalized running playlists in multiple tempos for millions of active users
http://www.theverge.com/2015/6/1/8696659/spotify-running-is-great-for-discovery
Powered by Data
• Now Page
• Shows, podcasts and playlists
based on day-parts
• Personalized layout so you always have the right music for the right moment
Building Music Charts
10.123.133.333 - - [Mon, 3 June 2015 11:31:33 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1847 "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.123.133.222 - - [Mon, 3 June 2015 11:31:43 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1984 "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.321.145.111 - - [Mon, 3 June 2015 11:33:03 GMT] "GET /api/loggedInUser HTTP/1.1" 304 - "https://my.analytics.app/dashboard/courses/1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.112.322.111 - - [Mon, 3 June 2015 11:33:03 GMT] "POST /api/instrumentation/events/new HTTP/1.1" 200 2 "https://my.analytics.app/dashboard/courses/1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
• Raw data is complicated
• Often dirty
• Evolving structure
• Duplication all over
• Getting data to a central
processing point is HARD
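To make "dirty" and "duplication" concrete, here is a minimal sketch of a cleaning step for access-log lines like the sample above. The regular expression and the choice of deduplication key are assumptions made for illustration; the slides do not describe how Spotify actually parsed or deduplicated its logs.

```python
import re

# Hypothetical pattern for the access-log lines shown above; real logs
# would need a more tolerant parser.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of fields, or None if the line is malformed (dirty)."""
    # Normalize curly quotes, one kind of dirt that shows up in raw text.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def clean(lines):
    """Drop malformed lines and repeated events, preserving order."""
    seen = set()
    events = []
    for line in lines:
        event = parse_line(line)
        if event is None:
            continue  # dirty record: skipped here, a real pipeline would count it
        # Assumed identity of an event: same client, time, method and path.
        key = (event["ip"], event["time"], event["method"], event["path"])
        if key in seen:
            continue  # the same event delivered twice
        seen.add(key)
        events.append(event)
    return events

sample = [
    '10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 -',
    'not a log line at all',
    '10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 -',
    '10.112.322.111 - - [Mon, 3 June 2015 11:33:03 GMT] "POST /api/instrumentation/events/new HTTP/1.1" 200 2',
]
events = clean(sample)
print(len(events), "clean events out of", len(sample), "lines")
```

Even this toy version has to make policy decisions (what counts as a duplicate, what to do with malformed rows), which is part of why raw data is complicated.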
Collecting Data
“It’s simple, we just
throw the data into
Hadoop”
A naive data engineer
LogArchiver
• Original method to transport logs from APs (Access Points) to HDFS
• Lasted from 2009 - 2013
• Relied on rsync/scp to move files around
• Regularly scheduled via cron
LogArchiver Fails
• Worked well with a small number of APs
• Issues with scale
• Manual process for adding new hosts
• Frequent host failures or network issues caused massive congestion
• Manual process for overrides
Apache Kafka to the rescue!
• A publish-subscribe messaging system open sourced by LinkedIn in 2011
• High level overview:
• Topic: A feed of messages
• Producer: A message publisher
• Consumer: A subscriber of topics
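The three terms can be illustrated with a small in-memory model. This is a toy written for this explanation, not Kafka's client API: the class names, method names and the `song_plays` topic are all hypothetical, and it ignores partitions, persistence and networking.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a message broker; NOT Kafka's API."""

    def __init__(self):
        # Each topic is an append-only list: a "feed of messages".
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Producer side: append a message to a topic's feed."""
        self._topics[topic].append(message)

    def read(self, topic, offset):
        """Consumer side: return the messages at or after an offset."""
        return self._topics[topic][offset:]

class ToyConsumer:
    """Subscriber that tracks its own position (offset) in each topic."""

    def __init__(self, broker, topics):
        self._broker = broker
        self._offsets = {topic: 0 for topic in topics}

    def poll(self):
        """Fetch not-yet-seen messages from every subscribed topic."""
        received = []
        for topic, offset in self._offsets.items():
            messages = self._broker.read(topic, offset)
            self._offsets[topic] = offset + len(messages)
            received.extend((topic, message) for message in messages)
        return received

broker = ToyBroker()
broker.publish("song_plays", "user=1 track=a")

hdfs_loader = ToyConsumer(broker, ["song_plays"])   # hypothetical batch consumer
realtime = ToyConsumer(broker, ["song_plays"])      # hypothetical streaming consumer

first = hdfs_loader.poll()
broker.publish("song_plays", "user=2 track=b")
second = hdfs_loader.poll()
everything = realtime.poll()
print(first, second, everything)
```

Because each consumer keeps its own offset, the producer does not need to know who reads the feed or how fast, which is one way to read the "division of responsibilities" benefit.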
Apache Kafka to the rescue!
• Log -> HDFS latency reduced from hours to seconds!
• Benefits:
• Community supported
• Division of responsibilities
• Allowed for enhanced streaming use-cases
Processing Data
Workflow Management Fail!
0 * * * *     spotify-core       hadoop jar merge_hourly.jar
15 * * * *    spotify-core       hadoop jar aggregate_song_plays.jar
30 * * * *    spotify-analytics  hadoop jar merge_artist_song.jar
* 1 * * *     spotify-core       hadoop jar daily_aggregate.jar
* 2 * * *     spotify-core       hadoop jar calculate_royalties.jar
*/2 22 * * *  spotify-radio      hadoop jar generate_radio.jar
Luigi - Python Workflow Manager
• Handles the ‘plumbing’ for Hadoop jobs
• Easy to get started; no XML, unlike Oozie
https://github.com/spotify/luigi
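The difference from time-based cron entries is that each job declares what it depends on, and the manager works out the order. The sketch below shows that idea using only the standard library. It is not Luigi itself; the `requires()` and `run()` names echo Luigi's Task interface, while the `build` helper and the dependency chain between the jobs are assumptions made for illustration (the job names come from the crontab, but the slides do not say which job depends on which).

```python
class Task:
    """A unit of work that declares its dependencies."""
    name = "task"

    def requires(self):
        return []

    def run(self, done):
        # A real task would launch a Hadoop job; here we only record the run.
        done.append(self.name)

class MergeHourly(Task):
    name = "merge_hourly"

class AggregateSongPlays(Task):
    name = "aggregate_song_plays"

    def requires(self):
        return [MergeHourly()]

class DailyAggregate(Task):
    name = "daily_aggregate"

    def requires(self):
        return [AggregateSongPlays()]

class CalculateRoyalties(Task):
    name = "calculate_royalties"

    def requires(self):
        return [DailyAggregate()]

def build(task, done):
    """Run a task's dependencies first, then the task, each at most once."""
    if task.name in done:
        return
    for dependency in task.requires():
        build(dependency, done)
    task.run(done)

completed = []
build(CalculateRoyalties(), completed)
print(completed)
```

With cron, a slow upstream job means the downstream job starts on incomplete input; with declared dependencies, the downstream job simply does not start until its inputs are done.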
Hadoop Availability
• In 2013:
• Hadoop expanded to 200 nodes
• It was business critical
• It was not very reliable :-(
• Created a ‘squad’ with two missions:
• Migrate to a new distribution with YARN
• Make Hadoop reliable
How did we do?
[Chart: Hadoop uptime per quarter, Q3-2012 to Q2-2015, y-axis from 90% to 100%. Annotated phases, in order: Hadoop ownerless; dedicated squad launches; upgrade instability; continually improving.]
Going from Python to Crunch
• Most of our jobs were Hadoop Streaming jobs written in Python
• Lots of failures, slow performance
• Had to find a better way
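For context, a Hadoop Streaming job in Python is a pair of scripts that read text lines and write tab-separated key/value lines. The following is a minimal sketch of a play-count job in that style, written as plain functions so it runs locally. The input layout (`user_id<TAB>track_id`) is an assumption, and in a real streaming job the mapper and reducer would read `sys.stdin` and print to standard output, with Hadoop doing the sort between them.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'track_id<TAB>1' for each play record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # untyped text input: malformed rows surface only at run time
        track_id = fields[1]
        yield f"{track_id}\t1"

def reducer(sorted_lines):
    """Sum counts per track; the input must already be sorted by key."""
    parsed = (line.split("\t") for line in sorted_lines)
    for track_id, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{track_id}\t{total}"

plays = ["u1\tt1\n", "u2\tt2\n", "u3\tt1\n", "broken-row\n"]
shuffled = sorted(mapper(plays))   # stands in for Hadoop's shuffle and sort
chart = list(reducer(shuffled))
print(chart)
```

Everything passes through strings, so a wrong column index or a non-numeric count is discovered only when the job runs on the cluster.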
Moving from Python to Crunch
• Investigated several frameworks*
• Selected Crunch:
• Real types - compile time error detection, better testability
• Higher level API - let the framework optimize for you
• Better performance #JVM_FTW
*thewit.ch/scalding_crunchy_pig
Play Music: Data-driven features that allow for new ways to play and discover music
Collect Data: Lower latency and enhanced reliability for passing data from Access Points to HDFS via Kafka
Process: Increased Hadoop reliability, Luigi scheduling and better performance with Crunch
Improving Charts!
The Future
[Chart: Growth of Hadoop vs. Spotify Users. Y-axis: growth % (0 to 3500); x-axis: 2012 to 2015; series: Hadoop Usage, Spotify Users.]
Explosive Growth
• Increased Spotify Users
• More users listening to more music -> more data -> longer running jobs
• Increased Use Cases
• Beyond simple analytics into Machine Learning, advanced processing
• Increased Engineers
• In 2014, the number of data and machine learning engineers grew rapidly
Scaling Machines: Easy
Scaling People: Hard
User Feedback:
Automate it!
Inviso
Developed by Netflix: https://github.com/Netflix/inviso
Hadoop Report Card
• Contains Statistics
• Guidelines and Best Practices
• Sent Quarterly
Real-Time Use Cases
• Expanding our use of Storm for:
• Targeting ads based on genres
• Visualizing data
• Quicker recommendations
• More information:
• https://labs.spotify.com/2015/01/05/how-spotify-scales-apache-storm/
Two takeaways
• Getting data into Hadoop is half the challenge. Think early and often about scale.
• Increasing infrastructure reliability and performance leads to expanded use. This adds challenges, but it’s a good problem to have.
Join The Band!
Engineers needed in NYC, Stockholm
http://spotify.com/jobs
