Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nitin Sharma and Elliot Chow

#ML7SAIS
Elliot Chow, Netflix
Nitin Sharma, Netflix
Near Real-Time Recommendations - Spark Streaming

Agenda
● Recommendations @ Netflix
● The Need for Near Real Time
● Use Cases
● Common Infrastructure
● Scaling Challenges

Recommendations at Netflix
● Personalize the Netflix experience for each member
○ Goal: Quickly help members find content they’d like to watch
○ Risk: Member may lose interest and abandon the service
○ Challenge: Recommending at scale

Scale @ Netflix
● 125M+ active members
● 190 countries
● 450B+ unique events/day
● 700+ Kafka topics

Typical Data Pipelines @ Netflix
● Data stored in Hive/S3
● Batch ETLs using Spark/Hive
● Table partitioning by day or hour
● Job scheduling by either cron or data availability
● Latency is often on the order of days

The Need for Near Real Time (NRT)
● Dynamic catalog
● Growing member base
● Time sensitivity
○ Content popularity changes
○ Member interests evolve

The Need for Near Real Time (NRT)
● Increasing amount of data
○ Process data as soon as possible to keep latencies low
○ Minimize amount of data to reprocess in case of failure
● Multi-Armed Bandits Adoption

Use Cases
● Video Insights
● ML Pipelines for Recommendations

NRT for Video Insights

Video Insights
● New title launches
● Early reads on title performance
● Slice by various dimensions

Video Insights
[Diagram: Service]

Video Insights
[Diagram: Service, Client]

Video Insights - State
● Counts maintained in Spark
● Custom state management
○ Based on mapWithState implementation
○ Easier to re-use the function f in batch mode
○ Lower-level control over state management
○ scanByKey alternative for keyed state
input.scan(initRDD)((currentRDD, batchRDD) => f(currentRDD, batchRDD))
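
The `scan` helper on the slide looks like a custom extension rather than a stock `DStream` method. One way to read it is as `scanLeft` over the sequence of micro-batches: the same function `f` folds each batch into the running state, and can also be applied once to a full batch of data. A minimal sketch of that reading, with plain Scala collections standing in for RDDs and the stream; the `Counts` type, the video ids and the helper names are assumptions for illustration:

```scala
// Sketch only: plain Scala collections stand in for RDDs and the DStream.
object ScanStateSketch {
  // Hypothetical state: video id -> play count.
  type Counts = Map[String, Long]

  // The reusable function f: merge one micro-batch of events into the current state.
  def f(current: Counts, batch: Seq[String]): Counts =
    batch.foldLeft(current) { (acc, videoId) =>
      acc.updated(videoId, acc.getOrElse(videoId, 0L) + 1L)
    }

  // Streaming mode: emit the state after every micro-batch (scan semantics).
  def scan(init: Counts, batches: Seq[Seq[String]]): Seq[Counts] =
    batches.scanLeft(init)(f).tail

  // Batch mode: the same f applied to all events at once.
  def batch(init: Counts, events: Seq[String]): Counts = f(init, events)

  def main(args: Array[String]): Unit = {
    val batches = Seq(Seq("v1", "v2"), Seq("v1"), Seq("v3", "v1"))
    scan(Map.empty, batches).foreach(println)
  }
}
```

Because `f` takes the current state and a batch and returns the new state, the last state produced in streaming mode matches what batch mode computes over the same events.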

NRT for Recommendations

Billboard Recommendations
● Recommend a relevant title to each member
● Right time
● Respond quickly to member feedback

Artwork Personalization
● Personalized Image
● Visual Evidence
● Quickly adapting - Title launches, member tastes
● Rapid learning - Cold start

Traditional Recommendations
[Diagram components: Millions of play-related signals, ETL layers, Training pipelines, Models, Precompute/Live compute, Rankings, DataStore/Online caches; label: Few days]

NRT Recommendations
[Diagram components: Millions of play-related signals, Streaming layer, Training pipelines, Models, Precompute/Live compute, Rankings, DataStore/Online caches; label: NRT]

Required Data
● Impressions, Plays, etc.
● Attribution
● Explore/Exploit Metadata

Attribution Infrastructure
[Diagram: Stream Processing, Batch Processing, Query API]

Stream Processing - Zoomed in
[Diagram: Impression, Cache, Video Metadata, Play, Enrich]
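
The stream-processing picture names impressions, a cache, plays, video metadata and an enrich step. One plausible reading is that impressions are cached, and each incoming play is matched against the cache and enriched with metadata. A rough sketch of that idea follows; every type, field name and the "latest impression per (member, video)" rule are assumptions made for illustration, not details from the talk:

```scala
import scala.collection.mutable

object StreamAttributionSketch {
  // All types and field names here are hypothetical.
  final case class Impression(memberId: String, videoId: String, row: String, ts: Long)
  final case class Play(memberId: String, videoId: String, ts: Long)
  final case class AttributedPlay(play: Play, impression: Option[Impression], title: Option[String])

  // In-memory stand-in for the impression cache, keyed by (member, video).
  final class ImpressionCache {
    private val latest = mutable.Map.empty[(String, String), Impression]

    // Keep only the most recent impression per key.
    def put(i: Impression): Unit = {
      val key = (i.memberId, i.videoId)
      if (latest.get(key).forall(_.ts <= i.ts)) latest(key) = i
    }

    // Return the cached impression if it happened at or before the play.
    def lookup(p: Play): Option[Impression] =
      latest.get((p.memberId, p.videoId)).filter(_.ts <= p.ts)
  }

  // Attribute a play to a cached impression and enrich it with video metadata.
  def attribute(p: Play, cache: ImpressionCache, metadata: Map[String, String]): AttributedPlay =
    AttributedPlay(p, cache.lookup(p), metadata.get(p.videoId))

  def main(args: Array[String]): Unit = {
    val cache = new ImpressionCache
    cache.put(Impression("m1", "v1", "billboard", 10L))
    println(attribute(Play("m1", "v1", 20L), cache, Map("v1" -> "Some Title")))
  }
}
```

A play with no matching impression still flows through, with `impression` left as `None`, so downstream consumers can decide how to treat unattributed plays.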

Batch Processing - Zoomed in
[Diagram: Impression, Play, Experimentation Data, Windowed Impression, Windowed Play, Dedupe, Join, Video Metadata]
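
The batch picture mentions windowed impressions, windowed plays, a dedupe step and a join. The sketch below shows one way those three steps could fit together on in-memory collections; it leaves out experimentation data and video metadata, and the event ids, field names and the "latest impression at or before the play" join rule are all assumptions:

```scala
// Sketch only: Seq stands in for a partitioned table. Requires Scala 2.13+.
object BatchAttributionSketch {
  // Hypothetical event shapes.
  final case class Impression(eventId: String, memberId: String, videoId: String, ts: Long)
  final case class Play(eventId: String, memberId: String, videoId: String, ts: Long)

  // Window + dedupe: keep events inside [start, end) and drop repeated event ids.
  def windowedImpressions(xs: Seq[Impression], start: Long, end: Long): Seq[Impression] =
    xs.filter(i => i.ts >= start && i.ts < end).distinctBy(_.eventId)

  def windowedPlays(xs: Seq[Play], start: Long, end: Long): Seq[Play] =
    xs.filter(p => p.ts >= start && p.ts < end).distinctBy(_.eventId)

  // Join on (member, video): pair each play with the latest impression at or before it.
  def join(imps: Seq[Impression], plays: Seq[Play]): Seq[(Play, Option[Impression])] = {
    val byKey = imps.groupBy(i => (i.memberId, i.videoId))
    plays.map { p =>
      val candidates = byKey.getOrElse((p.memberId, p.videoId), Seq.empty).filter(_.ts <= p.ts)
      (p, candidates.maxByOption(_.ts))
    }
  }

  def main(args: Array[String]): Unit = {
    val imps = windowedImpressions(Seq(Impression("i1", "m1", "v1", 10L)), 0L, 100L)
    val plays = windowedPlays(Seq(Play("p1", "m1", "v1", 50L)), 0L, 100L)
    join(imps, plays).foreach(println)
  }
}
```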

Common Infrastructure

Netflix Spark Stack
● Jenkins
● Spinnaker
● ASG
● Runners: Marathon, Meson
● Resource Manager: Mesos
● Storage: HDFS, S3, EFS
● Multi-Region

Multi Region Challenges
● Geo routing
● Impression in one region; play in another
● Streaming - Multi Region
● Batch Reduce/Merge - One region
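
"Streaming in every region, reduce/merge in one" can be pictured as each region producing partial results that a single job then combines. A small sketch of such a merge, assuming the partial results are per-video counts; the `Counts` shape and the region labels used as keys are illustrative assumptions:

```scala
object RegionMergeSketch {
  // Hypothetical partial result: video id -> count observed in one region.
  type Counts = Map[String, Long]

  // Combine two partial count maps by summing values for shared keys.
  def merge(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0L) + v) }

  // Reduce the partial results from every region into one global view.
  def mergeAll(perRegion: Map[String, Counts]): Counts =
    perRegion.values.foldLeft(Map.empty[String, Long])(merge)

  def main(args: Array[String]): Unit = {
    val perRegion = Map(
      "us-east-1" -> Map("v1" -> 2L),
      "eu-west-1" -> Map("v1" -> 3L, "v2" -> 1L)
    )
    println(mergeAll(perRegion))
  }
}
```

Summation is associative and commutative, so the order in which regional results arrive does not change the merged view.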

Can it withstand Chaos?
• Chaos is a design principle
• Instance Failovers => Region Failovers
• Transparent to the consumers
• Over provisioned
• Long term - Autoscale

When Things Go South
● What if something doesn’t look right?
○ Stream Processing is stuck
○ Driver/Executor failures
○ Intermittent issues with external dependencies
● Metrics - Spark metrics to Atlas (similar to RRDTool + Graphite)
● Getting paged at 2 am - Not fun :) !
● Need for auto-remediation - less operational overhead

Auto Remediation Infrastructure
[Diagram: SQS, Auto Remediation, De-queue, Triggers, Metadata]
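
The diagram suggests alerts land on a queue (SQS), get de-queued by an auto-remediation component, and trigger an action. The sketch below illustrates that loop with an in-memory queue in place of SQS. The alert types echo the failure modes listed earlier (stuck stream, driver/executor failure, flaky dependency), but the names, the retry limit and the policy itself are assumptions, not the system described in the talk:

```scala
import scala.collection.mutable

object AutoRemediationSketch {
  // Hypothetical alert messages placed on the queue.
  sealed trait Alert { def job: String }
  final case class StreamStuck(job: String) extends Alert
  final case class DriverFailed(job: String) extends Alert
  final case class DependencyFlaky(job: String, attempts: Int) extends Alert

  // Hypothetical remediation actions.
  sealed trait Action
  final case class RestartJob(job: String) extends Action
  final case class RetryLater(job: String) extends Action
  final case class PageOnCall(job: String) extends Action

  // Assumed policy: restart on hard failures, retry flaky dependencies, page as a last resort.
  def decide(alert: Alert, maxAttempts: Int = 3): Action = alert match {
    case StreamStuck(job)                           => RestartJob(job)
    case DriverFailed(job)                          => RestartJob(job)
    case DependencyFlaky(job, n) if n < maxAttempts => RetryLater(job)
    case DependencyFlaky(job, _)                    => PageOnCall(job)
  }

  // De-queue every pending alert and return the chosen actions in order.
  def drain(queue: mutable.Queue[Alert]): Seq[Action] = {
    val actions = mutable.ArrayBuffer.empty[Action]
    while (queue.nonEmpty) actions += decide(queue.dequeue())
    actions.toSeq
  }

  def main(args: Array[String]): Unit = {
    val q = mutable.Queue[Alert](StreamStuck("insights"), DependencyFlaky("attribution", 1))
    drain(q).foreach(println)
  }
}
```

Keeping `decide` a pure function makes the policy easy to unit test separately from the queue plumbing.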

Streaming Challenges
• Scalability Performance tuning
– Micro batch interval
– Memory Tuning
– Parallelism/Shuffle tradeoff
• Data quality issues
– Low latency vs data quality
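
On the micro batch interval: a Spark Streaming job stays stable only if batches are processed, on average, faster than they arrive; otherwise scheduling delay builds up. A toy model of that effect is sketched below, assuming batches are processed one at a time and delay carries over from batch to batch. The function names and the sample numbers are illustrative, not measurements from the talk:

```scala
object BatchIntervalSketch {
  // Scheduling delay (ms) seen by each batch before it starts, given a fixed
  // batch interval and the observed processing time of each batch.
  def schedulingDelays(intervalMs: Long, processingMs: Seq[Long]): Seq[Long] =
    processingMs
      .scanLeft(0L) { (delay, p) => math.max(0L, delay + p - intervalMs) }
      .init // drop the delay that the next, not yet observed, batch would see

  // Rough stability check: average processing time must stay below the interval.
  def keepsUp(intervalMs: Long, processingMs: Seq[Long]): Boolean =
    processingMs.nonEmpty && processingMs.sum.toDouble / processingMs.size < intervalMs

  def main(args: Array[String]): Unit = {
    val observed = Seq(800L, 1500L, 1200L, 400L)
    println(schedulingDelays(1000L, observed))
    println(keepsUp(1000L, observed))
  }
}
```

A shorter interval lowers latency but leaves less headroom per batch, which is where the memory and parallelism/shuffle tuning listed above come in.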
