#ML7SAIS
Elliot Chow, Netflix
Nitin Sharma, Netflix
Near Real-Time Recommendations - Spark Streaming
Agenda
● Recommendations @ Netflix
● The Need for Near Real Time
● Use Cases
● Common Infrastructure
● Scaling Challenges
Recommendations at Netflix
● Personalize the Netflix experience for each member
○ Goal: Quickly help members find content they’d like to watch
○ Risk: Member may lose interest and abandon the service
○ Challenge: Recommending at scale
Scale @ Netflix
● 125M+ active members
● 190 countries
● 450B+ unique events/day
● 700+ Kafka topics
Typical Data Pipelines @ Netflix
● Data stored in Hive/S3
● Batch ETLs using Spark/Hive
● Table partitioning by day or hour
● Job scheduling by either cron or data availability
● Latency is often on the order of days
The Need for Near Real Time (NRT)
● Dynamic catalog
● Growing member base
● Time sensitivity
○ Content popularity changes
○ Member interests evolve
The Need for Near Real Time (NRT)
● Increasing amount of data
○ Process data as soon as possible to keep latencies low
○ Minimize amount of data to reprocess in case of failure
● Multi-Armed Bandits Adoption
Use Cases
● Video Insights
● ML Pipelines for Recommendations
NRT for Video Insights
Video Insights
● New title launches
● Early reads on title performance
● Slice by various dimensions
Video Insights
[Diagram labels: Service, Client]
Video Insights - State
● Counts maintained in Spark
● Custom state management
○ Based on mapWithState implementation
○ Easier to re-use the function f in batch mode
○ Lower-level control over state management
○ scanByKey alternative for keyed state
input.scan(initRDD)((currentRDD, batchRDD) => f(currentRDD, batchRDD))
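The scan line above can be read as a running fold over micro-batches: each batch is combined with the state accumulated so far by one function f, and the same f can be applied once to a full dataset in batch mode. The sketch below illustrates that idea with plain Scala collections standing in for RDDs. The state shape (Counts), the body of f, and the helper names are assumptions made for illustration; they are not taken from the talk, and scan on the stream appears to be the presenters' custom operator rather than a built-in Spark call.

```scala
object ScanStateSketch {
  // Hypothetical state: running play counts keyed by video id.
  type Counts = Map[String, Long]

  // f merges the state so far with one micro-batch of (videoId, plays) pairs.
  def f(current: Counts, batch: Seq[(String, Long)]): Counts =
    batch.foldLeft(current) { case (acc, (videoId, n)) =>
      acc.updated(videoId, acc.getOrElse(videoId, 0L) + n)
    }

  // Streaming mode: emit the accumulated state after every micro-batch.
  def streaming(init: Counts, batches: Seq[Seq[(String, Long)]]): Seq[Counts] =
    batches.scanLeft(init)(f).tail

  // Batch mode: apply the same f once to all events at the same time.
  def batchMode(init: Counts, batches: Seq[Seq[(String, Long)]]): Counts =
    f(init, batches.flatten)
}
```

Because both modes share f, the final streaming state and the batch result agree for the same input, which is one way to read the "easier to re-use the function f in batch mode" bullet.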
NRT for Recommendations
Billboard Recommendations
● Recommend a relevant title to each member
● Right time
● Respond quickly to member feedback
Artwork Personalization
● Personalized Image
● Visual Evidence
● Quickly adapting - Title launches, member tastes
● Rapid learning - Cold start
Traditional Recommendations
[Diagram labels: Millions of Play related Signals; ETL Layers; Training pipelines; Models; Precompute/Live Compute; Rankings; DataStore/Online caches; Few days]
NRT Recommendations
[Diagram labels: Millions of Play related Signals; Streaming layer; Training pipelines; Models; Precompute/Live Compute; Rankings; DataStore/Online caches; NRT]
Required Data
● Impressions, Plays, etc.
● Attribution
● Explore/Exploit Metadata
Attribution Infrastructure
[Diagram labels: Stream Processing; Batch Processing; Query API]
Stream Processing - Zoomed in
[Diagram labels: Impression; Cache; Play; Enrich; Video Metadata]
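One plausible reading of the stream-processing diagram is that impressions are written to a cache, and each incoming play is enriched with video metadata and with the cached impression it can be attributed to. The sketch below illustrates that reading only. The record types, the in-memory ImpressionCache, the (member, video) key, and the attribution window are all assumptions, not details confirmed by the slides.

```scala
object StreamAttributionSketch {
  // Hypothetical record shapes; "row" names the UI row that showed the title.
  final case class Impression(memberId: String, videoId: String, row: String, ts: Long)
  final case class Play(memberId: String, videoId: String, ts: Long)
  final case class EnrichedPlay(play: Play, title: Option[String], attributedTo: Option[Impression])

  // In-memory stand-in for the impression cache, keyed by (member, video).
  final class ImpressionCache(windowMs: Long) {
    private val latest =
      scala.collection.mutable.Map.empty[(String, String), Impression]

    // Keep only the most recent impression per key.
    def put(i: Impression): Unit = {
      val key = (i.memberId, i.videoId)
      if (latest.get(key).forall(_.ts <= i.ts)) latest.update(key, i)
    }

    // The cached impression, if it precedes the play by at most windowMs.
    def lookup(p: Play): Option[Impression] =
      latest.get((p.memberId, p.videoId))
        .filter(i => i.ts <= p.ts && p.ts - i.ts <= windowMs)
  }

  // Enrich a play with video metadata and its attributed impression, if any.
  def enrich(p: Play, cache: ImpressionCache, metadata: Map[String, String]): EnrichedPlay =
    EnrichedPlay(p, metadata.get(p.videoId), cache.lookup(p))
}
```

A play with no matching impression inside the window still flows through, with attributedTo left empty, so downstream consumers can decide how to treat unattributed plays.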
Batch Processing - Zoomed in
[Diagram labels: Impression; Play; Experimentation Data; Windowed Impression; Windowed Play; Dedupe; Join; Video Metadata]
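The batch diagram names three steps: windowing, dedupe, and join. A minimal sketch of how they might compose is shown below, using plain Scala collections (Scala 2.13 or later) instead of Spark. The Event shape, deduplication by event id, and the rule of joining a play to the latest earlier impression for the same member and video are assumptions chosen for illustration.

```scala
object BatchAttributionSketch {
  // Hypothetical event shape shared by impressions and plays.
  final case class Event(eventId: String, memberId: String, videoId: String, ts: Long)

  // Keep events whose timestamp falls in [start, end).
  def windowed(events: Seq[Event], start: Long, end: Long): Seq[Event] =
    events.filter(e => e.ts >= start && e.ts < end)

  // Drop repeated deliveries of the same event id, keeping the first seen.
  def dedupe(events: Seq[Event]): Seq[Event] =
    events.distinctBy(_.eventId)

  // Inner join: pair each play with the latest impression of the same
  // video by the same member that is not later than the play.
  def join(impressions: Seq[Event], plays: Seq[Event]): Seq[(Event, Event)] = {
    val byKey = impressions.groupBy(i => (i.memberId, i.videoId))
    plays.flatMap { p =>
      byKey.getOrElse((p.memberId, p.videoId), Seq.empty[Event])
        .filter(_.ts <= p.ts)
        .maxByOption(_.ts)
        .map(i => (i, p))
    }
  }
}
```

Plays without an earlier impression are dropped by this inner join; a left join would be the alternative if unattributed plays need to be kept.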
Common Infrastructure
Netflix Spark Stack
● Jenkins
● Spinnaker
● ASG
● Runners: Marathon, Meson
● Resource Manager: Mesos
● Storage: HDFS, S3, EFS
● Multi-Region
Multi Region Challenges
● Geo routing
● Impression in one region; play in another
● Streaming - Multi Region
● Batch Reduce/Merge - One region
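The split described above, streaming in every region and a reduce/merge step in one region, works cleanly when the per-region outputs are partial aggregates that can be combined in any order. The sketch below shows such a merge with plain Scala collections (Scala 2.13 or later). The Partial shape, the (memberId, videoId) key, and the region names in the usage note are hypothetical.

```scala
object RegionMergeSketch {
  // Hypothetical partial aggregate produced independently in each region.
  final case class Partial(impressions: Long, plays: Long) {
    // Associative and commutative, so regions can be merged in any order.
    def combine(other: Partial): Partial =
      Partial(impressions + other.impressions, plays + other.plays)
  }

  type Key = (String, String) // (memberId, videoId)

  // Reduce step run in a single region over every region's output.
  def merge(perRegion: Seq[Map[Key, Partial]]): Map[Key, Partial] =
    perRegion.flatten.groupMapReduce(_._1)(_._2)(_ combine _)
}
```

For example, if a hypothetical us-east output holds one impression for a member and video, and a hypothetical eu-west output holds the corresponding play, merge returns a single entry with both counts.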
Can it withstand Chaos?
● Chaos is a design principle
● Instance Failovers => Region Failovers
● Transparent to the consumers
● Over provisioned
● Long term - Autoscale
When Things Go South
● What if something doesn’t look right?
○ Stream Processing is stuck
○ Driver/Executor failures
○ Intermittent issues with external dependencies
● Metrics - Spark metrics to Atlas (similar to RRDTool + Graphite)
● Getting paged at 2 am - Not fun :) !
● Need for auto-remediation - less operational overhead
Auto Remediation Infrastructure
[Diagram labels: SQS; Auto Remediation; De-queue; Triggers; Metadata]
Streaming Challenges
● Scalability and performance tuning
○ Micro-batch interval
○ Memory tuning
○ Parallelism/shuffle tradeoff
● Data quality issues
○ Low latency vs. data quality
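On the micro-batch interval: a micro-batch job keeps up only while each batch finishes within the batch interval. Once processing time exceeds the interval, later batches queue up and scheduling delay grows. The sketch below simulates that effect under simplifying assumptions: batches are generated at a fixed interval and processed one at a time, in order. The function name and the sample numbers are illustrative only.

```scala
object BatchIntervalSketch {
  // Scheduling delay (ms) each batch waits before it starts, assuming a new
  // batch is generated every intervalMs and batches run sequentially.
  // delay(k+1) = max(0, delay(k) + processing(k) - interval)
  def schedulingDelays(intervalMs: Long, processingMs: Seq[Long]): Seq[Long] =
    processingMs.scanLeft(0L) { (delay, proc) =>
      math.max(0L, delay + proc - intervalMs)
    }.init
}
```

With a 1000 ms interval, processing times of 1200 ms per batch give delays of 0, 200, 400 ms and keep climbing, while 800 ms per batch keeps the delay at 0. A single slow batch (1500 ms followed by 500 ms batches) produces a temporary delay that drains again.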

Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nitin Sharma and Elliot Chow