Scalable, real-time Machine Learning using
Apache Kafka
Agenda
● Traditional model deployment process
● 90 seconds to WoW
● Let’s process the incoming stream
● Demo
● What’s more?
$ whoami
● Personalisation lead at Hotstar
● Led Data Infrastructure team at Grofers and TinyOwl
● Kafka fanboy
● Usually rant on Twitter @jayeshsidhwani
Machine Learning @ Hotstar
● ~150 mn users
● 4.8 mn peak concurrency
● 120K peak recommendation requests per second
● Diverse content in diverse languages
Traditional model deployment process
[Diagram labels: Data Lake, Model Training, Serialized Model, Batch Predictions, Recommendation APIs; the pipeline is split into Offline and Online stages]
● One-day / few-hours batch pre-compute
● Slow time to react
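The batch flow above can be sketched in a few lines. This is a minimal illustration, not Hotstar's pipeline: the popularity "model", the `train`/`predict`/`recommend` names and the sample events are all assumptions made for the example. It shows why the approach is slow to react: a user who arrives after the batch run gets nothing until the next run.

```python
import json

def train(watch_events):
    """Toy stand-in for model training: rank items by play count."""
    counts = {}
    for _user, item in watch_events:
        counts[item] = counts.get(item, 0) + 1
    # Sort by count (descending), then by name so the order is deterministic
    return {"ranked": sorted(counts, key=lambda i: (-counts[i], i))}

def predict(model, seen, k=2):
    """Recommend the top-k ranked items the user has not watched yet."""
    return [i for i in model["ranked"] if i not in seen][:k]

# Offline: train on the data lake, serialize the model, batch-predict per user
data_lake = [("u1", "cricket"), ("u2", "cricket"), ("u2", "drama"),
             ("u3", "news"), ("u3", "drama")]          # hypothetical events
serialized_model = json.dumps(train(data_lake))
model = json.loads(serialized_model)

history = {}
for user, item in data_lake:
    history.setdefault(user, set()).add(item)
batch_predictions = {u: predict(model, seen) for u, seen in history.items()}

# Online: the recommendation API is only a lookup into the pre-computed table
def recommend(user_id):
    return batch_predictions.get(user_id, [])

known_user = recommend("u1")   # served from the last batch run
new_user = recommend("u4")     # empty until the next batch run includes u4
```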
Sense of urgency?
● 90 seconds to convert a new user
● To power their experience, we need to know the user’s gender, interests and more
● Need an always-thinking machine
Thinking streams
Data at Rest
● Slow
● Batch-y
Data in motion
● Fast
● Sub-second
Enter Apache Kafka
● Kafka is a scalable, fault-tolerant, distributed message queue
● Producers and Consumers
● Uses
○ Real-time applications such as intelligent notifications, anomaly detection, etc.
○ Asynchronous communication in event-driven architectures
Diagram credits: http://kafka.apache.org
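The producer/consumer model can be illustrated with a toy in-memory log. This is a sketch under stated assumptions and does not use the Kafka client API: `ToyLog`, `produce`, `poll` and the group names are hypothetical. It shows three ideas from the slide: records are appended to partitions, records with the same key stay in order, and each consumer group tracks its own read position, so several applications can read the same data independently.

```python
from collections import defaultdict

class ToyLog:
    """In-memory stand-in for a topic: partitioned, append-only, offset-addressed."""

    def __init__(self, partitions=2):
        self.partitions = [[] for _ in range(partitions)]
        # One read offset per partition, tracked separately for each group
        self.offsets = defaultdict(lambda: [0] * partitions)

    def produce(self, key, value):
        # Same key always maps to the same partition, preserving per-key order
        p = sum(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def poll(self, group):
        """Return the records this group has not read yet and advance its offsets."""
        out = []
        for p, records in enumerate(self.partitions):
            start = self.offsets[group][p]
            out.extend(records[start:])
            self.offsets[group][p] = len(records)
        return out

log = ToyLog()
log.produce("user-1", "play")
log.produce("user-2", "click")
log.produce("user-1", "pause")

notifications = log.poll("notifications")       # first consumer group
anomalies = log.poll("anomaly-detection")       # second group sees the same records
nothing_new = log.poll("notifications")         # already caught up
```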
Real-time infrastructure at Hotstar
● All clickstream data pushed into Apache Kafka
● Apache Kafka Streams to process events as they happen
● Incoming data available for everyone
[Diagram labels: Apple TV, iOS, Android, Roku; Stream Processing Framework (Filter, Window, Join, Anomaly, Machine Learning); Intelligence]
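The Filter, Window and Join operations named in the diagram can be sketched with plain Python generators. This is not Kafka Streams code; the function names, event fields and profile table are assumptions for illustration, and the window logic assumes events arrive in timestamp order.

```python
from collections import Counter

def filter_events(events, event_type):
    """Filter: keep only events of one type."""
    return (e for e in events if e["type"] == event_type)

def join_profiles(events, profiles):
    """Join: enrich each event with a field looked up from a profile table."""
    for e in events:
        yield {**e, "language": profiles.get(e["user"], "unknown")}

def tumbling_window_counts(events, size_s, key):
    """Window: count events per key in fixed, non-overlapping windows of size_s seconds."""
    current, counts = None, Counter()
    for e in events:
        window = e["ts"] // size_s * size_s
        if current is not None and window != current:
            yield current, dict(counts)      # window closed, emit its counts
            counts = Counter()
        current = window
        counts[e[key]] += 1
    if current is not None:
        yield current, dict(counts)          # emit the last open window

# Hypothetical clickstream and user profiles
clickstream = [
    {"ts": 0, "user": "u1", "type": "play"},
    {"ts": 20, "user": "u2", "type": "click"},
    {"ts": 45, "user": "u2", "type": "play"},
    {"ts": 70, "user": "u3", "type": "play"},
]
profiles = {"u1": "hi", "u2": "ta"}

plays = filter_events(clickstream, "play")
enriched = join_profiles(plays, profiles)
windows = list(tumbling_window_counts(enriched, 60, "language"))
```

Because each stage is a generator, an event flows through all three steps as soon as it arrives, rather than waiting for a batch.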
Demo
Predict whether a flight is delayed in real-time
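The slides do not include the demo's code, so the following is only a guess at the general shape of per-event scoring, not the speaker's demo. The logistic model, its coefficients, the feature names and the sample flights are all made up for illustration; a real setup would load a trained model instead.

```python
import math

# Hypothetical coefficients; a real deployment would load a trained model
WEIGHTS = {"dep_delay_min": 0.08, "distance_km": -0.0005}
BIAS = -1.0

def delay_probability(flight):
    """Logistic score: probability that the flight arrives late."""
    z = BIAS + sum(w * flight[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def score_stream(flights, threshold=0.5):
    """Score each flight event as it arrives and emit a prediction record."""
    for flight in flights:
        p = delay_probability(flight)
        yield {"flight": flight["flight"], "p_delayed": p, "delayed": p >= threshold}

# Illustrative events, not real flight data
incoming = [
    {"flight": "F-001", "dep_delay_min": 0, "distance_km": 1000},
    {"flight": "F-002", "dep_delay_min": 45, "distance_km": 1000},
]
predictions = list(score_stream(incoming))
```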
How to process a stream?
[Diagram: ML]
Advanced use-cases
[Diagram: Hotstar Streaming Platform topology. Legend: processor nodes, source / sink nodes. Node labels: page-clicks, video-plays, predict-gender, predict-interest, 5-min trending videos, Recommended for You]
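One processor node in the topology, "5-min trending videos", can be sketched as a sliding-window counter over the video-plays stream. This is an illustrative sketch, not Hotstar's implementation: the `Trending` class, the video names and the timestamps are assumptions, and plays are assumed to arrive in timestamp order.

```python
from collections import Counter, deque

class Trending:
    """Top videos over the trailing window_s seconds (sliding window)."""

    def __init__(self, window_s=300):
        self.window_s = window_s
        self.events = deque()       # (ts, video) in arrival order
        self.counts = Counter()

    def add(self, ts, video):
        self.events.append((ts, video))
        self.counts[video] += 1
        # Evict plays that have fallen out of the trailing window
        while self.events and self.events[0][0] <= ts - self.window_s:
            _old_ts, old_video = self.events.popleft()
            self.counts[old_video] -= 1
            if self.counts[old_video] == 0:
                del self.counts[old_video]

    def top(self, n=2):
        # Highest count first; ties broken by name for a stable order
        return sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

trending = Trending(window_s=300)
video_plays = [(0, "match-1"), (60, "match-1"), (120, "show-7"),
               (400, "show-7"), (410, "movie-3")]      # hypothetical plays
for ts, video in video_plays:
    trending.add(ts, video)

top_now = trending.top()
```

The two early plays of `match-1` age out once the stream reaches the 400-second mark, so the result reflects only the last five minutes.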
Questions?
tech.hotstar.com

Build intelligent, real-time applications using Machine Learning
