Analysing high throughput data in Real Time
namit@hotstar.com
$whoami
Namit Mahuvakar
Data Engineering
Scale @Hotstar
Ingestion Patterns
1 million events ingested per second
● Built on Apache Kafka
● HA and durable ingestion
● Projected: 50 billion events per day

Storage Patterns
14.0 TB of data produced per day
● Data in S3 & HDFS
● Warehouse in HBase
● Projected: 30 TB of data per day

Consumption Patterns
8.1 TB of data directly consumed
● Single query interface
● Parity between streaming & stationary data
● No limit on the amount of data scanned
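To put the scale figures in perspective, here is a back-of-the-envelope sketch that derives average rates from the slide's numbers. It assumes decimal units (1 TB = 10^12 bytes) and assumes the 30 TB/day storage projection pairs with the 50 billion events/day ingestion projection; neither assumption is stated in the talk.

```python
# Back-of-the-envelope sizing from the slide's figures.
# Assumptions (not from the talk): decimal units (1 TB = 10**12 bytes), and the
# 30 TB/day projection corresponds to the 50 billion events/day projection.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

peak_events_per_sec = 1_000_000
projected_events_per_day = 50_000_000_000
current_bytes_per_day = 14.0e12
projected_bytes_per_day = 30e12


def daily_volume_at_peak(events_per_sec: int) -> int:
    """Events per day if the peak rate were sustained around the clock."""
    return events_per_sec * SECONDS_PER_DAY


def average_rate(events_per_day: int) -> float:
    """Mean events per second implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


def bytes_per_event(bytes_per_day: float, events_per_day: int) -> float:
    """Average payload size implied by daily volume and event count."""
    return bytes_per_day / events_per_day


if __name__ == "__main__":
    print(f"peak sustained for 24h: {daily_volume_at_peak(peak_events_per_sec):,} events/day")
    print(f"projected average rate: {average_rate(projected_events_per_day):,.0f} events/sec")
    print(f"projected size per event: "
          f"{bytes_per_event(projected_bytes_per_day, projected_events_per_day):,.0f} bytes")
    print(f"current write bandwidth: {current_bytes_per_day / SECONDS_PER_DAY / 1e6:,.0f} MB/s")
```

Under these assumptions, the 1 million events/sec figure is a peak: sustained for a full day it would be about 86.4 billion events, whereas the 50 billion/day projection averages out to roughly 579k events/sec at about 600 bytes per event.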
What are we covering?
● Stream processing @Hotstar
● Case Study: Video Delivery Metrics
● Crucial metrics in video delivery
● Case Study: Social Signals
● Solving engagement with signals
● Where does all this fit?
● Key Takeaways
Video Delivery Metrics
● How do we track a surge in video start-up failures?
● How do we track a sudden increase in the video buffering rate?
● How do we react to these metrics if they are only batch processed periodically?
Social Signals
● How do we find online users in real time?
● How do we track notifications for online users in real time?
● How do we notify users in real time if the data is batch processed?
Real-time User Targeting
● How do we target online users during a live match?
Not on my batch!
● Why? 25.3M concurrent viewers!
● When can data streams be processed in batch fashion?
● Leveraging stream processing for:
○ P1 (crucial) metrics for Hotstar
○ Social Signals: user notifications
○ Real-time user segmentation and targeting
Case Study: Video Delivery Metrics
Video Delivery Metrics
Ingest continuous streams of events from 25.3M concurrent users:
● Video player P1 metrics:
○ Playback Failure Rate
○ Average Bitrate
○ Buffer time per minute
● Ingest CDN logs to measure last-mile delivery
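The talk lists Average Bitrate and Buffer time per minute without defining how they are computed. One plausible reading is sketched below under explicit assumptions: players emit periodic heartbeat events (the `Heartbeat` type and its fields are hypothetical), average bitrate is weighted by watch time, and buffer time per minute is total rebuffering seconds divided by total minutes watched.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical periodic player event; field names are assumptions."""
    session_id: str
    watch_ms: int      # playback time covered by this heartbeat
    buffer_ms: int     # time spent rebuffering during that interval
    bitrate_kbps: int  # rendition bitrate playing during that interval


def buffer_seconds_per_minute(beats: Iterable[Heartbeat]) -> float:
    """Total rebuffering seconds per minute of playback across all heartbeats."""
    beats = list(beats)
    watch_min = sum(b.watch_ms for b in beats) / 60_000
    buffer_s = sum(b.buffer_ms for b in beats) / 1_000
    return buffer_s / watch_min if watch_min else 0.0


def average_bitrate_kbps(beats: Iterable[Heartbeat]) -> float:
    """Watch-time-weighted mean bitrate, so longer intervals count more."""
    beats = list(beats)
    total_ms = sum(b.watch_ms for b in beats)
    if total_ms == 0:
        return 0.0
    return sum(b.bitrate_kbps * b.watch_ms for b in beats) / total_ms
```

For example, one session watching 60 s at 2000 kbps with 3 s of buffering plus another watching 30 s at 4000 kbps with no buffering gives 2.0 buffer seconds per minute and an average bitrate of about 2667 kbps. Weighting by watch time keeps a few short high-bitrate blips from inflating the average.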
How did we solve: Playback Failure Rate
● P1 performance metric
○ Real time, across multiple streams
● Ratio: Failed Video / Started Video
● What did we use for stream processing?
○ Hive Kafka SQL integration
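The talk computes this ratio with Hive's Kafka SQL integration, and the query itself is not reproduced here. As a hedged stand-in, the sketch below performs the same aggregation in plain Python: it buckets events into tumbling windows and divides failed starts by started videos per window. The event type names and the 60-second window are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event type names; the talk does not show the clickstream schema.
STARTED = "started_video"
FAILED = "failed_video"


def playback_failure_rate(
    events: Iterable[Tuple[int, str]], window_s: int = 60
) -> Dict[int, float]:
    """Failed Video / Started Video per tumbling window.

    `events` are (epoch_seconds, event_type) pairs in any order. Each event is
    bucketed by its own timestamp, so a failure is counted in the window where
    it happened even if its start fell in an earlier window.
    """
    started: Dict[int, int] = defaultdict(int)
    failed: Dict[int, int] = defaultdict(int)
    for ts, kind in events:
        window_start = ts - ts % window_s  # align to the tumbling window
        if kind == STARTED:
            started[window_start] += 1
        elif kind == FAILED:
            failed[window_start] += 1
    # Only windows with at least one start have a defined ratio.
    return {w: failed.get(w, 0) / started[w] for w in sorted(started)}
```

In SQL terms this is a GROUP BY over a time bucket with two conditional counts. A true streaming job would also have to decide when a window is complete and how to treat late events, which this batch-style sketch ignores.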
Real-Time Analytics Use Case: How did we use Hive Kafka SQL to get the Playback Failure Rate?
Case Study: Social Signals
What are Social Signals?
● Indicators or signals sent to the user's device
● Increase engagement with the Social Feed
● Let users discover the presence and activities of their social graph
● Common signals:
○ Friend Online
○ Friend Earns Rewards
Solving Engagement with Signals
● Real-time processing of the "Started Video" clickstream event
● Helps us approximate a user's presence on a stream
● Query the social graph for the online user's friends
● Publish a notification to the user's online friends
How did we solve Social Signals?
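The four-step flow from the engagement slide can be sketched in memory as follows. Everything concrete here is an assumption for illustration: the `SocialSignals` class, the dict-based social graph, the 5-minute presence TTL, and the `outbox` list standing in for a real notification service. The talk does not describe its actual storage or delivery mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Assumption: a "Started Video" event keeps a user "online" for 5 minutes.
PRESENCE_TTL_S = 300


@dataclass
class SocialSignals:
    # Hypothetical in-memory social graph: user -> set of friends.
    friends: Dict[str, Set[str]]
    # Last "Started Video" timestamp per user (epoch seconds).
    last_seen: Dict[str, int] = field(default_factory=dict)
    # Stand-in for a push/notification queue: (recipient, message).
    outbox: List[Tuple[str, str]] = field(default_factory=list)

    def is_online(self, user: str, now: int) -> bool:
        seen = self.last_seen.get(user)
        return seen is not None and now - seen <= PRESENCE_TTL_S

    def on_started_video(self, user: str, ts: int) -> None:
        """Handle one 'Started Video' clickstream event."""
        was_online = self.is_online(user, ts)
        # 1. Approximate presence: a video start means the user is on a stream.
        self.last_seen[user] = ts
        # Only signal on an offline -> online transition, so repeated
        # starts within the TTL do not spam friends.
        if was_online:
            return
        # 2. Look up the user's friends in the social graph.
        for friend in sorted(self.friends.get(user, set())):
            # 3. Publish a "Friend Online" signal only to friends who are online.
            if self.is_online(friend, ts):
                self.outbox.append((friend, f"{user} is online"))
```

Usage: after `b` starts a video at t=0 and `a` starts one at t=100, `b` receives one "a is online" signal; a second start by `a` at t=150 sends nothing new. A TTL is the simplest way to turn discrete start events into an approximate presence state; a production system would more likely expire presence with heartbeats or a keyed state store.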
Where does all this fit?
Key Takeaways
● Stream processing is useful for:
○ Monitoring and real-time alerting for quick action
○ Quick processing of incoming data streams to drive action, e.g. targeting users
○ Processing and finding anomalies in live logs or time-series data
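The talk does not say which anomaly detection method it uses. As one simple, swapped-in illustration, the sketch below applies a rolling z-score to a time series such as per-minute playback failure rates: it flags a point that deviates from the mean of the preceding `window` points by more than `threshold` standard deviations. The window size of 10 and threshold of 3 are arbitrary assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def rolling_zscore_anomalies(
    series: Iterable[float], window: int = 10, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for points that deviate from the mean of the
    preceding `window` points by more than `threshold` standard deviations."""
    history: Deque[float] = deque(maxlen=window)
    for i, x in enumerate(series):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            deviation = abs(x - mu)
            # A perfectly flat history has sigma == 0; treat any change as anomalous.
            if (sigma == 0 and deviation > 0) or (sigma > 0 and deviation / sigma > threshold):
                yield i, x
        # The point joins the baseline either way (a simple, not robust, choice).
        history.append(x)
```

Because the function is a generator over an iterable, it can consume a live feed one point at a time. Letting spikes enter the baseline briefly widens the band afterwards; a median/MAD baseline would be a more robust alternative.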
Check us out at tech.hotstar.com
Blogs @ blog.hotstar.com
Any Questions?


Editor's Notes

  • #2: Welcome to the talk. I will be talking about Data Platform Patterns for Scale and sharing what I learned building the platform at Hotstar scale.
  • #19:
    ○ Untangle the movement of data
    ○ Single source of truth
    ○ No duplicate writes
    ○ Anyone can consume anything
    ○ Decouples data generation from data computation
    ○ Process, transform and react to data as it happens
    ○ Sub-second latencies
    ○ Anomaly detection on bad stream quality
    ○ Timely notifications to users who dropped off during a live match
  • #20: Newer technologies are added
  • #21: Newer technologies are added