The Netflix way to deal with real-time data
How we built a 1T/day stream processing cloud platform in a year
What should I expect
Keystone Season 1 - Who, What, How and Why
Keystone Season 2 - Preview Trailer
1.
The Who?
hello!
I am Peter Bakas
I lead the Real-Time Data Infrastructure team @ Netflix
You can find me at @peter_bakas
2.
The What?
Publish + Collect + Process + Move Data
1,000,000,000,000
Whoa! That’s a big number.
Events Processed Every Day
By the numbers
Daily Averages
700B unique events ingested
1T events processed
1.4 PB
Peak
1T unique events ingested/day
12.5M/sec
35 GB/sec
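The daily and peak figures can be cross-checked with a little arithmetic. This is only a rough sketch: it assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and treats 1.4 PB as the total daily data volume, neither of which the slides state explicitly. The variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures quoted on the slide.
unique_events_per_day = 700e9    # 700B unique events ingested (daily average)
bytes_per_day = 1.4e15           # 1.4 PB, assuming decimal petabytes
peak_events_per_sec = 12.5e6     # 12.5M/sec
peak_bytes_per_sec = 35e9        # 35 GB/sec, assuming decimal gigabytes

# Derived values (not on the slides).
avg_events_per_sec = unique_events_per_day / SECONDS_PER_DAY
avg_bytes_per_sec = bytes_per_day / SECONDS_PER_DAY
peak_bytes_per_event = peak_bytes_per_sec / peak_events_per_sec

print(f"average ingest rate:     {avg_events_per_sec / 1e6:.1f}M events/sec")
print(f"average throughput:      {avg_bytes_per_sec / 1e9:.1f} GB/sec")
print(f"peak vs. average ingest: {peak_events_per_sec / avg_events_per_sec:.2f}x")
print(f"bytes per event at peak: {peak_bytes_per_event:.0f}")
```

Under these assumptions the average ingest rate comes to roughly 8.1M events/sec and 16.2 GB/sec, so the quoted peak is about 1.5x the average event rate, at about 2.8 KB per event.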
Trending on Netflix
1/2014: 80B/day
1/2015: 300B/day
1/2016: 1T/day
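The growth implied by these three data points can be worked out directly. A minimal sketch, using only the volumes quoted on the slide; the dictionary and variable names are illustrative.

```python
# Daily event volume at the start of each year, as quoted on the slide.
volume_by_year = {2014: 80e9, 2015: 300e9, 2016: 1e12}

years = sorted(volume_by_year)

# Year-over-year growth factor, keyed by the later year.
yoy = {
    later: volume_by_year[later] / volume_by_year[earlier]
    for earlier, later in zip(years, years[1:])
}

# Overall growth and the equivalent constant annual factor.
total_growth = volume_by_year[years[-1]] / volume_by_year[years[0]]
annualized = total_growth ** (1 / (years[-1] - years[0]))

for year, factor in yoy.items():
    print(f"{year - 1} -> {year}: {factor:.2f}x")
print(f"total: {total_growth:.1f}x, annualized: {annualized:.2f}x")
```

That is 3.75x in the first year, about 3.33x in the second, and 12.5x overall, or roughly 3.5x per year.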
Growth has its season
In the beginning - Chukwa
Q4 2014 - Chukwa/Suro
Q4 2015 - Keystone
[Architecture diagram: Event Producer, HTTP Proxy, Fronting Kafka, Internal Routing Service, Consumer Kafka, Control Plane, EMR, Stream Consumers]
Keystone Kafka Footprint
                       Fronting Kafka   Consumer Kafka
Number of Clusters     24               8
Number of Instances    3000+            900+
Retention Period       8 to 24 hrs      2 to 4 hrs
Keystone Internal Routing Service
[Diagram: Keystone Internal Routing Service + Checkpointing Cluster]
Keystone Internal Routing Service Footprint
                       S3      ElasticSearch   Consumer Kafka
Number of containers   7000    1500            4500
Keystone avg end-to-end metrics
S3       ElasticSearch   Consumer Kafka
1 sec    13 sec          800 ms
3.
The How?
Netflix Culture
Freedom and Responsibility
“It may well be the most important document to ever come out of the Valley”
Sheryl Sandberg, COO @ Facebook
What does culture have to do with how?
Sounds easy
Build Team
Build Product
A true story
Keystone went live 10/27/15
2 days later...
80% “loss” over a 6-hour period
Lessons learned
There are times when things go wrong… and there is no turning back
Reduce complexity
Minimize blast radius
Find a way to start over fresh
Failover
Cold standby Kafka cluster with different instance type
Different ZooKeeper cluster with no state
Fully automated
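The failover idea on this slide can be sketched as a small control loop. This is a toy illustration under stated assumptions, not Netflix's implementation: the `FailoverController` class, the cluster names, and the three-failed-checks threshold are all hypothetical, and no real Kafka or ZooKeeper API is involved.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverController:
    """Toy controller: route producers to a cold standby after repeated failed checks."""

    active: str                        # cluster currently receiving traffic
    standby: str                       # cold standby, assumed to start with no state
    failures_before_failover: int = 3  # hypothetical threshold
    consecutive_failures: int = 0
    log: list = field(default_factory=list)

    def record_health_check(self, healthy: bool) -> str:
        """Record one health-check result and return the cluster to write to."""
        if healthy:
            self.consecutive_failures = 0
            return self.active

        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failures_before_failover:
            self._fail_over()
        return self.active

    def _fail_over(self) -> None:
        # Nothing is copied from the failed cluster: the standby starts fresh.
        # The failed cluster is parked in the standby slot here only to keep the
        # sketch small; in practice it would need to be rebuilt first.
        self.log.append(f"failover: {self.active} -> {self.standby}")
        self.active, self.standby = self.standby, self.active
        self.consecutive_failures = 0


controller = FailoverController(
    active="fronting-kafka-a", standby="fronting-kafka-standby"
)
for healthy in [True, False, False, False, True]:
    target = controller.record_health_check(healthy)

print(target)
print(controller.log)
```

Because the standby carries no state, the switch is just a routing decision, which is what makes it feasible to automate end to end.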
Time is of the essence
Failover as fast as 5 minutes
Fully Automated Failover
Best Practices
Full Automation
Self Healing
Kafka Kong
4.
The Why?
“We didn’t do anything wrong, but somehow, we lost...”
Stephen Elop, Nokia CEO
Global Launch 1/6/16
125,000,000 hrs/day
That’s a lot of hours!
37% of North American internet traffic @ peak!
81,000,000 members
and a lot of members
If you don’t change, you will be eliminated from the competition
5.
Coming Soon
Our philosophy
Create Duplo® Blocks:
Let reusability drive new value
Evolution
[Diagram: Keystone Management, Keystone Messaging, Keystone Stream Processing]
Keystone Messaging
Keystone: unified event publishing, collection, routing for batch and stream processing (85% of data volume)
Ad-hoc Messaging: 15% of data volume
Keystone Stream Processing
Consumers weary of: complexity of self-managed infrastructure; multiple runtimes across different platforms
Consumers want: simple unified model/API/UI/system
Keystone Management
Simple and intuitive interface to manage all Keystone services
thanks!
Any questions?
You can find me at
@peter_bakas
pbakas@netflix.com
Credits
Special thanks to all the people who made and released
these awesome resources for free:
Presentation template by SlidesCarnival
Photographs by Unsplash

Keystone - Leverage Big Data 2016