Now Playing on Netflix:
Adventures in a Cloudy Future
CMG November 2013
Adrian Cockcroft
@adrianco @NetflixOSS
http://www.linkedin.com/in/adriancockcroft
Netflix Member Web Site Home Page
Personalization Driven – How Does It Work?
How Netflix Used to Work
Consumer
Electronics

Oracle
Monolithic Web
App

AWS Cloud
Services

MySQL

CDN Edge
Locations

Oracle
Datacenter

Customer Device
(PC, PS3, TV…)

Monolithic
Streaming App
MySQL

Content
Management
Limelight/Level 3
Akamai CDNs
Content Encoding
How Netflix Streaming Works Today
Consumer
Electronics

User Data
Web Site or
Discovery API

AWS Cloud
Services

Personalization

CDN Edge
Locations

DRM
Datacenter

Customer Device
(PC, PS3, TV…)

Streaming API
QoS Logging

OpenConnect
CDN Boxes

CDN
Management
and Steering
Content Encoding
Nov
2012
Streaming
Bandwidth

March
2013
Mean
Bandwidth
+39% 6mo
Netflix Scale
• Tens of thousands of instances on AWS
– Typically 4 core, 30 GByte, Java business logic
– Thousands created/removed every day

• Thousands of Cassandra NoSQL storage nodes
– Mostly 8 core, 60 GByte, 2 TByte of SSD
– 65 different clusters, over 300 TByte data, triple zone
– Over 40 are multi-region clusters (6, 9 or 12 zone)
– Biggest 288 nodes, 300K rps, 1.3M wps
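As a rough sanity check on the figures for the biggest cluster, the per-node rates can be worked out directly. This is a minimal sketch that assumes load is spread evenly across all 288 nodes, which the slide does not state; the function name is illustrative.

```python
# Figures quoted on the slide for the biggest Cassandra cluster.
NODES = 288
READS_PER_SEC = 300_000     # "300K rps"
WRITES_PER_SEC = 1_300_000  # "1.3M wps"


def per_node(total_rate: int, nodes: int) -> float:
    """Average rate per node, assuming a perfectly even spread (an assumption)."""
    return total_rate / nodes


if __name__ == "__main__":
    print(f"reads/node/sec:  {per_node(READS_PER_SEC, NODES):.0f}")
    print(f"writes/node/sec: {per_node(WRITES_PER_SEC, NODES):.0f}")
```

Under that even-spread assumption each node handles roughly a thousand reads and a few thousand writes per second.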
Reactions over time
2009 “You guys are crazy! Can’t believe it”
2010 “What Netflix is doing won’t work”

2011 “It only works for ‘Unicorns’ like Netflix”
2012 “We’d like to do that but can’t”
2013 “We’re on our way using Netflix OSS code”
"This is the IT swamp draining manual for anyone who is neck deep in alligators." Adrian Cockcroft, Cloud Architect at Netflix
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is doing Continuous Delivery
Web-scale

Cloud
Commodity

Client/Server

Mainframe
Goal of Traditional IT:
Reliable hardware
running stable software
SCALE
Breaks hardware
….SPEED
Breaks software
SPEED at
SCALE
Breaks everything
Incidents – Impact and Mitigation
Public Relations
Media Impact

PR

Y incidents mitigated by Active-Active,
game day practicing

X Incidents
High Customer
Service Calls

CS

YY incidents
mitigated by
better tools and
practices

XX Incidents
Affects AB
Test Results

Metrics impact – Feature disable
XXX Incidents
No Impact – fast retry or automated failover
XXXX Incidents

YYY incidents
mitigated by better
data tagging
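The bottom tier of the pyramid, incidents with no impact thanks to fast retry or automated failover, can be illustrated with a small sketch. The function and the two zone stand-ins below are hypothetical and are not taken from Netflix code; they only show the general pattern of retrying quickly and then moving on to another replica.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    replicas: Sequence[Callable[[], T]],
    retries_per_replica: int = 2,
) -> T:
    """Try each replica in order, retrying quickly before failing over."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        for _ in range(retries_per_replica):
            try:
                return replica()
            except Exception as err:  # any failure triggers a retry or failover
                last_error = err
    raise RuntimeError("all replicas failed") from last_error


# Hypothetical stand-ins for the same service in two availability zones.
def zone_a() -> str:
    raise ConnectionError("zone A unavailable")


def zone_b() -> str:
    return "served from zone B"


if __name__ == "__main__":
    print(call_with_failover([zone_a, zone_b]))
```

The caller never sees the zone A failure, which is the sense in which such an incident has no customer impact.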
Web Scale Architecture
AWS
Route53

DynECT
DNS

UltraDNS

DNS
Automation

Regional Load Balancers (one set per region, two regions shown)

Zone A, Zone B, Zone C (in each region)

Cassandra Replicas (in each zone)
CIO Says Speed IT Up!
“Get inside your adversaries'
OODA loop to disorient them”
Colonel Boyd, USAF
Land grab
opportunity

Engage
customers

Deliver

Measure
Customers

Act

Competitive
Move

Observe

Colonel Boyd,
USAF
“Get inside your
adversaries'
OODA loop to
disorient them”

Customer
Pain Point

Analysis

Orient
Model
Hypotheses

Implement

Decide
Commit
Resources

Plan
Response
Get Buy-in
Territory
Expansion

Print Ad
Campaign
Upgrade
Mainframe

Measure
Revenue

Act

Foreign
Competition

Observe

Mainframe
Era - 1 year
cycle

Customer
Pain Point

Systems
Analysis

Orient
Capacity
Model

Customize
Vendor SW

Decide
Vendor
Evaluation

5 year Plan
Board
Level Buy-in
80’s Mainframe Innovation Cycle
• Cost $1M to $100M
• Duration 1 to 5 years
• Bet the whole company
• Cost of failure – bankrupt or bought
• Cobol and DB2 on MVS
Territory
Expansion

TV Advert
Campaign
Install
Servers

Measure
Revenue

Act

Foreign
Competition

Observe

Client/Server
Era – 3
month cycle

Customer
Pain Point

Data
Warehouse

Orient
Capacity
Estimate

Customize
Vendor SW

Decide
Vendor
Evaluation

1 year Plan
CIO Level
Buy-in
90’s Client Server Innovation Cycle
• Cost $100K to $10M
• Duration 3 – 12 months
• Bet a product line or division
• Cost of failure – revenue hit, CIO’s job
• C++ and Oracle on Solaris
Territory
Expansion

Web
Display Ads

Measure
Sales

Install
Capacity

Act

Competitive
Moves

Observe

Commodity
Era – 2 week
agile train

Customer
Pain Point

Data
Warehouse

Orient
Capacity
Estimate

Code
Feature

Decide
Feature
Priority

2 Week
Plan
Business
Buy-in
00’s Commodity Agile Innovation Cycle
• Cost $10K to $1M
• Duration 2 – 12 weeks
• Bet a product feature
• Cost of failure – product mgr reputation
• Java and MySQL on RedHat Linux
Train Model Process Hand-Off Steps
Product Manager

Developer
QA Integration Team
Operations Deploy Team
BI Analytics Team
What Happened?
Rate of change
increased

Cost and size
and risk of
change reduced
Cloud Native
Construct a highly agile and highly
available service from ephemeral and
assumed broken components
Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
Each icon is
three to a few
hundred
instances
across three
AWS zones

Cassandra
memcached

Start Here

Personalization movie group choosers
(for US, Canada and Latam)

Web service
S3 bucket
Continuous Deployment
No time for handoff to IT
Developer Self Service
Freedom and Responsibility
Developers run what
they wrote
Root access and PagerDuty
IT is a Cloud API
DEVops automation
GitHub all the things!
Leverage social coding
Putting it all together…
Land grab
opportunity

Launch AB
Test
Automatic
Deploy

Measure
Customers

Act

Competitive
Move

Observe

Continuous
Delivery on
Cloud

Customer
Pain Point

Analysis

Orient
Model
Hypotheses

Increment
Implement

Decide

Plan
Response

Share Plans
JFDI
Continuous Innovation Cycle
• Cost near zero, variable expense
• Duration hours to days
• Bet a decoupled microservice code push
• Cost of failure – near zero, instant rollback
• Clojure/Scala/Python on NoSQL on Cloud
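To put the four innovation cycles side by side, the cycle lengths can be turned into iterations per year. The first three lengths come from the cycle labels in the deck (1 year, 3 months, 2 weeks); one day for continuous delivery is an assumed point inside "hours to days", and the names below are illustrative.

```python
# Cycle length in days for each era. One day for continuous delivery is an
# assumed value within the "hours to days" range given on the slide.
CYCLE_DAYS = {
    "Mainframe (1 year)": 365,
    "Client/Server (3 months)": 91,
    "Commodity agile (2 weeks)": 14,
    "Continuous delivery (assumed 1 day)": 1,
}


def cycles_per_year(cycle_days: float) -> float:
    """How many observe-orient-decide-act loops fit into one year."""
    return 365 / cycle_days


if __name__ == "__main__":
    for era, days in CYCLE_DAYS.items():
        print(f"{era}: about {cycles_per_year(days):.0f} cycles per year")
```

Each step is roughly an order of magnitude more chances per year to learn from customers, which is the core of the "Speed Wins" argument.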
Continuous Deploy Hand-Off Steps
Product Manager
A/B test setup and enable
Self service hypothesis test results

Developer
Automated test

Self service deploy, on call
Self service analytics
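The "A/B test setup and enable" step depends on assigning each customer to a test cell consistently. The deck does not say how this is done; the sketch below uses a common hash-based bucketing technique, and the function name, cell names, and parameters are all hypothetical.

```python
import hashlib


def assign_cell(customer_id: str, test_name: str, treatment_percent: int = 50) -> str:
    """Deterministically place a customer in the control or treatment cell."""
    key = f"{test_name}:{customer_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "treatment" if bucket < treatment_percent else "control"


if __name__ == "__main__":
    for customer in ("alice", "bob", "carol"):
        print(customer, assign_cell(customer, "new-home-page-row"))
```

Because the assignment depends only on the test name and the customer id, any stateless service instance gives the same answer without a shared lookup.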
Continuous Deploy Automation
Check in code, Jenkins build
Bake AMI, launch in test env

Functional and performance test
Production canary test
Production red/black push
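The steps above can be sketched as an ordered list of gates in which any failure stops the push. This is only an illustration of the flow: the stage callables are placeholders, and nothing here reflects the actual Jenkins jobs or Netflix tooling.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]

# Stage names follow the slide; the order is the order listed there.
STAGE_NAMES = [
    "Check in code, Jenkins build",
    "Bake AMI, launch in test env",
    "Functional and performance test",
    "Production canary test",
    "Production red/black push",
]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed: List[str] = []
    for name, step in stages:
        if not step():
            break
        passed.append(name)
    return passed


if __name__ == "__main__":
    # Placeholder steps that always succeed, for demonstration only.
    demo = [(name, lambda: True) for name in STAGE_NAMES]
    print(run_pipeline(demo))
```

A failed canary therefore means the red/black push never starts, so the old version keeps serving traffic.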
Bad Canary Signature
Happy Canary Signature
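The two signatures are shown as graphs in the deck; one simple way to turn that judgment into an automated check is to compare a canary metric against the existing fleet. The metric (error rate) and the tolerance factor below are assumptions made for illustration, not values from the talk.

```python
def canary_is_happy(
    baseline_error_rate: float,
    canary_error_rate: float,
    tolerance: float = 1.5,
) -> bool:
    """Pass the canary if its error rate is at most `tolerance` times the baseline."""
    return canary_error_rate <= baseline_error_rate * tolerance


if __name__ == "__main__":
    print(canary_is_happy(0.010, 0.012))  # close to baseline
    print(canary_is_happy(0.010, 0.050))  # far worse than baseline
```

A real check would likely look at several metrics over a time window rather than a single number.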
Global Deploy Automation
Afternoon in California
Night-time in Europe
If passes test suite, canary then deploy

West Coast Load Balancers, East Coast Load Balancers, Europe Load Balancers

Zone A, Zone B, Zone C (in each region)

Cassandra Replicas (in each zone)

Canary then deploy

Next day on West Coast

Canary then deploy

Next day on East Coast

After peak in Europe
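The region-by-region push can be sketched as a loop that canaries each region before deploying to it and halts on the first bad canary. Europe goes first because the slide places the push after its peak; the relative order of the two US regions is an assumption, and the callables are hypothetical placeholders.

```python
from typing import Callable, List

# Europe first (after its peak, per the slide); East before West is assumed.
ROLLOUT_ORDER = ["Europe", "East Coast", "West Coast"]


def rollout(
    regions: List[str],
    canary_ok: Callable[[str], bool],
    deploy: Callable[[str], None],
) -> List[str]:
    """Canary then deploy, one region at a time; stop at the first bad canary."""
    deployed: List[str] = []
    for region in regions:
        if not canary_ok(region):
            break
        deploy(region)
        deployed.append(region)
    return deployed


if __name__ == "__main__":
    done = rollout(ROLLOUT_ORDER, canary_ok=lambda r: True, deploy=lambda r: None)
    print(done)
```

Pushing to the region that is off-peak limits how many customers a bad build can reach before the canary catches it.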
Ephemeral Instances
• Largest services are autoscaled
• Average lifetime of an instance is 36 hours

Autoscale Up

Autoscale Down

Push
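A reactive autoscaling decision can be reduced to a small calculation: how many instances are needed for the current load, kept within a minimum and maximum group size. This is a sketch of the general idea only; the per-instance capacity and the group limits are made-up numbers, and it does not describe the AWS Auto Scaling API.

```python
def desired_instances(
    total_rps: int,
    rps_per_instance: int,
    min_size: int,
    max_size: int,
) -> int:
    """Instances needed for the load, clamped to the group's size limits."""
    needed = -(-total_rps // rps_per_instance)  # integer ceiling division
    return max(min_size, min(max_size, needed))


if __name__ == "__main__":
    # Hypothetical service: 300 requests/sec per instance, group of 3 to 100.
    for load in (100, 10_000, 1_000_000):
        print(load, desired_instances(load, 300, 3, 100))
```

Scaling up and down this way through the daily cycle is one reason an instance may only live for a day or two.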
(New Today!) Predictive Autoscaling

24 Hours predicted traffic vs. actual
More morning load
Sat/Sun high traffic

Lower load on Weds

Prediction driving AWS Autoscaler to plan capacity
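The slide does not describe the prediction method. As a minimal sketch of the idea, the example below predicts each hour's traffic as the mean of the same hour on earlier days of the same weekday, then converts the prediction into a capacity plan with some headroom. The averaging approach, the headroom factor, and all names are assumptions for illustration.

```python
import math
from statistics import mean
from typing import List, Sequence


def predict_hourly_rps(history: Sequence[Sequence[float]]) -> List[float]:
    """Predict 24 hourly values as the mean of the same hour on past days.

    `history` holds one 24-value sequence per past day of the same weekday.
    """
    return [mean(day[hour] for day in history) for hour in range(24)]


def plan_capacity(
    predicted_rps: Sequence[float],
    rps_per_instance: float,
    headroom: float = 1.2,
) -> List[int]:
    """Instances to schedule for each hour, with headroom over the prediction."""
    return [math.ceil(rps * headroom / rps_per_instance) for rps in predicted_rps]


if __name__ == "__main__":
    # Two hypothetical past Saturdays with flat traffic, for demonstration.
    past = [[100] * 24, [300] * 24]
    forecast = predict_hourly_rps(past)
    print(plan_capacity(forecast, rps_per_instance=50)[:3])
```

Planning capacity from a forecast lets instances be ready before the load arrives instead of reacting after it has already risen.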
Inspiration
Takeaway
Speed Wins
Assume Broken
Cloud Native Automation
GitHub is your “app store” and resumé
@adrianco @NetflixOSS
http://netflix.github.com
