Josh Evans - Director of Operations Engineering
March, 2016
#NetflixEverywhere
Global Architecture
December 24th, 2012
Disappointment
Outrage
Withdrawal
#NetflixEverywhere Global Architecture
Failure is inevitable
Never fail the same way twice
Failure-Driven Architecture
• Introductions
• Failure-Driven Architecture
• Taking It Global
#NetflixEverywhere
Our Talk Today
1999 – 2009
• Ecommerce (DVD → Streaming)
2009 – 2013
• Playback Services (Activate, Manifests, DRM)
2013 - present
• Operations Engineering
– CD, RTA, Chaos, Performance
Josh Evans – Director of Operations Engineering
jevans@netflix.com
Bringing movies & TV shows from all over the
world to people all over the world
• Streaming, on demand, subscription
• Global & regional licensing
• Hollywood, independent, international
• Striving for global ubiquity
2007
• Jan – Windows
2008
• May – Roku
• Oct – LG, Samsung Blu-ray
• Oct – Apple Mac
• Nov – Xbox 360
2009
• Jun – LG DTV
• Nov – Sony PS3 (disc)
• Nov – Sony Bravia
– DTV & Blu-ray
Device Ubiquity
2010
• Mar – Nintendo Wii (disc)
• Apr – Apple iPad
• Aug – Apple iPhone
• Sep – Apple TV
• Oct – Sony PS3 (no disc)
• Oct – Nintendo Wii (no disc)
• Nov – Windows Phone 7
2011
• May – Android
• Nov – First e-readers
– Kindle Fire, Nook
Device Ubiquity
2010 - Canada
2011 - Latin America
2012 - UK, Ireland, Nordics
2013 - Netherlands
2014 - Austria, Belgium, France, Germany, Luxembourg, Switzerland
2015 - Australia, New Zealand, Japan, Spain, Italy, Portugal
Geographic Ubiquity
• English
• Spanish (Latin American)
• Portuguese (Brazilian)
• Dutch
• French
• German
• Japanese
• Spanish (Castilian)
• Italian
• Portuguese (European)
Language Ubiquity - Subs, Dubs, UI
75,000,000 members
• Introductions
• Failure-Driven Architecture
• Taking It Global
#NetflixEverywhere
Our Talk Today
August 2008
• No automation, virtualization, standardization
• Manual, error prone, slow
• Big iron & monoliths
DC2
2009
Undifferentiated Heavy Lifting
US-East-1
Amazon Web Services
2010
• Scale & elasticity
• Virtual, programmable
• Global footprint
• Micro-services
• Database
• Cache
• Traffic
Architectural Pillars
FIT (Fault-Injection Test Framework)
Micro-service Failure
• Micro-services
• Database
• Cache
• Traffic
Architectural Pillars
NoSQL but…
• Not web scale
• Throttling
Modest scale
• 100s of play starts / second
• 10,000s of requests / second
• 10s of billions of records
SimpleDB
• Micro-services
• Database
• Cache
• Traffic
Architectural Pillars
Ephemeral Volatile memCache (EVCache)
Clustered memcached optimized for Netflix use cases
EVCache Server
Memcached
Prana (Sidecar)
Monitoring & Other Processes
Eureka
Client Application
Client Library
EVCache Client
Shards, consistent hashing
TTLs & LRU
EVCache Architecture
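The slide names sharding with consistent hashing but does not show how a key finds its node. A minimal sketch of the general technique, assuming a hash ring with virtual nodes; the `HashRing` class, the node names, and the choice of MD5 are illustrative assumptions, not EVCache's actual client code:

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            for i in range(vnodes):
                point = _hash(f"{node}#{i}")
                self._owner[point] = node
                bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Adding a fourth node should move only the keys that the new node takes over.
ring = HashRing(["memcached-1", "memcached-2", "memcached-3"])
bigger = HashRing(["memcached-1", "memcached-2", "memcached-3", "memcached-4"])
keys = [f"key-{i}" for i in range(1000)]
moved = sum(ring.node_for(k) != bigger.node_for(k) for k in keys)
```

The point of the ring is that growing the cluster remaps only a fraction of the keys, so most cached entries stay where clients already look for them.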
[Diagram: reads. Client applications (client library, EVCache client) in Zones A, B, and C]
[Diagram: writes. Client applications (client library, EVCache client) across Zones A, B, and C]
1. Read from cache
2. On cache miss call service
3. Service calls DB & responds
4. Service updates cache
[Diagram: a client application (client library, EVCache client, service client) in front of service instances (S) and their databases (DB)]
Fronting Micro-services
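The four numbered steps on this slide can be sketched as a small read path. This is a sketch under stated assumptions: a plain dict stands in for both the cache and the database, and the `CacheFrontedClient` and `Service` classes are hypothetical names, not Netflix's client libraries.

```python
class Service:
    """Hypothetical micro-service that owns the database."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache
        self.db_reads = 0

    def fetch(self, key):
        self.db_reads += 1
        value = self.db[key]       # 3. service calls the DB and responds
        self.cache[key] = value    # 4. service updates the cache
        return value


class CacheFrontedClient:
    """Hypothetical client application holding a cache client and a service client."""

    def __init__(self, cache, service):
        self.cache = cache         # stands in for the EVCache client
        self.service = service     # stands in for the service client

    def get(self, key):
        value = self.cache.get(key)      # 1. read from cache
        if value is not None:
            return value
        return self.service.fetch(key)   # 2. on a cache miss, call the service


cache = {}
service = Service(db={"title:1": "Daredevil"}, cache=cache)
client = CacheFrontedClient(cache, service)
first = client.get("title:1")    # miss: goes to the service and the DB
second = client.get("title:1")   # hit: served from the cache
```

Note that the service, not the client, repopulates the cache, which matches step 4 on the slide.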
Linear Scaling
• 30 million requests/sec
• 2 trillion requests per day globally
• Hundreds of billions of objects
• Tens of thousands of memcached instances
• Milliseconds of latency per request
US-East-1
Canada
International Expansion
2011
US
Latin America
US-East-1 EU-West-1
Cloud Islands
2012
• Micro-services
• Database
• Cache
• Traffic
Architectural Pillars
US-East-1 EU-West-1
UK/IE, Nordics,
Netherlands
Latin America
DNS Geo Mapping
Canada
US
• Micro-services
• Database!
• Caching
• Traffic
Architectural Pillars
Why Cassandra?
• NoSQL at scale
• Open source
• Multi-region
• Multi-directional
• CAP Choices
– Availability
– Partition tolerance
– Eventual consistency*
Scalable, Durable, Global
Single Region, Multiple AZs
1. Client writes to any node
2. Coordinator replicates to nodes
3. Nodes ack to coordinator
4. Coordinator acks to client
5. Write to commit log
[Diagram: a client writing to a ring of nodes spread across Zones A, B, and C]
• Hinted handoff to offline nodes
Local Quorum
(Typical)
100ms
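To make the local quorum concrete: Cassandra defines a quorum as a majority of replicas, floor(RF / 2) + 1. The sketch below is a toy model of the coordinator's decision, assuming a replication factor of 3 with one replica per zone; the function names are hypothetical and nothing here calls a real Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def coordinate_write(replica_up, required_acks):
    """Toy coordinator: count acks from live replicas, keep hints for offline ones.

    replica_up is a list of booleans, one per replica in the local region.
    Returns (write_succeeded, hints_stored).
    """
    acks = 0
    hints = 0
    for up in replica_up:
        if up:
            acks += 1      # node acks to coordinator
        else:
            hints += 1     # hinted handoff to an offline node
    return acks >= required_acks, hints


needed = quorum(3)                                           # assumed RF of 3
one_zone_down = coordinate_write([True, True, False], needed)
two_zones_down = coordinate_write([True, False, False], needed)
```

With three replicas the write still succeeds when one zone is unavailable, which is the property the multi-AZ layout is there to provide.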
Not quite fast enough
December 24th, 2012
US-East-1 US-West-2 EU-West-1
Isthmus
Spring 2013
Survive a regional ELB outage
AZ1 AZ2 AZ3
US-EAST-1 ELBs
Zuul
Data Data Data
Geo-located
state/province
AZ1 AZ2 AZ3
US-WEST-2 ELBs
Zuul
Data Data Data
Americas Internet Traffic
Eastern NA +
LatAm Traffic
• Zuul routes locally or remotely
• Eureka - multi-region aware
Isthmus
US-East-1 US-West-2 EU-West-1
Active-Active
2013 - 2014
Survive a large-scale regional service outage
Active-Active Data Replication
[Diagram: Region A and Region B, each with a client and a ring spanning Zones A, B, and C]
Multi-Region Writes
500ms
Bi-directional
Nightly compare & repair
Local Quorum
(Typical)
[Diagram: Region A and Region B, each with an application client, EVCache, a replication service, and a replication writer, connected through SQS]
1. Set or delete
2. Send metadata
3. Poll msg
6. Set or delete
7. Read
EVCache Cross-Region Replication
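The replication steps that survive in this transcript (1, 2, 3, 6, and 7) can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: a dict plays the regional cache, a `deque` plays SQS, and the `Region` class and `replicate` function are hypothetical. The sketch also assumes the queue carries only the key and operation, with the value looked up at replication time; steps 4 and 5 are not in the transcript and are not modeled.

```python
from collections import deque


class Region:
    """Hypothetical region with a local cache and an outbound message queue."""

    def __init__(self, name):
        self.name = name
        self.cache = {}        # stands in for the regional EVCache cluster
        self.queue = deque()   # stands in for the regional SQS queue

    def app_set(self, key, value):
        self.cache[key] = value               # 1. set
        self.queue.append(("set", key))       # 2. send metadata (key only)

    def app_delete(self, key):
        self.cache.pop(key, None)             # 1. delete
        self.queue.append(("delete", key))    # 2. send metadata


def replicate(source, target):
    """Drain the source queue and apply each mutation in the target region."""
    while source.queue:
        op, key = source.queue.popleft()      # 3. poll msg
        if op == "set":
            # Assumption: the current value is read locally when replicating.
            if key in source.cache:
                target.cache[key] = source.cache[key]   # 6. set
        else:
            target.cache.pop(key, None)                 # 6. delete


region_a = Region("region-a")
region_b = Region("region-b")
region_a.app_set("profile:42", "v1")
replicate(region_a, region_b)
seen_in_b = region_b.cache.get("profile:42")   # 7. read in the other region
region_a.app_delete("profile:42")
replicate(region_a, region_b)
```

The replicated write goes straight into the target cache rather than through `app_set`, so it does not enqueue a message back to the source region.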
Active-Active Traffic Management
ELB US-West-2 ELB US-East-1 ELB EU-West-1
DNS api-global.netflix.com UltraDNS
Route53
• Remove state from geo bucket
• Add state to geo bucket
• Log event
• For each end point
[Diagram: DNS for api-global.netflix.com in front of ELBs in US-West-2, US-East-1, and EU-West-1]
api-global.netflix.com
api-global.us-west-2.prodaa.netflix.com
api-global.us-east-1.prodaa.netflix.com
api-global.eu-west-1.prodaa.netflix.com
ELB ELB ELB
Shim
Active-Active Failover
• Introductions
• Failure-Driven Architecture
• Taking It Global
#NetflixEverywhere
Our Talk Today
January 6th, 2016
Geographic Ubiquity
Before Global
• English
• Spanish (Latin American)
• Portuguese (Brazilian)
• Dutch
• French
• German
• Japanese
• Spanish (Castilian)
• Italian
• Portuguese (European)
Global
• Chinese
• Korean
• Arabic
Language Ubiquity
March 18th, 2016
Daredevil Season 2
All episodes, all devices, all countries
Simultaneously
Content Ubiquity
Ubiquitous, Resilient Architecture
US-East-1 US-West-2 EU-West-1
Reliably and efficiently serve any customer from any region
Netflix Global
2015
US-East-1 US-West-2 EU-West-1
Ubiquitous Data
EVCache
Replication
Repl Writer
Application
Client
Kafka
SQS
• High latency
• Read once
Kafka
• Low latency
• Multiple readers
• > 1M replications/sec
[Diagram: US ring and EU ring across US-East-1 and EU-West-1]
1. Extend US ring to EU region & run repairs
2. Dual Write
3. Forklift
[Diagram: one global ring spanning US-West-2, US-East-1, and EU-West-1]
Ubiquitous Traffic Management
us-east-1-na
• East US
• East CA
• MX
us-west-2
• APAC
• West US
• West CA
eu-west-1
• Europe
• Mid East
• Africa
us-east-1-sa
• LatAm
• Not MX
Virtual DNS Regions
• Fixed virtual modules
• Origin tier
• Standardized names
api-global.netflix.com
Virtual
api-global.us-west-2.prodaa.netflix.com
api-global.us-east-1-sa.prodaa.netflix.com
api-global.us-east-1-na.prodaa.netflix.com
api-global.eu-west-1.prodaa.netflix.com
Origin
api-global.us-west-2.origin.prodaa.netflix.com
api-global.us-east-1.origin.prodaa.netflix.com
api-global.eu-west-1.origin.prodaa.netflix.com
ELB ELB ELB
DNS Tiers
[Diagram: the virtual and origin DNS tiers from the DNS Tiers slide, shown during a split failover]
Split Failover
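One way to picture a split failover is as a remapping of virtual regions to origin regions. The sketch below is an illustration under stated assumptions: the virtual region names come from the Virtual DNS Regions slide, but the `fail_over` function, the mapping structure, and the choice of which virtual region moves to which origin are hypothetical, and no real DNS API is called.

```python
# Virtual DNS region -> origin region currently serving it.
VIRTUAL_TO_ORIGIN = {
    "us-east-1-na": "us-east-1",
    "us-east-1-sa": "us-east-1",
    "us-west-2": "us-west-2",
    "eu-west-1": "eu-west-1",
}


def fail_over(mapping, failed_origin, targets):
    """Return a new mapping with virtual regions moved off failed_origin.

    targets maps each affected virtual region to its replacement origin.
    The input mapping is left unchanged so the move can be reverted.
    """
    updated = dict(mapping)
    for virtual, origin in mapping.items():
        if origin == failed_origin:
            replacement = targets[virtual]
            if replacement == failed_origin:
                raise ValueError(f"{virtual} cannot fail over to the failed origin")
            updated[virtual] = replacement
    return updated


def origin_hostname(origin):
    """Build the origin-tier name in the pattern shown on the DNS Tiers slide."""
    return f"api-global.{origin}.origin.prodaa.netflix.com"


# Assumed split: North American traffic goes west, South American traffic goes to Europe.
split = fail_over(
    VIRTUAL_TO_ORIGIN,
    failed_origin="us-east-1",
    targets={"us-east-1-na": "us-west-2", "us-east-1-sa": "eu-west-1"},
)
```

Because the two virtual regions that share one origin are separate names, their traffic can be sent to different healthy regions instead of landing on a single one.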
[Diagram: the virtual and origin DNS tiers from the DNS Tiers slide, shown across the stages of a cascading failover]
Cascading Failover
x x
Multi-region Failover
January 6th, 2016
“Going global is just like having a baby.”
- Reed Hastings, Netflix CEO
What’s next?
• Global latency
• Edge computing
• ML-based monitoring
• Self-healing systems
• Capacity utilization
• Fast, autonomous traffic
• Integrate DB & caching
#NetflixEverywhere
Takeaways
Never fail the same way twice
Christmas Eve
2012
Today
Know your resiliency patterns
Pattern             Properties
DC                  SPoF, infrastructure heavy lifting
Cloud (one region)  Multiple DCs, one control plane
Islands             Regional containment
Isthmus             Regional ELB bypass
Active-active       Regional failover
Global              Ubiquity, resiliency, efficiency
Invest in architectural pillars
• Micro-services
• Database
• Caching
• Traffic
#NetflixEverywhere
Think globally, act locally
netflix.github.io
Netflix Tech Blog
techblog.netflix.com
Josh Evans
jevans@netflix.com
@ops_engineering
#NetflixEverywhere
Global Architecture

Editor's Notes

  • #3: On Christmas Eve, 2012, Netflix experienced a region-wide outage due to an accidental ELB configuration change. Many engineers were on call, missing time with their families. They spent much of the night and into the morning trying to mitigate the impact of the outage on our customers, but to no avail. We ultimately had to wait for Amazon to address the root cause. And our members, many of them new to Netflix, were unable to stream. Their responses varied in intensity from…
  • #47: In preparation for the UK launch we started shoring up our data persistence story, one that enabled multi-region capabilities.