Running in multiple data centers
16 – 17 November, Sofia | ISTACON.ORG
By Nikolay Stoitsev
600+ cities
75+ countries
6 continents
2 000 000+ drivers
How the Internet Kept Humming During 2 Hurricanes
https://www.nytimes.com/2017/09/18/us/harvey-irma-internet.html
Fault tolerance
Low latency
Compliance
Data locality
Under-utilized capacity
CAP
Continuous network partition
2 types of architecture
Active-Passive
[Diagram: DC 1, DC 2]
Failover
DNS
Stateless service
Stateful service
Active-Passive example
[Diagram, three steps: DC 1, DC 2, DB 1, DB 2]
Real-life example
[Diagram: DC 1, DC 2; one Master, three Slaves]
Smart intermediary
[Diagram, three steps: DC 1, DC 2; one Master, three Slaves; two HAProxy instances. In the last step the Master role appears on a different node.]
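The smart intermediary idea can be sketched roughly as follows. This is a minimal illustration, not the setup from the talk: the `Node` and `Intermediary` classes, the node names, and the "promote the first healthy replica" rule are all assumptions made for the example. A production setup would use a proxy such as HAProxy together with a dedicated failover mechanism, and would need to guard against split brain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """A database node; all fields are hypothetical and for illustration only."""
    name: str
    datacenter: str
    healthy: bool = True


class Intermediary:
    """Hypothetical routing layer that tracks which node is the master."""

    def __init__(self, master: Node, replicas: List[Node]) -> None:
        self.master = master
        self.replicas = list(replicas)

    def write_target(self) -> Node:
        """Return the node that should receive writes, promoting if needed."""
        if not self.master.healthy:
            self._promote()
        return self.master

    def read_targets(self) -> List[Node]:
        """Return every healthy node that can serve reads."""
        return [n for n in [self.master, *self.replicas] if n.healthy]

    def _promote(self) -> None:
        """Promote the first healthy replica; the old master becomes a replica."""
        for replica in self.replicas:
            if replica.healthy:
                self.replicas.remove(replica)
                self.replicas.append(self.master)
                self.master = replica
                return
        raise RuntimeError("no healthy replica available for promotion")
```

Clients only ever ask the intermediary where to write, so they do not need to know that the master has moved to another data center.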
All-active
[Diagram: DC 1, DC 2]
Locality
Split traffic in groups
Global State
Partitioning
[Diagram: user_id mod 2; remainder 0 goes to DC 1, remainder 1 goes to DC 2]
[Diagram: user_id mod 3; remainder 0 goes to DC 1, remainder 1 goes to DC 2, remainder 2 goes to DC 3]
Very inefficient
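One way to see why modulo partitioning is inefficient is to count how many users change data center when a third one is added. The sketch below is illustrative only; the function names and the range of user ids are assumptions.

```python
from typing import Iterable


def datacenter_for(user_id: int, num_datacenters: int) -> int:
    """Return the 0-based data center index under mod-N partitioning."""
    return user_id % num_datacenters


def moved_fraction(user_ids: Iterable[int], old_count: int, new_count: int) -> float:
    """Fraction of users whose data center changes when the count changes."""
    ids = list(user_ids)
    moved = sum(
        1
        for uid in ids
        if datacenter_for(uid, old_count) != datacenter_for(uid, new_count)
    )
    return moved / len(ids)


# Going from 2 to 3 data centers with evenly spread ids.
fraction = moved_fraction(range(6000), old_count=2, new_count=3)
```

With evenly spread ids, two thirds of the users end up in a different data center, even though only one third of them would have to move to balance the load across three.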
Consistent hashing
[Diagram, two steps: points labelled DC 1, DC 3, DC 2, DC 3; the second step adds user_id]
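A minimal consistent hashing sketch, assuming a hash ring with several virtual points per data center. The `HashRing` class, the choice of SHA-256, and the default of 100 virtual points are assumptions for illustration, not details from the talk.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical hash ring mapping user ids to data centers."""

    def __init__(self, datacenters: Iterable[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # kept sorted by position
        for dc in datacenters:
            self.add(dc)

    def add(self, datacenter: str) -> None:
        """Place `vnodes` points for the data center on the ring."""
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{datacenter}#{i}"), datacenter))

    def remove(self, datacenter: str) -> None:
        """Drop every point that belongs to the data center."""
        self._ring = [entry for entry in self._ring if entry[1] != datacenter]

    def lookup(self, user_id: int) -> str:
        """Return the owner of the first point at or after the user's position."""
        if not self._ring:
            raise LookupError("ring is empty")
        index = bisect.bisect_right(self._ring, (_hash(str(user_id)), ""))
        return self._ring[index % len(self._ring)][1]  # wrap around the ring
```

When a data center is added, the only users that move are the ones that now land on one of its points; everyone else keeps their previous assignment.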
DNS load balancing
[Diagram, two steps: DC 1, DC 2; San Francisco, Los Angeles, New York, Toronto]
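The routing decision behind DNS load balancing can be sketched as a lookup from city to preferred data center with a fallback. The city-to-data-center mapping below is an assumption (the slide labels do not state it), and `resolve` is a hypothetical helper, not a real DNS API.

```python
from typing import Dict, Set

# Assumed mapping for illustration: west coast cities prefer DC 1, the others DC 2.
PREFERRED_DC: Dict[str, str] = {
    "San Francisco": "DC 1",
    "Los Angeles": "DC 1",
    "New York": "DC 2",
    "Toronto": "DC 2",
}


def resolve(city: str, healthy: Set[str]) -> str:
    """Return the data center a DNS answer would point this city to."""
    preferred = PREFERRED_DC[city]
    if preferred in healthy:
        return preferred
    if healthy:
        # Fall back to any healthy data center; min() keeps the choice deterministic.
        return min(healthy)
    raise RuntimeError("no healthy data center")
```

If DC 2 becomes unhealthy, `resolve("Toronto", {"DC 1"})` answers with DC 1, so traffic from that city shifts to the remaining data center.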
Database layer
No generic solution
Galera Cluster
Synchronous multi-master database cluster
http://galeracluster.com/
[Diagram: DC 1 (Master, Slave), DC 2 (Slave, Master), DC 3 (Master, Slave), DC 4 (Slave, Master)]
Apache Cassandra
http://cassandra.apache.org/
Linear scalability
Fault-tolerance
Commodity hardware
Designed for multiple data centers
Apache Mesos
http://mesos.apache.org/
Application Layer
Apache Kafka
uReplicator
https://github.com/uber/uReplicator
https://eng.uber.com/ureplicator/
Cherami
https://github.com/uber/cherami-server
Multi-zone topics
[Diagram: two Producers, two Topics, two Consumer Groups; replication]
Multi-zone consumers
[Diagram: one Producer, two Topics, two Consumer Groups; replication; offset sync]
https://eng.uber.com/cherami/
Lessons learned
[Diagram, two steps: Total dev time; Time thinking about failover]
Failover testing
Failure testing
Super smart clients
“The best way to avoid failure is to fail constantly.”
http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
Thank you!
@stoitsev
Nikolay Stoitsev
http://careersinfo.uber.com/sofia-engineering
