Fault Tolerance at Speed
Todd L. Montgomery
@toddlmontgomery
StoneTor
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/aeron-cluster-raft/
Presented at QCon San Francisco
www.qconsf.com
About me…
What type of Fault Tolerance?
What is Clustering?
Why Aeron?
Design for Speeding Up?
Efficiency
https://www.forbes.com/sites/forbestechcouncil/2017/12/15/why-energy-is-a-big-and-rapidly-growing-problem-for-data-centers/#344456665a30
https://www.datacenterdynamics.com/opinions/power-consumption-data-centers-global-problem/
https://www.nature.com/articles/d41586-018-06610-y
We seem to assume efficiency/security/quality/etc. is a “special” characteristic added… later… if at all
Fault Tolerance
[Diagram sequence: one Client and one Service; then three Services with one Client; then three Services with three Clients; then three Services and three Clients with State]
Fault Tolerance at Speed
[Diagram: three Services over State ("Storage"); then three Services, three Clients, and State]
Fault Tolerance of State
[Diagram: three Services over State; labels: Partition, Replication]
Contiguous Log with Snapshot & Replay
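[Diagram sequence: a contiguous log of events 1, 2, 3, 4, 5, 6, X, …; State is built from the log; a Snapshot is taken partway through the log; State is then restored from the Snapshot plus the remaining events 5, 6, X, …]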
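The snapshot-and-replay idea can be sketched in a few lines of Python. This is an illustrative model only, not Aeron's implementation: the `Snapshot` class, the `apply` function, and the `("set", key, value)` / `("del", key)` event shapes are hypothetical names invented for the example. It assumes events are applied deterministically and in log order.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Snapshot:
    position: int  # number of log events already folded into the state
    state: dict    # copy of the state as of that position

def apply(state: dict, event: tuple) -> None:
    """Deterministically apply one hypothetical event to the state."""
    if event[0] == "set":
        state[event[1]] = event[2]
    elif event[0] == "del":
        state.pop(event[1], None)

def replay(log: list, snapshot: Optional[Snapshot] = None) -> dict:
    """Rebuild state from a snapshot (if any) plus the log events after it."""
    state = dict(snapshot.state) if snapshot else {}
    start = snapshot.position if snapshot else 0
    for event in log[start:]:
        apply(state, event)
    return state

def take_snapshot(log: list, position: int) -> Snapshot:
    """Roll up the first `position` events into a snapshot."""
    return Snapshot(position, replay(log[:position]))

log = [
    ("set", "a", 1), ("set", "b", 2), ("set", "a", 3),
    ("del", "b"), ("set", "c", 4), ("set", "a", 5),
]
full = replay(log)                # replay everything from the start
snap = take_snapshot(log, 4)      # roll up events 1 to 4
recovered = replay(log, snap)     # load snapshot, replay events 5 and 6
```

Both paths end in the same state, which is the property that lets a restarted service skip the already-snapshotted prefix of the log.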
Clustered Services
[Diagram: three Clustered Services, each with a Log and an Archive]
Replicated State Machines
https://en.wikipedia.org/wiki/State_machine_replication
Each Replicated Service
Same event log
Same input ordering
Log replicated locally
Replicated State Machines
Checkpoints / Snapshots
Event in the log
“Rolling” up previous log events
Replicated State Machines
When should a service “consume” (or process) a log event?
[Diagram: three Services, each with an Archive; the three archived logs hold events 1 to 6, 1 to 7, and 1 to 2]
Once processed, an event cannot be altered
Only process an event once it is stable
Raft Consensus
An event must be recorded at a majority of Replicas before being consumed by any Replica
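A minimal sketch of the majority rule, assuming each replica reports how far it has recorded the log as a single cumulative position. The function names `quorum_position` and `consumable_position` and the sample positions are hypothetical, chosen only to illustrate the rule; this is not Aeron Cluster's API.

```python
from typing import List

def quorum_position(recorded_positions: List[int]) -> int:
    """Highest log position that a majority of replicas have recorded."""
    quorum = len(recorded_positions) // 2 + 1
    ranked = sorted(recorded_positions, reverse=True)
    # The quorum-th highest position is recorded by at least `quorum` replicas.
    return ranked[quorum - 1]

def consumable_position(local_recorded: int, commit_position: int) -> int:
    """A replica may consume only what is both committed and held locally."""
    return min(local_recorded, commit_position)

# Hypothetical three-member cluster: one member is ahead, one lags.
positions = [8192, 6912, 7168]
commit = quorum_position(positions)
print(commit)                             # 7168
print(consumable_position(6912, commit))  # 6912
```

With three members the quorum is two, so the commit position is the second-highest recorded position. The lagging member can consume only up to what it has recorded itself.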
Replicated State Machines
https://raft.github.io/
Strong Leader
Elected member of the Cluster
Orders Input
Disseminates Consensus
Raft
[Diagram: three cluster members, each made up of Service, Archive, and Consensus]
Raft is
An algorithm with formal verification
Replicated State Machines
Raft is not
A specification
Nor
A complete system
Replicated State Machines
More than Raft
Leader timestamps events
Async, not RPC-based
Timers
The Real World
[Diagram: three cluster members, each made up of Service, Archive, and Consensus; one member is marked *Leader; a Client is shown]
Benefits
Determinism
Log is immutable
Log can be played, stopped, & replayed
Each event is timestamped
Services restarted from snapshot & log
Benefits
What Can You Do?
Distributed Key/Value Store
Distributed Timers
Distributed Locks
Matching Engines
Order Management
Market Surveillance
P&L, Risk, …
Finance
Venue Ticketing / Reservations
Auctions
Beyond
Hint - a contended database is a good indicator
Why Aeron?
Efficient reliable UDP unicast, UDP multicast, and IPC message transport
Java, C/C++, C#, Go
Aeron
https://github.com/real-logic/Aeron
And a little bit more…
Very fast Archival & Replay
Aeron
https://github.com/real-logic/Aeron
The “Efficient” bit…
All communications
Aeron publications & subscriptions
Aeron archival & replay
Aeron shared counters
Consensus
based on Aeron stream position
Batching
Critical to efficient operation
Optimizing pipelined throughput
Flow Control
Critical to correct operation
Design for Efficiency?
Cache Hit/Miss Ratios
Branch Prediction
Allocation Rates
Garbage Collection
Inlining
Optimizations
Not… Yet…
Ownership, Dependency, & Coupling
Complexity
Layers of Abstraction (ain’t free)
Resource Management
Closer… But…
Still. Not. Yet.
"AmdahlsLaw" by Daniels220 at English Wikipedia - Own work based on: File:AmdahlsLaw.png. Licensed under CC BY-SA 3.0 via Wikimedia Commons
Universal Scalability Law
[Plot: Speedup (0 to 20) versus Processors (1 to 1024, doubling at each step); series: Amdahl, USL]
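For reference, the two curves can be computed as follows. The formulas are the standard forms of Amdahl's Law and the Universal Scalability Law; the parameter values (95% parallel fraction, `sigma = 0.05`, `kappa = 0.0001`) are assumptions picked for illustration and are not taken from the slide.

```python
def amdahl(n: int, parallel_fraction: float) -> float:
    """Amdahl's Law: speedup on n processors."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)

def usl(n: int, sigma: float, kappa: float) -> float:
    """Universal Scalability Law.

    sigma models contention (the serial fraction),
    kappa models coherency (crosstalk) cost.
    """
    return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))

PROCESSORS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

for n in PROCESSORS:
    print(f"{n:5d}  amdahl={amdahl(n, 0.95):6.2f}  usl={usl(n, 0.05, 0.0001):6.2f}")
```

With `kappa = 0` the USL reduces to Amdahl's Law. With any positive `kappa` the curve peaks and then falls, so adding processors past the peak reduces throughput rather than merely flattening it.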
Breakdown Interactions
Fundamental Sequential Operations
Ingress Message, Sequence, Disseminate
[Diagram: Client, Leader, Follower X, Follower Y; channels: Ingress, Log (multicast or serial unicast), Member Status; a Log Event is shown at each Follower]
Followers Append
[Diagram: same layout; an Append Position is shown for each Follower]
Commit Message
[Diagram: same layout; a Commit Position is shown for each Follower]
Breakdown Interactions
Pipeline-able Operation & Batching
[Diagram: Leader and Follower; channels: Log (multicast or serial unicast), Member Status]
Stream Positions
Log Event @8192
Append Position @6912
Commit Position @4096
Archive Position @8096, Archive Position @7168
Store locally, asynchronous to Position processing by Consensus & Log processing by Service
Batching: Log, Appends, Commits
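A rough sketch of why batching fits naturally with stream positions: a position is cumulative, so one report acknowledges every event before it. The model below is a simplification under stated assumptions. `EVENT_LENGTH`, `log_end_positions`, and `append_position_reports` are hypothetical names, and the fixed 256-byte event size is made up for the example.

```python
from typing import Iterator, List

EVENT_LENGTH = 256  # hypothetical fixed event size in bytes

def log_end_positions(event_count: int) -> List[int]:
    """Stream position at the end of each event in the log."""
    return [EVENT_LENGTH * (i + 1) for i in range(event_count)]

def append_position_reports(end_positions: List[int], batch_limit: int) -> Iterator[int]:
    """Follower appends up to batch_limit events per poll, then sends one report."""
    for start in range(0, len(end_positions), batch_limit):
        batch = end_positions[start:start + batch_limit]
        # One cumulative position covers the whole batch.
        yield batch[-1]

events = log_end_positions(32)
unbatched = list(append_position_reports(events, batch_limit=1))
batched = list(append_position_reports(events, batch_limit=8))
print(len(unbatched), len(batched), batched[-1])  # 32 4 8192
```

Both runs end at the same position, but the batched follower sends an eighth as many status messages. The same reasoning applies to commit positions sent in the other direction.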
Doesn’t this Complicate Recovery?
Recovery Positions (three Followers)
Archive Positions: @8096, @7168, @7584
Commit Positions: @4096, @4064, @4032
Service Positions: @4096, @4064, @3776
A synchronous system doesn’t make this complexity go away!
An election still needs to assert the state of the cluster and catch up locally
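The bookkeeping implied by these positions can be sketched as below. This is a simplified illustration, not the election algorithm used by Aeron Cluster. The pairing of archive positions with particular members is an assumption (the slide layout does not survive in this transcript), the `Member` class and helper names are hypothetical, and a real Raft election also compares terms, not only log length.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Member:
    name: str
    archive_position: int  # log recorded locally
    commit_position: int   # highest commit position this member has seen
    service_position: int  # how far the local service has consumed

# Positions from the slide; the member pairing is assumed for illustration.
members = [
    Member("follower-0", 8096, 4096, 4096),
    Member("follower-1", 7168, 4064, 4064),
    Member("follower-2", 7584, 4032, 3776),
]

def most_complete(candidates: List[Member]) -> Member:
    """Simplified election rule: prefer the member with the longest log."""
    return max(candidates, key=lambda m: m.archive_position)

def log_catch_up(member: Member, leader: Member) -> int:
    """Bytes of log the member must fetch to match the new leader."""
    return max(0, leader.archive_position - member.archive_position)

def service_backlog(member: Member) -> int:
    """Bytes of committed log the local service has yet to process."""
    return member.commit_position - member.service_position

leader = most_complete(members)
for m in members:
    print(m.name, log_catch_up(m, leader), service_backlog(m))
```

The point of the sketch is that every member already tracks these positions, so asserting cluster state after a failure is a comparison of numbers rather than a scan of the logs.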
Limitations of Efficiency
Throughput & Latency
[Diagram: Client, Leader, Followers; flows: Ingress, Log (multicast or serial unicast), Log Event, Member Status, Append Position, Commit Position]
Constant Delay Network (Service A, Service Ox)
Latency in units of Round-Trip Time (RTT)
Client to Service A: 0.5 RTT
Client to Service Ox: 1 RTT
Client to Service A (on Commit): 1.5 RTT
Client to Service Ox (on Commit): 2 RTT

Limits from Constant Delay
Path                                Shared Memory    DC              Rack (Kernel Bypass)
                                    (RTT <100ns)     (RTT <100us)    (RTT <10us)
Client to Service A                 50ns             50us            5us
Client to Service Ox                100ns            100us           10us
Client to Service A (on Commit)     150ns            150us           15us
Client to Service Ox (on Commit)    200ns            200us           20us
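The figures above are plain multiples of the round-trip time, which a short calculation reproduces. The RTT multiples and the three RTT bounds come from the slides; the dictionary and function names are hypothetical.

```python
# Latency of each path as a multiple of the network round-trip time.
PATH_RTT_MULTIPLES = {
    "Client to Service A": 0.5,
    "Client to Service Ox": 1.0,
    "Client to Service A (on Commit)": 1.5,
    "Client to Service Ox (on Commit)": 2.0,
}

# Upper bounds on RTT for each kind of network, in nanoseconds.
NETWORK_RTT_NS = {
    "Shared Memory": 100,
    "Rack (Kernel Bypass)": 10_000,
    "DC": 100_000,
}

def latency_limits_ns(rtt_ns: int) -> dict:
    """Latency per path for a constant-delay network with the given RTT."""
    return {path: multiple * rtt_ns for path, multiple in PATH_RTT_MULTIPLES.items()}

for network, rtt_ns in NETWORK_RTT_NS.items():
    print(network)
    for path, latency in latency_limits_ns(rtt_ns).items():
        print(f"  {path}: {latency:,.0f} ns")
```

Waiting for commit adds one full round trip to either path, so the network RTT, not the software, sets the floor once the implementation is efficient.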
Measured Latency at Throughput
[Plot: RTT (us), 0 to 300, by Percentile (Min, 0.50, 0.90, 0.99, 0.9999, 0.999999, Max); series: 100K msgs/sec, 200K msgs/sec]
Intel Xeon Gold 5118 (2.30GHz, 12 cores)
32GB DDR4 2400 MHz ECC RAM
Intel Optane SSD 900P Series 480GB
SolarFlare X2522-PLUS 10GbE NIC
All servers are connected to an Arista 7150S
CentOS Linux 7.7, kernel 4.4.195-1.el7.elrepo.x86_64 tuned for low-latency workload.
Courtesy Mark Price
Single client session, bursts of 20x 200B messages, 3-node cluster, Service(s) echo(es) the payload back.
Takeaways
Efficiency is part of design
Power of a timestamped, replicated log
Replicated State Machines
Current Status
Aeron Archiving - fully supported
Aeron Clustering - pre-release
Sponsored by
https://weareadaptive.com/
Fault Tolerance at Speed
Aeron: https://github.com/real-logic/Aeron
Twitter: @toddlmontgomery
Thank You!
Questions?
StoneTor