Monitoring 
Complex 
Systems
I do things to/with 
computers.
I build real-time 
systems.
I build fault-tolerant 
systems.
I build critical 
systems.
AdRoll
Less this.
More this.
Engineering + Mathematics = ads 
(you’re welcome)
REAL-TIME
BIDDING
The Problem Domain 
• Low latency ( < 100 ms per transaction) 
• Firm real-time system 
• Highly concurrent (90 billion transactions 
per day) 
• Global, 24/7 operation
I build 
Complex Systems
Complex Systems 
• Non-linear feedback 
• Tightly coupled to external systems 
• Difficult to model, understand
Bad things happen when 
Complex Systems fail.
Humans are bad at predicting the 
performance of complex 
systems (…). Our ability to create 
large and complex systems fools us 
into believing that we’re also 
entitled to understand them. 
CARLOS BUENO 
“MATURE OPTIMIZATION HANDBOOK”
Complex Systems often 
create worse problems 
than those they solve.
The key challenge to 
sustaining a complex 
system is maintaining our 
understanding of it.
What can be done?
Ahead-of-time verification is 
not sufficient. 
(don’t scrimp on it, though)
Compile-time guarantees 
are not sufficient. 
(don’t scrimp on them, either)
We need insight into the 
running system.
What are we looking for? 
• VM killers 
• Application performance regressions 
• Abnormal application behavior 
• Surprises
INSTRUMENTATION
The BEAM is 
ready to ride.
erlang:memory/1
erlang:statistics/1
erlang:system_info/1
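
As a minimal sketch of what these three built-ins offer, the module below takes one reading from each. The module name `vm_snapshot` and the choice of fields are illustrative assumptions, not something shown in the talk; the BIF calls themselves are standard Erlang/OTP.

```erlang
-module(vm_snapshot).
-export([snapshot/0]).

%% Collect a few VM-level readings using only built-in functions.
%% The module name and the selection of fields are illustrative.
snapshot() ->
    %% Total bytes currently allocated by the emulator.
    TotalMemory = erlang:memory(total),
    %% Number of processes that are ready to run but waiting for a scheduler.
    RunQueue = erlang:statistics(run_queue),
    %% Number of processes currently alive on this node.
    ProcessCount = erlang:system_info(process_count),
    %% Configured ceiling on the number of simultaneously alive processes.
    ProcessLimit = erlang:system_info(process_limit),
    #{total_memory_bytes => TotalMemory,
      run_queue => RunQueue,
      process_count => ProcessCount,
      process_limit => ProcessLimit}.
```

Calling `vm_snapshot:snapshot()` from a shell on a running node returns a map of current values. A process count creeping toward the process limit is one plausible example of a "VM killer" worth watching.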
What about 
our own work?
Exometer
Important Terms 
• metric: a measurement 
• entry: a receiver and aggregator of metrics 
• reporter: that which samples entries periodically and ships them to another system 
• subscription: the definition of the regular interval on which reporters sample entries
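
To make the four terms concrete, here is a toy model in plain Erlang. It is not exometer's API and not how exometer is implemented. The module `toy_metrics` and all of its function names are hypothetical, chosen only to show how a metric, an entry, a reporter and a subscription relate to one another.

```erlang
-module(toy_metrics).
-export([start_entry/0, update/2, get_value/1, subscribe/3]).

%% entry: a process that receives metrics and aggregates them (here, a sum).
start_entry() ->
    spawn(fun() -> entry_loop(0) end).

entry_loop(Sum) ->
    receive
        {update, Metric} ->
            entry_loop(Sum + Metric);
        {get_value, From, Ref} ->
            From ! {Ref, Sum},
            entry_loop(Sum)
    end.

%% metric: a single measurement, sent to an entry.
update(Entry, Metric) when is_number(Metric) ->
    Entry ! {update, Metric},
    ok.

%% Read the entry's current aggregate, giving up after one second.
get_value(Entry) ->
    Ref = make_ref(),
    Entry ! {get_value, self(), Ref},
    receive
        {Ref, Value} -> Value
    after 1000 ->
        {error, timeout}
    end.

%% subscription: the interval (IntervalMs) on which a reporter samples an entry.
%% reporter: a process that samples the entry and ships the value elsewhere;
%% in this toy, "elsewhere" is just another process, Sink.
subscribe(Entry, Sink, IntervalMs) ->
    spawn(fun() -> reporter_loop(Entry, Sink, IntervalMs) end).

reporter_loop(Entry, Sink, IntervalMs) ->
    timer:sleep(IntervalMs),
    Sink ! {report, get_value(Entry)},
    reporter_loop(Entry, Sink, IntervalMs).
```

The entry knows nothing about who reads it, and the reporter knows nothing about who writes to the entry. The only link between them is the subscription.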
These are all loosely 
coupled at runtime.
Configuration is static, but 
you can adapt it on the fly.
Why exometer over 
the alternatives?
It is extensively 
documented.
It’s vigorously 
maintained. 
(Ulf Wiger fan club day.)
It is 
silly 
fast.
Okay, great. We have 
instrumentation.
Now what?
MONITORING
This is the 
hard part.
Visualization
Alerting
Analysis
Visualization tells you 
how things look but 
not why.
A good 
day.
Uh oh.
Growth is good, 
but steady is better.
Bids 
are 
stable.
Budgets 
are 
stable.
What happened?
We 
forgot 
about 
Labor 
Day.
Monitoring Complex Systems - Chicago Erlang, 2014
Alerting tells you that 
something happened, 
but not why.
A 
normal 
day.
Wat?
That’s 
some 
cliff.
Timeouts look good.
Errors 
prior 
are 
okay.
What happened?
“Uh, hey guys, you know 
Facebook is down, right?”
Analysis gives you why 
but only if you know 
how to ask for what.
The 
memory 
use of a 
bidder.
ಠ_ಠ
It’s all 
binaries.
Not in processes.
Not in ETS.
Come on now.
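
One way to ask the VM where memory has gone is to compare the buckets named above using `erlang:memory/1`. This is a sketch only: the module name `mem_breakdown` and its helper functions are hypothetical, and the talk does not say which tooling was actually used during the investigation.

```erlang
-module(mem_breakdown).
-export([breakdown/0, largest/1]).

%% Return [{Type, Bytes}] for binaries, process memory and ETS tables.
breakdown() ->
    erlang:memory([binary, processes, ets]).

%% Pick the {Type, Bytes} pair holding the most bytes.
%% Sorting on the second tuple element puts the largest bucket last.
largest(Breakdown) ->
    lists:last(lists:keysort(2, Breakdown)).
```

On a node in the state described here, `mem_breakdown:largest(mem_breakdown:breakdown())` would be expected to return the `binary` bucket. That tells you where the memory is, but not which code is holding on to it.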
What happened?
A jiffy bug. 
(we think)
Shout out to Miriam Pena 
for spending two weeks 
tracking this down.
Okay, great. We 
have monitoring and 
instrumentation.
Now all our problems 
are solved, right?
Not 
quite.
Instruments 
make up for 
our lack of 
insight.
Monitoring 
makes up 
for our 
frailty.
Every 
solution 
brings its 
own 
problems.
Instruments may 
be misleading.
Instruments may be 
overwhelming.
Instruments 
may be 
inaccurate.
Instruments may 
be ignored.
What can 
be done?
A little 
paranoia never hurt 
anyone.
Use glass displays.
Train.
Keep sight of 
the main goal.
Have resources 
you’re willing 
to sacrifice.
AdRoll is Hiring! :D
Thanks, folks! 
<3 
@bltroutwine
