@papa_fire
Troubleshooting with monitoring
Testing in production
DevOps monitoring
[something] testing [something] monitoring [something] in production
Leon Fayer
❖ @papa_fire
❖ leon@omniti.com
❖ fayerplay.com
❖ slideshare.net/LeonFayer1
THAT’S ME
WHO AM I?
๏ engineer for 20+ years
๏ professional cynic
๏ @ OmniTI
๏ build and operate big systems
๏ we are hiring!
๏ omniti.com/is/hiring
I HATE TESTING
testing is required
testing is not enough
> unit testing
> functional testing
> resilience testing
> performance testing
> …
testing can give a false sense of security
testing is deterministic
data problem
> quantity of data
> frequency of data
> quality of data
example
Wolfe+585
example
Hubert Blaine Wolfeschlegelsteinhausenbergerdorffwelchevoralternwaren-
gewissenhaftschaferswessenschafewarenwohlgepflegeundsorgfaltigkeitbe
schutzenvorangreifendurchihrraubgierigfeindewelchevoralternzwolfhundert
tausendjahresvorandieerscheinenvonderersteerdemenschderraumschiff
genachtmittungsteinundsiebeniridiumelektrischmotorsgebrauchlichtalsseinur
sprungvonkraftgestartseinlangefahrthinzwischensternartigraumaufdersuchen
nachbarschaftdersternwelchegehabtbewohnbarplanetenkreisedrehensichundwo
hinderneuerassevonverstandigmenschlichkeitkonntefortpflanzenundsicher
freuenanlebenslanglichfreudeundruhemitnichteinfurchtvorangreifenvor
andererintelligentgeschopfsvonhinzwischensternartigraum, Sr.
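A minimal sketch of the data-quality point, under stated assumptions: the 50-character column width, the `oversized` helper, and the fixture names are all hypothetical, and the long surname is a stand-in built from "Wolfe" plus 585 filler characters rather than the real name. It only illustrates how a fixed test fixture can pass while production data breaks the same assumption.

```python
# Hypothetical column width; not taken from the talk.
MAX_SURNAME_LEN = 50


def oversized(values, max_len=MAX_SURNAME_LEN):
    """Return the values that would not fit a column of max_len characters."""
    return [v for v in values if len(v) > max_len]


# The kind of fixture a deterministic test suite tends to carry.
test_fixture = ["Smith", "Garcia", "Nguyen"]

# Stand-in for production data: one very long surname
# ("Wolfe" plus 585 filler characters, 590 characters in total).
long_surname = "Wolfe" + "x" * 585
production_sample = test_fixture + [long_surname]

print(len(oversized(test_fixture)))       # 0: the test suite stays green
print(len(oversized(production_sample)))  # 1: production finds the problem
```

The same counter, run against live data and exported as a metric, would surface the oversized value without anyone having to predict it in a test case.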
user problem
“Users (n) - distributed fault injection test suite for production”
example
Corrupted Blood bug
other factors
> lack of foresight (Y2K bug)
> too many use-cases (female Tauren bug)
> change to assumptions
testing is great for
“known knowns”
testing is ok for
“known unknowns”
testing is bad for
“unknown unknowns”
enter monitoring
why monitor?
because testing isn’t enough
> software is never perfect
> systems are complex
> external dependency worry
> proactive is better than reactive
> …
because things change in production
what to monitor?
“In God we trust, all others we monitor”
> systems
> databases
> applications
> integration points
> performance
> user behavior
> …
is it enough?
is it too much?
what is important?
(i.e. what to alert on)
example
> servers up and running
> HTTP checks return 200
> tweets are lost
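The example above can be sketched as two checks over the same snapshot. Everything here is an assumption made for illustration: the `Snapshot` fields, the 99% threshold, and the numbers are invented, not taken from the incident. The point is only that a system-level check and a business-level check can disagree.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical point-in-time readings from a monitoring system."""
    hosts_up: bool
    http_status: int
    tweets_submitted: int
    tweets_stored: int


def system_checks_pass(s: Snapshot) -> bool:
    """The 'unit test' style check: servers are up and HTTP returns 200."""
    return s.hosts_up and s.http_status == 200


def delivery_ratio(s: Snapshot) -> float:
    """Fraction of submitted tweets that actually got stored."""
    if s.tweets_submitted == 0:
        return 1.0  # nothing submitted, so nothing was lost
    return s.tweets_stored / s.tweets_submitted


def business_check_passes(s: Snapshot, min_ratio: float = 0.99) -> bool:
    """The check that matters: is the product doing its job?"""
    return delivery_ratio(s) >= min_ratio


snapshot = Snapshot(hosts_up=True, http_status=200,
                    tweets_submitted=1000, tweets_stored=640)

print(system_checks_pass(snapshot))     # True
print(business_check_passes(snapshot))  # False
```

Alerting on the second check rather than the first is one way to answer "what is important?".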
s/system checks/unit tests/
“I don’t give a **** if the datacenter is on fire as long as I am still making money” - CEO
we monitor because things change
changes affect business
top-down approach
> understand business
> define baseline
> correlate data
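One possible reading of "define baseline" and "correlate data", as a sketch only: treat recent history as the baseline, flag a value that sits more than `k` standard deviations away from it, then list which metrics moved at the same moment. The metric names, the sample numbers, and the 3-sigma threshold are hypothetical; real baselines usually also need to account for seasonality.

```python
from statistics import mean, stdev


def deviates(value, history, k=3.0):
    """True if value is more than k standard deviations from the baseline."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two historical points
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


def deviating_metrics(current, histories, k=3.0):
    """Names of all metrics whose current value is off baseline, sorted."""
    return sorted(
        name for name, value in current.items()
        if deviates(value, histories[name], k)
    )


# Invented sample data: start from the business metric, then look at
# what else moved together with it.
histories = {
    "revenue": [100, 104, 98, 101, 97, 103, 99],
    "traffic": [5000, 5100, 4950, 5050, 4900, 5080, 5020],
    "emails_sent": [900, 920, 880, 910, 890, 915, 905],
}
current = {"revenue": 60, "traffic": 5010, "emails_sent": 300}

print(deviating_metrics(current, histories))  # ['emails_sent', 'revenue']
```

Here revenue and email volume dropped together while traffic stayed on baseline, which narrows where to look first.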
example
๏ online marketing company
๏ major e-commerce component
๏ ~100 million users
๏ 1 billion emails/month
๏ 300,000 lines of code
๏ 5600+ metrics collected
it all starts with a call …
revenue
revenue + traffic
revenue + traffic + load time
revenue + traffic + load time + db
revenue + traffic + load time + db + email
what if … email wasn’t monitored?
(it would be after this)
instrumentation is never done
example
> same symptoms
> higher decline rates
> all metrics are within norm
AmEx blocked
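A sketch of the instrumentation this example suggests adding: break the decline rate down by card brand instead of watching only the aggregate. The transaction tuples, brand labels, and counts below are invented for illustration and are not data from the incident.

```python
from collections import defaultdict


def overall_decline_rate(transactions):
    """Declined share across all transactions, regardless of brand."""
    if not transactions:
        return 0.0
    declined = sum(1 for _, approved in transactions if not approved)
    return declined / len(transactions)


def decline_rates(transactions):
    """Declined share per card brand, from (brand, approved) pairs."""
    totals = defaultdict(int)
    declined = defaultdict(int)
    for brand, approved in transactions:
        totals[brand] += 1
        if not approved:
            declined[brand] += 1
    return {brand: declined[brand] / totals[brand] for brand in totals}


# Invented sample: 100 transactions, one brand failing completely.
transactions = (
    [("visa", True)] * 57 + [("visa", False)] * 3
    + [("mastercard", True)] * 28 + [("mastercard", False)] * 2
    + [("amex", False)] * 10
)

print(overall_decline_rate(transactions))   # 0.15
print(decline_rates(transactions)["amex"])  # 1.0
```

The aggregate only says declines are up; the per-brand view shows that one brand is failing entirely while the others look normal.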
tl;dr
testing and monitoring
not
testing or monitoring
understand the business
continuous improvement
{also bad at conclusions}
THANK YOU
questions?

Production testing through monitoring