From Zero To Visibility
Bridget Kromhout
8thbridge.com
small social commerce startup
acquired in the last month by Fluid, Inc.
small devteam
I am the ops team
http://www.thedirtbox.com/wp-content/uploads/2013/01/ping-pongart.jpg
twisty maze of little shell scripts
http://www.pcgameshardware.de/screenshots/1280x1024/2007/07/CA01.jpg
time-consuming to understand
difficult to modify
doesn’t scale
artisanal monitoring?!
http://shop.bespokebacon.com/images/bespoke-logo.final(3).png
New Relic
pros:
nice graphs
application-level view
good error analysis
cons:
slow to update
many false-positive alerts
high prices (better now)
motivating change
http://99designs.com/illustrations/contests/illustration-pagerduty-161025/entries
as hideous as you remember
“Horrendous interface”
“Well, it’s more ‘old’ than anything else. At least everything is in the same place as you left it because it’s been the same since 1912.”
https://laur.ie/blog/2014/02/why-ill-be-letting-nagios-live-on-a-bit-longer-thank-you-very-much/
not alone!
“Sensu has so many moving parts that I wouldn’t be able to sleep at night unless I set up a Nagios instance to make sure they were all running.”
who watches the RabbitMQ?
-- @murphy_slaw (via @lozzd)
http://images.sodahead.com/profiles/0/0/0/5/1/6/6/3/9/Watchmen-trademark-symbol-62141795529.jpeg
http://portertech.ca/images/2011-11-01/sensu-diagram.png
hating on nagios: the middle years
“hadoop does not suffer from a paucity of configuration options”
http://jaganesundar.wordpress.com/2011/12/05/installing-and-configuring-hadoop-0-20-205-using-it-rpm/
monitor all the ports?!
best way to monitor HBase:
hbck: the HBase consistency checker
nagios -> bash script -> parsing output of hbck
http://www.ymc.ch/en/how-to-monitor-hbase-health-by-nagios
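A minimal sketch of that chain, written in Python rather than bash. It assumes the captured hbck output ends with an "N inconsistencies detected." line and a "Status: OK" or "Status: INCONSISTENT" line. The function name `check_hbck` and the message wording are hypothetical; the exit codes are the standard Nagios plugin codes.

```python
import re
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_hbck(output: str) -> Tuple[int, str]:
    """Map captured hbck text to a (Nagios exit code, message) pair."""
    # Assumed output format: "<N> inconsistencies detected." and "Status: <WORD>".
    count = re.search(r"(\d+) inconsistencies detected", output)
    status = re.search(r"^Status:\s*(\w+)", output, re.MULTILINE)
    if status is None:
        # No recognizable summary: report UNKNOWN rather than guessing.
        return UNKNOWN, "HBCK UNKNOWN - no Status line found"
    found = int(count.group(1)) if count else 0
    if status.group(1) == "OK" and found == 0:
        return OK, "HBCK OK"
    return CRITICAL, f"HBCK CRITICAL - status {status.group(1)}, {found} inconsistencies"
```

A wrapper script would run hbck, pass its stdout to `check_hbck`, print the message, and exit with the returned code.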
http://modiinhub.com/wp-content/uploads/2014/02/logo-mongodb-tagline.png
“Cyber” monday: 1988 called; wants its word back.
wow. such nosql. very webscale.
“a single write operation holds the lock exclusively, and no other read or write operations may share the lock.”
“If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it.” Ian Malpass, Etsy
http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
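Tracking something with StatsD can be as small as one UDP datagram in the `name:value|type` format. A rough sketch follows; the helper names `statsd_line` and `send` and the metric names in the usage note are made up for illustration. 8125 is StatsD's default port.

```python
import socket


def statsd_line(name: str, value: float, kind: str, sample_rate: float = 1.0) -> str:
    """Build one StatsD line, for example 'checkout.orders:1|c'."""
    # c = counter, ms = timer, g = gauge
    if kind not in ("c", "ms", "g"):
        raise ValueError(f"unsupported metric type: {kind}")
    line = f"{name}:{value}|{kind}"
    if sample_rate < 1.0:
        # Tell the server this metric was only sent for a fraction of events.
        line += f"|@{sample_rate}"
    return line


def send(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; nothing is raised if no server is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

Usage would look like `send(statsd_line("checkout.orders", 1, "c"))` for a counter or `send(statsd_line("checkout.render", 320, "ms"))` for a timer.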
the (former) state of our graphite & statsd
● Graphite 0.9.9
○ hand-rolled
○ over 2 years old
○ missing new features (consolidateBy!)
● StatsD was newish, but…
○ hand-rolled
○ running in a screen session
○ on a special snowflake box
http://media-cache-ec0.pinimg.com/736x/68/c2/9d/68c29deb72bad94cd4e3c1aa0f3cdcd8.jpg
this is wrong tool. never use this.
Community cookbooks?
● StatsD
○ https://github.com/librato/statsd-cookbook
● Graphite ones good, but…
○ focus on Apache (we use nginx)
○ we haven’t moved to Chef 11 (gasp!)
when in doubt: tcpdump is your friend
http://blog.johngoulah.com/2012/10/looking-under-the-covers-of-statsd/
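What tcpdump shows on the StatsD port is plain text, one metric per line. A small decoding sketch, assuming the payload bytes have already been captured; `parse_statsd_payload` is a hypothetical helper, not part of StatsD.

```python
from typing import List, Tuple


def parse_statsd_payload(payload: bytes) -> List[Tuple[str, float, str]]:
    """Decode a captured StatsD datagram into (name, value, type) tuples."""
    metrics = []
    for line in payload.decode("ascii", errors="replace").splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split("|")
        if not sep or len(fields) < 2:
            continue  # not a metric line, skip it
        try:
            value = float(fields[0])
        except ValueError:
            continue  # non-numeric value, skip it
        # Any trailing field (such as a sample rate) is ignored here.
        metrics.append((name, value, fields[1]))
    return metrics
```

Comparing the decoded names against what shows up in Graphite is one way to tell whether a metric was never sent or was lost further down the pipeline.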
carbon-aggravator (between 0.9.10 & 0.9.12)
# If set true, metric received will be forwarded to DESTINATIONS in addition to
# the output of the aggregation rules. If set false the carbon-aggregator will
# only ever send the output of aggregation.
FORWARD_ALL = True
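A toy model of what that comment describes, not carbon's actual code. The sum rule, the `all.requests` output name, and the function name `aggregator_output` are assumptions made up for the illustration.

```python
from typing import Dict


def aggregator_output(received: Dict[str, float], forward_all: bool) -> Dict[str, float]:
    """Return what would be sent on to DESTINATIONS for one flush interval."""
    # Hypothetical aggregation rule: sum every "*.requests" metric into one series.
    total = sum(v for name, v in received.items() if name.endswith(".requests"))
    out = {"all.requests": total}
    if forward_all:
        # FORWARD_ALL = True: the raw metrics go out alongside the aggregate.
        out.update(received)
    return out
```

With `forward_all=False` only `all.requests` leaves the aggregator, so any dashboard built on the per-host series goes quiet.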
carbonate: A+++ would clone again
whisper-fill.py
backfill datapoints between whisper files
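The idea of backfilling, sketched on in-memory data rather than real whisper files. This is not carbonate's implementation; the dict-of-timestamps layout and the name `backfill` are assumptions for the example.

```python
from typing import Dict, Optional

Series = Dict[int, Optional[float]]  # timestamp -> value, None marks a gap


def backfill(dst: Series, src: Series) -> Series:
    """Fill gaps in dst with datapoints from src; existing dst values win."""
    filled = dict(dst)
    for ts, value in src.items():
        # Timestamps missing from dst entirely are treated as gaps too.
        if value is not None and filled.get(ts) is None:
            filled[ts] = value
    return filled
```

Only gaps are touched, so it is safe to run against a destination that already holds newer data.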
life as a third wheel party
thresholds: because not every outage is abrupt
[graph annotations: normal traffic; decision to turn off; decision to turn back on; accidental removal]
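One way to catch a drop that is not abrupt: alarm only when a metric stays under a floor for several datapoints in a row. A minimal sketch; the function name `sustained_breach` and the numbers in the usage note are invented for the example.

```python
from typing import Sequence


def sustained_breach(values: Sequence[float], floor: float, points: int) -> bool:
    """True when the most recent `points` datapoints are all below `floor`."""
    if points <= 0 or len(values) < points:
        return False  # not enough data to decide
    return all(v < floor for v in values[-points:])
```

For example, `sustained_breach([100, 90, 70, 40, 35, 30], floor=50, points=3)` fires, while a single dip followed by recovery does not.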
open-source error reporting
all the things
StatsD
Application-level error analysis
Alarms for autoscaling
Timers & counters
Log & host-level
Hadoop & HBase
visualization
MongoDB
Graphs
Time-series data graphing
client-side plugins
Threshold-based alarms
Dashboard
external checks
What’s next?
http://blog.xebia.fr/wp-content/uploads/2013/12/file-logstash-es-kibana.png
what even is ideal monitoring solution
http://www.quickmeme.com/img/f5/f512ff9bee084263df5571d3c81388019dcb063173e1dbcfa2babac9274576b6.jpg
❏ finds real problems
❏ actionable alerting
❏ usable by all
❏ …?
questions; comments; whatnot
Twitter: @bridgetkromhout
Email: bridget@kromhout.org
In person: DevOps Days Minneapolis (devopsdays.org)
