Making Operations Visible

Nick Galbreath  ニック ガルブレス

DevOpsDays Tokyo 2013
#devopsdays
@ngalbreath
nickg@client9.com
ngalbreath@iponweb.net
http://slidesha.re/1h9Aqye
http://www.client9.com/
It's also on video!

http://bit.ly/1gaEmDS
Who is nickg?

Nick Galbreath

http://client9.com/20130501
@ngalbreath
www.client9.com
Online Advertising Infrastructure

Tokyo
Moscow, Russia

http://www.iponweb.jp/
Continuous
Deployment
• In 2012, I spoke many times on
continuous deployment.

• But changing from release cycles to
continuous deployment is too big a
change for most organizations, and they
don't have the tools to do it.
Goal
• I'm hoping that adding new metrics to
the application becomes so addictive
that you'll want to shorten release
cycles.
What is DevOps?
• Puppet, Chef, Ansible?
• GitHub? AWS? The Cloud?
• Continuous Deployment?
Yes, but these are tools. Great tools.
It's About
Communication
• Between machines
• Between team members
• Between Dev and Ops
But in many companies there is a bigger problem
You're Invisible
• If you are in Business, you are
invisible to Development and Tech
Operations.

• If you are in Operations, you are
invisible to Business and
Development.

• If you are in Development, you are
invisible to Business and Operations.
Invisible Things
Aren't Valued
Developer
• "I don't know what my code will do in
production. Let ops deal with it."

• "Why doesn't ops fix these problems?"
• "What does Ops do all day?"
Business
• "Why do I have to wait until the end of
the month for a report?"

• "Did last week's release change
anything?"

• "Why don't they understand the impact
of that bug, outage, etc.?"
Operations
• "Why are they always bothering me?"
• "I've got work to do!"
• "Why do we have to do another release
again? Can't developers do a better
job?"

• "What does this company do?" (really)
This is really destructive
To you
To your Team
To your company.
All of This
Can Be Fixed By Making
Operations Visible
with data
Not just technical operations but
company operations.
Your company is full
of data!
So Why Not Expose
This Data?
Here's a list of excuses I've heard
"But I already have
graphing in my
alerting system"
• Maybe. But it's junk
• Can't share
• Can't do data mash-ups
• Can't do data transformations
"They wouldn't
understand."
• "They won't understand the data, so
what's the point of sharing it?"

• First, "they" probably do. And the more
people looking at ops metrics, the
better.

• Us vs. Them = Fail.
"They might break
something."
• "The data is in our alerting system, we
don't want you to break it."

• Assumes "they" are incompetent, or
malicious. Learn to trust.
"It's not your job,
so you don't need to
know."
"That information isn't
important"
• This excuse is typically caused by fear.
• Why are you deciding what's important?
"I'm not making
another system,
duplicating data is bad."
• For operational metrics, it is perfectly OK
to have a redundant copy of the data.

• Completely different goals.
• Use as alerting-beta
"I'm too busy."
"It's too dangerous"
"I don't know how."
• These are real problems.
• So let's fix it!
One Machine,
One Day,
One Person
Challenge!
Let's get 100% of operational metrics in,
and enable the application to make and
share new metrics on demand
without any help from you.
Graphite
• https://github.com/graphite-project
• http://graphite.readthedocs.org/

• Similar to RRDTool, Ganglia, Cacti
• Uses specialized data storage
• Uses specialized queries
• Optimized for time series
Graphite isn't Perfect
• Documentation isn't great
(but getting better)

• A few QA issues
• Somewhat odd stack

(python-twisted, django)
Graphite Ecosystem
• Flexible input and output
• REST API for graphs
• Simple UI for mashups and dashboards
• 3rd party, custom, client-side
dashboards
Makes Sharing Easy
• Do you have an interesting graph?
It's just a URL!

• Dashboards are easy since graphs are
just URLs. Very easy to make HTML
dashboards.
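As a rough illustration of "graphs are just URLs", the sketch below builds a Graphite render URL and wraps a few of them in a bare HTML page. The host graphite.example.com and the metric names are placeholders, not values from the talk; the /render endpoint with target, from, width, and height parameters is Graphite's render API.

```python
from html import escape
from urllib.parse import urlencode

GRAPHITE = "http://graphite.example.com"  # placeholder host (assumption)

def graph_url(target, since="-24h", width=400, height=250):
    """Return the URL of a PNG graph for one Graphite target."""
    query = urlencode({"target": target, "from": since,
                       "width": width, "height": height})
    return f"{GRAPHITE}/render?{query}"

def dashboard_html(targets):
    """Return a minimal HTML page with one <img> per target."""
    imgs = "\n".join(
        f'<img src="{escape(graph_url(t), quote=True)}" alt="{escape(t)}">'
        for t in targets
    )
    return f"<html><body>\n{imgs}\n</body></html>"

if __name__ == "__main__":
    # Metric names here are hypothetical examples.
    print(graph_url("stats.logins"))
    print(dashboard_html(["stats.logins", "servers.web01.loadavg.01"]))
```

Because the dashboard is only a list of image URLs, anyone who can edit an HTML file can make one.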
One Machine
One Day!
• A single low-end machine should have
capacity for a few thousand metrics per
minute from 50+ machines.

• Graphite is not CPU intensive, but
needs fast disks and/or more memory.
One Day,
One Person
• Graphite is not hard to install, but it is a
bit messy.

• But it might be as easy as
"apt-get install graphite" on your
system.

• It would be good to have a workshop
or prebuilt AMI for EC2

• But not today :-(
Operational Stats
• You could parse /proc, ps, df,
netstat, etc. and write your own
custom scripts...

• ...or use Diamond from Brightcove
• https://github.com/BrightcoveOS/Diamond
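If you do write your own collector, feeding Graphite is not much work. The sketch below formats one reading in Carbon's plaintext protocol ("path value timestamp", one per line, conventionally on TCP port 2003) and sends it. The host name and the metric path are assumptions for illustration; Diamond already does this for many more metrics.

```python
import socket
import time

CARBON_HOST = "graphite.example.com"  # placeholder host (assumption)
CARBON_PORT = 2003                    # Carbon's usual plaintext port

def carbon_line(path, value, timestamp=None):
    """Format one data point as 'path value timestamp' plus newline."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"

def send_lines(lines, host=CARBON_HOST, port=CARBON_PORT):
    """Open a TCP connection to Carbon and send all lines in one go."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

Calling send_lines([carbon_line("servers.web01.loadavg.01", 0.42)]) from cron once a minute would be enough to start a graph.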
Metrics in Diamond now
• Memory
• CPU
• Disk
• Network

• Apache
• NGINX
• MySQL
• SNMP

and many more
100% of pure operational metrics are now shared!

But what about
your applications?
And business metrics?
Enter StatsD
• https://github.com/etsy/statsd

• Your application sends event data to
statsd, as it happens, in real-time.

• StatsD collects this data and computes
time-series metrics
(sum, min, max, average)

• Once a minute, it writes data to
Graphite
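To make the collect-then-flush idea concrete, here is a toy aggregator in the spirit of StatsD. It is a sketch, not StatsD's implementation: it understands only counters ("name:value|c") and timers ("name:value|ms"), ignores sample rates, and the output names (.sum, .min, .max, .avg) are chosen to match the list above rather than StatsD's own naming.

```python
from collections import defaultdict

class Aggregator:
    """Collect events for one interval, then summarize them on flush."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timers = defaultdict(list)

    def handle(self, packet):
        """Parse one 'name:value|type' packet and record it."""
        name, rest = packet.split(":", 1)
        value, kind = rest.split("|")[:2]
        if kind == "c":            # counter: add to the running total
            self.counters[name] += int(value)
        elif kind == "ms":         # timer: keep the sample for later stats
            self.timers[name].append(float(value))

    def flush(self):
        """Return this interval's metrics and reset for the next one."""
        out = dict(self.counters)
        for name, samples in self.timers.items():
            out[f"{name}.sum"] = sum(samples)
            out[f"{name}.min"] = min(samples)
            out[f"{name}.max"] = max(samples)
            out[f"{name}.avg"] = sum(samples) / len(samples)
        self.counters.clear()
        self.timers.clear()
        return out

if __name__ == "__main__":
    agg = Aggregator()
    for packet in ["login:1|c", "login:1|c", "login:1|c", "deploy:1|c"]:
        agg.handle(packet)
    print(agg.flush())
```

The application only ever sends tiny events; all arithmetic and all writes to Graphite happen in one place, once per interval.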
The Magic of UDP
• Your application sends metrics in a
UDP packet.

• UDP is fire-and-forget. No waiting for a
reply, no timeouts. A slow or dead StatsD
server cannot stall or crash your application.

• It will not overload your network.
• You may lose metrics, but in an
intranet, it's rare.
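A minimal sketch of what a StatsD client does on the wire, assuming a StatsD daemon on its usual UDP port 8125 (the address is an assumption; adjust it for your setup). The counter format "name:count|c" is StatsD's; the function name increment is only chosen to mirror the examples in these slides.

```python
import socket

STATSD_ADDR = ("127.0.0.1", 8125)  # assumed address; 8125 is StatsD's usual port

# One unconnected UDP socket is enough for the whole process.
_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def increment(name, count=1, addr=STATSD_ADDR):
    """Send a counter update and never raise into the caller."""
    try:
        _sock.sendto(f"{name}:{count}|c".encode("utf-8"), addr)
    except OSError:
        pass  # metrics are best-effort; dropping one is acceptable
```

There is no connection, no reply, and no retry, so increment("logins") returns immediately whether or not anything is listening.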
Let's Count Logins!
• Most StatsD client APIs are
one-file, no C, simple.

• Add one line to your login code.
StatsD::increment('logins');

• That's it!
Events!
• You can also graph low-frequency
events.

• Just send another StatsD request in
your batch script

StatsD::increment("deploy", 1);

• Do it on reboots, installs, core dumps.
• New bugs, new hires, new code
commits.

• Use drawAsInfinite to display them as vertical lines.
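One way to overlay events on a metric is to request two targets in a single render URL, wrapping the event series in Graphite's drawAsInfinite() function. This is a sketch under assumptions: the host is a placeholder, and the metric paths stats.logins and stats.deploy depend on how your StatsD prefixes its output.

```python
from urllib.parse import urlencode

GRAPHITE = "http://graphite.example.com"  # placeholder host (assumption)

def overlay_url(series, event, since="-7d"):
    """Return a render URL that plots `series` with `event` as vertical lines."""
    params = [
        ("target", series),
        ("target", f"drawAsInfinite({event})"),  # non-zero points become vertical lines
        ("from", since),
    ]
    return f"{GRAPHITE}/render?{urlencode(params)}"

if __name__ == "__main__":
    # Hypothetical metric paths; check what your StatsD actually writes.
    print(overlay_url("stats.logins", "stats.deploy"))
```

A graph like this shows quickly whether logins moved after a deploy.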
[Diagram: three servers each send "login,1" to StatsD, and a deploy script sends "deploy,1". StatsD aggregates these into (login,3), (deploy,1) and forwards them to Graphite.]
Measure Anything, Measure Everything
http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
Logins By Country!
• get country code from IP address
• make a new metric
"login_country" instantly

StatsD::increment('logins');
$kuni = geoip2country($ipv4);       // country code ("kuni" is Japanese for "country")
StatsD::increment("logins.$kuni");  // double quotes, so PHP interpolates $kuni
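Since the country code comes from a lookup on user-supplied data, it may be worth cleaning it before it becomes part of a metric name: a stray dot would create an extra level in Graphite's hierarchy. The sketch below shows one way to do that; metric_key and login_metrics are hypothetical helper names, not part of any StatsD client.

```python
import re

def metric_key(*parts):
    """Join parts with dots, replacing anything that is not a safe name character."""
    clean = [re.sub(r"[^A-Za-z0-9_-]", "_", str(p)) or "unknown" for p in parts]
    return ".".join(clean)

def login_metrics(country_code):
    """Return the counter names to increment for one login."""
    return ["logins", metric_key("logins", country_code or "unknown")]

if __name__ == "__main__":
    print(login_metrics("JP"))
    print(login_metrics(None))
```

Each returned name would then be passed to the client's increment call, so a failed lookup still counts toward the total.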
Make Dashboards

• And make frameworks so that making new
dashboards is easy.
Default Dashboard
Good for experiments
Dashboards

Make it easy for your customers
Make
Operations
Visible
• Make the company visible.
• Enable communication
• Do the One Machine, One Day, One Person
Challenge!
Thanks!
DevOpsDays Tokyo 2013
DevOpsDays Tokyo 2013
is on video!

• The entire event is at
http://vimeo.com/album/2559722
DevOpsDays
Tokyo 2013
Media Coverage
• http://itpro.nikkeibp.co.jp/article/NEWS/20130930/507682/
• http://itpro.nikkeibp.co.jp/article/NEWS/20130930/507755/
• http://itpro.nikkeibp.co.jp/article/NEWS/20131001/507959/
• http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013.html
• http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013_1.html
• http://www.publickey1.jp/blog/13/githubdevopsboxenhubotdevops_day_tokyo_2013.html
• http://www.publickey1.jp/blog/13/githubboxenhubotdevops_day_tokyo_2013.html
• http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013_2.html
DevOpsDays
Tokyo 2013
Attendee Coverage
• http://mass.hatenablog.com/entry/2013/09/28/205309
• http://toshi-miura.hatenablog.com/entry/2013/09/29/222609
• http://codezine.jp/article/detail/7438
• http://d.hatena.ne.jp/n-sega/20130928/1380373634
• http://kazuph.hateblo.jp/entry/2013/09/28/152302
• http://jedipunkz.github.io/blog/2013/09/29/devops-day-tokyo-2013report/
• http://lewuathe.github.io/blog/2013/09/28/devopsday-tokyo-2013nixingtutekitayo/
