HOW TO MEASURE EVERYTHING
A million metrics per second with minimal developer overhead
Jos Boumans - @jiboumans
http://www.imagemediapartners.com/Portals/20286/images/MeasuringTape-s.jpg
RIPE NCC
Engineering manager for RIPE Database
http://www.ripe.net/db
CANONICAL
http://lukeroberts.deviantart.com/art/Destroy-Ubuntu-93235775
Engineering manager for Ubuntu Server 10.04 & 10.10
http://www.ubuntu.com/business/server/overview
KRUX
VP of Operations & Infrastructure
http://www.krux.com/
SOME OF OUR CUSTOMERS
A LOT OF TRAFFIC
http://www.americapictures.net/buenos-aires-traffic-city-night-argentina.html
AVERAGE DATA EVENTS / SEC
[Chart: average events per second, axis 0 to 140,000. Series: Twitter: New Tweets; Wikipedia: Page Views; Facebook: Messages Sent; Krux: New Data Points]
http://investor.fb.com/results.cfm
http://www.statisticbrain.com/twitter-statistics/
http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
MONTHLY UNIQUE USERS
[Chart: monthly unique users, axis 0 to 2,000,000,000]
http://reportcard.wmflabs.org/
http://www.statisticbrain.com/twitter-statistics/
http://newsroom.fb.com/company-info/
DATA IS EVERYTHING
Always know what’s going on
http://perpetual-wonder.com/blog/wp-content/uploads/2012/09/Where-do-we-go-from-here.jpg
UNIQUE METRICS
Unique metrics received, per second
METRICS & VISUALIZATION
… and a little bit of monitoring
http://getfit101.files.wordpress.com/2012/04/visualization.jpg
VISUALIZATION MATTERS
Humans are good at patterns & shapes
http://1.bp.blogspot.com/-CO-8FK9bohE/T89rD8dTyEI/AAAAAAAAAEE/YUZ00v_filk/s1600/live_like_it_matters_by_mythirll-d3iqcxt.jpg
INSIGHT MATTERS
We consider it a core competence
http://yourselfseries.com/teens/files/2013/05/suicide_bonus_Insight_final.jpg
SHOW EVERYONE
And better yet, encourage people to add their own
http://www.kissimmee.org/ftp/KCC/events/views/images/crowd_cheer.jpg
THE BOTTOM LINE
KEY CHARACTERISTICS
… of our metrics collection
http://www.fullcirclefeedback.com.au/resources/wp-content/uploads/2014/01/Key-skills-and-characteristics-of-good-HR-leaders.jpg
WHAT TO VISUALIZE
Pick your operational KPIs
http://1.bp.blogspot.com/-nrB1A9hamEk/UVZui_JUG1I/AAAAAAAAAdI/zGqHuanZNVU/s1600/missed-opportunities.jpg
REQUEST & ERROR RATES
The baseline for everything else
WORST RESPONSE TIMES
Track the worst upper 95th & upper 99th percentile response times across a cluster
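One way to picture "worst across a cluster" is to compute a percentile per node and keep the highest. The sketch below uses the nearest-rank definition of a percentile, which may differ slightly from what StatsD reports as upper_95; the node names and timings are made up for illustration.

```python
import math


def upper_percentile(samples, pct):
    """Nearest-rank percentile: smallest sample that covers pct% of all samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def worst_across_cluster(samples_by_node, pct):
    """Return (node, value) for the node with the highest pct-th percentile."""
    per_node = {node: upper_percentile(s, pct) for node, s in samples_by_node.items()}
    worst = max(per_node, key=per_node.get)
    return worst, per_node[worst]


# Hypothetical response times in milliseconds for two nodes.
timings_ms = {
    "web01": [12, 15, 11, 14, 13, 16, 12, 15, 14, 13],
    "web02": [12, 14, 13, 250, 15, 12, 11, 13, 14, 900],
}
print(worst_across_cluster(timings_ms, 95))
```

An average over both nodes would hide the slow requests on web02; taking the maximum of the per-node percentiles keeps them visible.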
TRACK EVENTS
Did a code change or batch job cause a change in behaviour?
CAPACITY / THRESHOLDS
How much traffic can your service sustain?
SINGLE SERVICE OVERVIEW
Create a single graph for every service
WHAT TO CAPTURE
Everything.
No, really.
http://arkansasagnews.uark.edu/monarchs95.jpg
INFRASTRUCTURE
Everything needed to create, capture and act on a million metrics per second
http://discussamerica.org/remer-blog/images/Freeway_Interchange2.jpg
GRAPHITE, STATSD & COLLECTD
The Trifecta
COLLECTD
Open Source Monitoring Tool
https://collectd.org/
https://collectd.org/wiki/index.php/Plugin:StatsD
STATSD
Simple stats collector service
https://github.com/etsy/statsd
http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
https://wwwx.cs.unc.edu/~sparkst/howto/network_tuning.php
http://emps.exeter.ac.uk/media/universityofexeter/emps/eisa/exista-splash.jpg
STATSD NAMING SCHEME
stats. # to distinguish from events	
$environment. # prod, dev, etc	
$cluster_name. # api-ash, www-dub, etc	
$application. # webapp, login, etc	
$metric_name_here. # any key the app wants	
$hostname # node the stat came from
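A minimal sketch of how the naming scheme above turns into a concrete Graphite key. The helper name `metric_key` and the sample values are assumptions for illustration; in the setup described here, the prefix and host suffix are added by StatsD itself rather than by application code.

```python
def metric_key(environment, cluster, application, metric, hostname, prefix="stats"):
    """Join the naming-scheme components into one dotted Graphite key."""
    # Keep only the short host name; each dot would add another path level.
    short_host = hostname.split(".")[0]
    parts = [prefix, environment, cluster, application, metric, short_host]
    if any(not part for part in parts):
        raise ValueError("every component of the key must be non-empty")
    return ".".join(parts)


print(metric_key("prod", "api-ash", "webapp", "login.success", "web01.example.com"))
# stats.prod.api-ash.webapp.login.success.web01
```

Putting the host name last means a wildcard in the final position selects every node in a cluster for the same metric.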
STATSD CONFIGURATION
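{
  graphite: {
    // Prefix every key with environment and cluster name; replace the
    // $env and $cluster_name placeholders with real values.
    globalPrefix: "stats.$env.$cluster_name",
    // Suffix every key with the short host name of this node.
    globalSuffix: require('os').hostname().split('.')[0],
    legacyNamespace: false
  },
  percentThreshold: [ 95, 99 ],
  deleteIdleStats: true
}
https://github.com/etsy/statsd/blob/master/exampleConfig.js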
GRAPHITE
Metric store & Graph UI
http://graphite.wikidot.com/
http://graphite.readthedocs.org/en/latest/
GRAPHITE SETUP
At least one graphite server per data center
DATA RETENTION
[default]
pattern = .*
priority = 110
retentions = 10:6h,60:15d,600:5y
xFilesFactor = 0
http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-schemas-conf
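To see what the retention line above costs per metric, the sketch below counts the points in each archive and estimates the size of the resulting Whisper file. It assumes the Whisper layout of 12 bytes per point, a 16-byte file header and 12 bytes of metadata per archive, and a 365-day year; the function names are made up for this example.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def archives(retentions):
    """Return (seconds_per_point, number_of_points) for each archive."""
    result = []
    for definition in retentions.split(","):
        precision, duration = definition.strip().split(":")
        step = int(precision)  # seconds per point, written as a bare number
        span = int(duration[:-1]) * UNIT_SECONDS[duration[-1]]
        result.append((step, span // step))
    return result


def whisper_file_bytes(retentions):
    """Estimate the on-disk size of one metric (assumed Whisper layout)."""
    layout = archives(retentions)
    header = 16 + 12 * len(layout)
    return header + 12 * sum(points for _, points in layout)


for step, points in archives("10:6h,60:15d,600:5y"):
    print(step, points)
print(whisper_file_bytes("10:6h,60:15d,600:5y"))
```

Under these assumptions each metric takes roughly 3.4 MB on disk, and every metric is written once per 10-second interval.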
STANDARD AGGREGATIONS
# Average & Sum for timers
<prefix>.timers.<key>._totals.ash.<type>.avg (10) = avg <<prefix>>.timers.<<key>>.<node>.<type>
<prefix>.timers.<key>._totals.ash.<type>.sum (10) = sum <<prefix>>.timers.<<key>>.<node>.(?!upper|lower)<type>
# Min / Max for Lower / Upper
<prefix>.timers.<key>._totals.ash.upper (10) = max <<prefix>>.timers.<<key>>.<node>.upper
<prefix>.timers.<key>._totals.ash.lower (10) = min <<prefix>>.timers.<<key>>.<node>.lower
http://graphite.readthedocs.org/en/latest/config-carbon.html#aggregation-rules-conf
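The effect of the four aggregation rules can be sketched in plain Python: average every timer field across nodes, sum every field whose name does not start with upper or lower, and keep the cluster-wide maximum of upper and minimum of lower. This illustrates the arithmetic only, not how carbon-aggregator is implemented; the node names and values are invented.

```python
def cluster_totals(per_node):
    """Combine per-node timer fields for one interval into cluster totals.

    per_node maps a node name to a dict of timer fields, for example
    {"count": 100, "mean": 20.0, "upper": 80.0, "lower": 5.0}.
    """
    if not per_node:
        raise ValueError("need at least one node")
    totals = {}
    fields = sorted(set().union(*(stats.keys() for stats in per_node.values())))
    for name in fields:
        values = [stats[name] for stats in per_node.values() if name in stats]
        # Rule 1: average every field across nodes.
        totals[name + ".avg"] = sum(values) / len(values)
        # Rule 2: sum every field except those starting with upper/lower.
        if not name.startswith(("upper", "lower")):
            totals[name + ".sum"] = sum(values)
        # Rules 3 and 4: cluster-wide worst and best case.
        if name == "upper":
            totals["upper"] = max(values)
        if name == "lower":
            totals["lower"] = min(values)
    return totals


sample = {
    "web01": {"count": 100, "mean": 20.0, "upper": 80.0, "lower": 5.0},
    "web02": {"count": 300, "mean": 30.0, "upper": 120.0, "lower": 3.0},
}
print(cluster_totals(sample))
```

Summing an upper or lower bound across nodes has no useful meaning, which is why the sum rule skips those fields.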
PERFORMANCE
First problem: IOPS	

Second problem: CPU
http://www.organisationscience.com/styled-6/files/dt-improved-performance.jpg
GRAPHITE ALTERNATIVES
Circonus: All the insights you ever wanted
Zabbix: OSS self-hosted monitoring
http://circonus.com
http://zabbix.com
https://github.com/lyft/circonus-statsd-backend
https://github.com/dlecocq/statsd-zabbix
GRAPHITE.JS
Custom dashboards using jQuery
https://github.com/prestontimmons/graphitejs
http://dashboarddude.com/blog/2013/01/23/dashboards-for-graphite/
COST
Optimize for adoption rates in your organization by eliminating cost as a constraint
http://www.examiner.com/images/blog/wysiwyg/image/money].jpg
INSTRUMENTATION
Instrument your infrastructure, not just your apps
http://2.bp.blogspot.com/-bL9D8VMtor4/TiNBDEJmvOI/AAAAAAAAByc/Y0Uc3GVPNl0/s400/SeminaGestaoPessoasOrquestraROB4428.jpg
APACHE
Use mod_statsd to capture stats directly from the Apache request
http://kaleidos.net/files/images/apache318x260.png
http://httpd.apache.org/
https://github.com/jib/mod_statsd
BASIC CONFIGURATION
<Location /api>
  Statsd On
  StatsdPrefix apache
</Location>
https://github.com/jib/mod_statsd/blob/master/DOCUMENTATION
$ curl http://localhost/api/foo?id=42
Stat: apache.api.foo.GET.200:31|ms
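The stat line above follows the StatsD wire format `<name>:<value>|<type>`, optionally followed by `|@<sample rate>`. Here the name carries the prefix, the request path, the method and the response code, and the value is a timing in milliseconds. A small parser sketch for numeric stats (the `Stat` tuple and `parse_stat` are names invented for this example):

```python
from collections import namedtuple

Stat = namedtuple("Stat", "name value kind sample_rate")


def parse_stat(line):
    """Split one numeric statsd line into name, value, type and sample rate."""
    name, _, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2:
        raise ValueError("not a statsd line: %r" % line)
    rate = 1.0
    if len(fields) > 2 and fields[2].startswith("@"):
        rate = float(fields[2][1:])
    return Stat(name, float(fields[0]), fields[1], rate)


stat = parse_stat("apache.api.foo.GET.200:31|ms")
print(stat)
print(stat.name.split("."))
```

Note that the query string (`?id=42`) does not appear in the key, so requests to the same path are grouped under one metric.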
VARNISH
Use libvmod-statsd & libvmod-timers to capture stats directly from the Varnish request
http://www.adammalone.net/sites/default/files/styles/blog_image/public/varnish-bunny.png?itok=1bBDTA1A
https://www.varnish-cache.org/
https://github.com/jib/libvmod-statsd
BASIC CONFIGURATION
# pseudo code
import statsd; import timers;
sub vcl_deliver {
    statsd.timing(
        $backend +      # from req.backend
        $hit_miss +     # from obj.hits
        $resp_code,     # from obj.status
        timers.req_response_time() );
}
https://github.com/jib/libvmod-statsd/blob/master/README.rst
http://jiboumans.wordpress.com/2013/02/27/realtime-stats-from-varnish/
SAMPLE GRAPH
The requests per second & response time graphs are coming straight from Varnish
PYTHON
Create a base library in your language of choice
https://pypi.python.org/pypi?%3Aaction=search&term=krux&submit=search
KRUX-STDLIB
$ pip install krux-stdlib
https://staticfiles.krxd.net/foss/docs/pypi/krux-stdlib/
BASIC APP USING STDLIB
$ sample-app -h
[…]
logging:
  --log-level {info,debug,critical,warning,error}
                        Verbosity of logging. (default: warning)
stats:
  --stats               Enable sending statistics to statsd. (default: False)
  --stats-host STATS_HOST
                        Statsd host to send statistics to. (default: localhost)
  --stats-port STATS_PORT
                        Statsd port to send statistics to. (default: 8125)
  --stats-environment STATS_ENVIRONMENT
                        Statsd environment. (default: dev)
https://staticfiles.krxd.net/foss/docs/pypi/krux-stdlib/
BASIC APP USING STDLIB
import krux.cli

class App(krux.cli.Application):
    def __init__(self):
        ### Call to the superclass to bootstrap.
        super(App, self).__init__(name='sample-app')

    def run(self):
        stats = self.stats
        log = self.logger

        # Time the block and report it to statsd under the key 'run'.
        with stats.timer('run'):
            log.info('running...')
            ...
https://staticfiles.krxd.net/foss/docs/pypi/krux-stdlib/
https://pypi.python.org/pypi?%3Aaction=search&term=krux&submit=search
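For readers building a base library of their own, here is a rough sketch of what a `stats.timer(...)` context manager might do. It is a hypothetical stand-in, not the krux-stdlib implementation: the `Stats` class, its `emit` callback and the key layout are all assumptions.

```python
import time
from contextlib import contextmanager


class Stats(object):
    """Hypothetical stats client; emit receives one statsd line per event."""

    def __init__(self, prefix, emit=print):
        self.prefix = prefix
        self.emit = emit

    @contextmanager
    def timer(self, name):
        start = time.monotonic()
        try:
            yield
        finally:
            # Report elapsed wall-clock time in whole milliseconds,
            # even when the timed block raises an exception.
            elapsed_ms = int(round((time.monotonic() - start) * 1000))
            self.emit("{}.{}:{}|ms".format(self.prefix, name, elapsed_ms))


lines = []
stats = Stats("sample-app", emit=lines.append)
with stats.timer("run"):
    time.sleep(0.01)
print(lines)
```

Passing the sender in as a callable keeps the timing logic testable without a running statsd.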
CLI
echo 'events.deploy.appname:1|c' | nc -u localhost 8125
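The same one-line event can be sent from a script with a plain UDP socket. This is a minimal sketch: `send_stat` is a made-up helper, and the demo sends to a throwaway local socket standing in for statsd so that it runs without a statsd server.

```python
import socket


def send_stat(name, value, kind="c", host="localhost", port=8125):
    """Send one statsd line over UDP and return the bytes that were sent."""
    payload = "{}:{}|{}".format(name, value, kind).encode("ascii")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
    return payload


if __name__ == "__main__":
    # Stand-in for statsd: a local UDP socket on an ephemeral port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))
    listener.settimeout(2)
    demo_port = listener.getsockname()[1]
    send_stat("events.deploy.appname", 1, host="127.0.0.1", port=demo_port)
    print(listener.recv(1024))
    listener.close()
```

UDP is fire-and-forget, so a deploy script that sends this event is not slowed down or broken when statsd is unreachable.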
JAVASCRIPT
Use a simple HTTP endpoint to send stats
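Browser code cannot send UDP, so an HTTP endpoint has to translate requests into statsd lines on the server side. The sketch below shows only that translation step, with strict validation because the input comes from untrusted clients. The URL shape, the parameter names and the `browser` prefix are assumptions, not a description of the endpoint used in the talk.

```python
import re
from urllib.parse import parse_qs, urlparse

SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")
NUMBER = re.compile(r"\d+(\.\d+)?")
ALLOWED_TYPES = {"c", "ms", "g"}


def stat_from_url(url, prefix="browser"):
    """Turn /stat?name=...&value=...&type=... into a statsd line, or None."""
    query = parse_qs(urlparse(url).query)
    name = query.get("name", [""])[0]
    value = query.get("value", ["1"])[0]
    kind = query.get("type", ["c"])[0]
    # Reject anything that could break the wire format or pollute key names.
    if not SAFE_NAME.fullmatch(name) or not NUMBER.fullmatch(value):
        return None
    if kind not in ALLOWED_TYPES:
        return None
    return "{}.{}:{}|{}".format(prefix, name, value, kind)


print(stat_from_url("/stat?name=page.load&value=412&type=ms"))
# browser.page.load:412|ms
```

The returned line can then be forwarded to statsd over UDP by whatever server handles the request.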
PUPPET
Use the Puppet module graphite-report to send Puppet reporting data directly to Graphite
http://docs.puppetlabs.com/guides/reporting.html
https://github.com/krux/puppet-module-graphite-report
Q & A
http://vickicaruana.blogspot.com/2011/01/are-you-afraid-to-raise-your-hand.html
@jiboumans
http://slideshare.net/jiboumans

How to Measure Everything: A Million Metrics Per Second with Minimal Developer Overhead - PuppetCo