Monitoring MySQL with OpenTSDB
Percona Live 2013, Geoffrey Anderson, Box Inc.
@geodbz
Who
Geoffrey Anderson
• Database Operations Engineer @ Box, Inc.
• a.k.a. DBA
• Tooling for MySQL and HBase
• #DBHangOps
The Situation
Then You Get More Servers
Enter OpenTSDB
OpenTSDB is...
• Distributed
• Scalable
• Time Series Database
• Runs on HBase
• Created by Benoit Sigoure
[Architecture diagram; labels: HBase, TSD for Storing, TSD for Querying, HAProxy, mydb.example.com, fe1.example.com; arrows: "Push Metrics", "Query via API"]
Why OpenTSDB?
• FAST
• EASY to Scale
• EASY to Populate
• EASY to collect data
• EASY to Query
Collecting Data
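#!/usr/bin/env bash
# Emit every MySQL global status variable as one data point per line.
timestamp=$(date +%s)
# -ss: doubly silent output, tab-separated rows without a header line.
mysql -ss -e "SHOW GLOBAL STATUS" | while read -r var val
do
    echo "mysql.$var $timestamp $val host=$HOSTNAME"
done
ganderson@mydb.example.com:~$ ./mysql_collector.sh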
mysql.Aborted_connects 1366399993 0 host=mydb.example.com
mysql.Binlog_cache_disk_use 1366399993 0 host=mydb.example.com
mysql.Binlog_cache_use 1366399993 0 host=mydb.example.com
mysql.Binlog_stmt_cache_disk_use 1366399993 0 host=mydb.example.com
mysql.Binlog_stmt_cache_use 1366399993 0 host=mydb.example.com
mysql.Bytes_received 1366399993 19453687 host=mydb.example.com
mysql.Bytes_sent 1366399993 1238166682 host=mydb.example.com
mysql.Com_admin_commands 1366399993 1 host=mydb.example.com
mysql.Com_assign_to_keycache 1366399993 0 host=mydb.example.com
...
Example: mysql_collector.sh
Line format: <metric name> <timestamp> <value> <"tags" (key=val)>
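SHOW GLOBAL STATUS also returns a few variables whose values are not numbers (empty strings, ON/OFF and the like), while a time-series data point needs a numeric value. The sketch below is one possible way to filter those out while producing the line format described above. The function name `format_metrics` and its arguments are illustrative and not from the talk.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the talk): reads "name value" pairs on stdin
# and prints "metric timestamp value host=..." lines, skipping any value
# that is not a plain integer or decimal.
format_metrics() {
  local prefix=$1 timestamp=$2 host=$3 var val
  local numeric='^-?[0-9]+([.][0-9]+)?$'
  while read -r var val; do
    if [[ $val =~ $numeric ]]; then
      echo "$prefix.$var $timestamp $val host=$host"
    fi
  done
}
```

Assuming a working `mysql` client login, it could be used as `mysql -ss -e "SHOW GLOBAL STATUS" | format_metrics mysql "$(date +%s)" "$HOSTNAME"`.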
* * * * * mysql_collector.sh | sed 's/^/put /' | nc opentsdb.example.com 4242
Example: adding a cron for OpenTSDB
ganderson@mydb.example.com:tcollector$ tree
.
|-- collectors
|   |-- 0                        <- run forever
|   |   |-- ifstat.py
|   |   |-- iostat.py
|   |   |-- procnettcp.py
|   |   |-- procstats.py
|   |-- 15                       <- run every 15 seconds
|   |   `-- dfstat.py
|   |-- 30                       <- run every 30 seconds
|   |   |-- mysql_collector.sh
|   |-- 300                      <- run every 5 minutes
|   |   `-- ptTcpModel.sh
|   `-- etc
|       |-- config.py
|-- config
|-- startstop
`-- tcollector.py
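In the layout above, a collector placed in `collectors/0` is started once and is expected to keep printing data points itself. A minimal sketch of such a long-running collector follows. The metric name `example.loadavg.1min`, the function names and the use of Linux's `/proc/loadavg` are assumptions for illustration, not something shown in the talk.

```bash
#!/usr/bin/env bash
# Sketch of a long-running collector (illustrative names throughout).

# Print the 1-minute load average as "metric timestamp value host=...".
# Reads /proc/loadavg by default (Linux only); a path can be passed instead.
emit_loadavg() {
  local loadavg_file=${1:-/proc/loadavg} load1 rest
  read -r load1 rest < "$loadavg_file"
  echo "example.loadavg.1min $(date +%s) $load1 host=$HOSTNAME"
}

# Emit one sample every $1 seconds (default 15) until the process is killed.
run_collector() {
  local interval=${1:-15}
  while true; do
    emit_loadavg
    sleep "$interval"
  done
}
```

A script dropped into `collectors/0` would end with a call such as `run_collector 15`. Scripts in the numbered directories only need the equivalent of a single `emit_loadavg` call, since they are re-run on each interval.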
Querying Data
http://opentsdb.example.com
/#start=2013/04/10-07:32:29
&end=2013/04/10-07:57:57
&m=sum:proc.stat.cpu.percentage_idle{host=db22}
&o=axis x1y1
&m=sum:db.threads_running{host=db22}
&o=axis x1y2
&ylabel=CPU idle
&y2label=Threads Running
&yrange=[0:]
&wxh=1475x600
&png
http://opentsdb.example.com
/q?start=2013/04/10-07:32:29
&end=2013/04/10-07:57:57
&m=sum:proc.stat.cpu.percentage_idle{host=db22}
&o=axis x1y1
&m=sum:db.threads_running{host=db22}
&o=axis x1y2
&ylabel=CPU idle
&y2label=Threads Running
&yrange=[0:]
&ascii
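The text query above is regular enough to assemble from a script. The sketch below builds a single-metric version of it using only the parameters that appear on the slide (`start`, `end`, `m`, `ascii`). The function name `build_query_url` and the restriction to one `host` tag are illustrative simplifications.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a "/q?...&ascii" URL for one metric and one host.
# Arguments: base URL, start time, end time, metric name, host tag value.
build_query_url() {
  local base=$1 start=$2 end=$3 metric=$4 host=$5
  printf '%s/q?start=%s&end=%s&m=sum:%s{host=%s}&ascii\n' \
    "$base" "$start" "$end" "$metric" "$host"
}
```

To fetch the data with curl, the `-g` (`--globoff`) option is needed so that curl does not treat the braces in the tag filter as a URL glob, for example `curl -sg "$(build_query_url http://opentsdb.example.com 2013/04/10-07:32:29 2013/04/10-07:57:57 db.threads_running db22)"`.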
Leveraging OpenTSDB For MySQL
user_statistics monitoring
table_statistics monitoring
Table Info from I_S
SELECT *, DATA_LENGTH+INDEX_LENGTH AS TOTAL_LENGTH
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN
('PERFORMANCE_SCHEMA','INFORMATION_SCHEMA')
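Per-table data like this maps naturally onto tags. The following sketch turns tab-separated `schema`, `table`, `total_length` rows into data points tagged by schema and table. The metric name `mysql.table.total_length`, the tag names and the function name are assumptions, and real table names may need sanitizing before they are usable as tag values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads tab-separated "schema table bytes" rows on stdin
# and prints one data point per table, tagged with schema and table name.
format_table_sizes() {
  local timestamp=$1 host=$2 schema table bytes
  local integer='^[0-9]+$'
  while IFS=$'\t' read -r schema table bytes; do
    # Rows without a numeric size (views report NULL) are skipped.
    [[ $bytes =~ $integer ]] || continue
    echo "mysql.table.total_length $timestamp $bytes host=$host schema=$schema table=$table"
  done
}
```

It expects input such as the output of `mysql -ss -e "SELECT TABLE_SCHEMA, TABLE_NAME, DATA_LENGTH+INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES WHERE ..."`, piped into `format_table_sizes "$(date +%s)" "$HOSTNAME"`.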
Query Throughput
And other “common” metrics
• Various MySQL status counters
• QPS (questions)
• Threads connected
• Temporary tables on disk
• Etc.
• Various server statistics
• %CPU Idle
• Free disk space
• I/O utilization
• Network traffic
• Etc.
Future collectors
• pt-query-digest/mysqlslow query statistics
• Data from “show engine innodb status”
• (that is missing from counters)
• PERFORMANCE_SCHEMA (MySQL 5.6+)
• Query statistics
• Processlist information
• Background thread information
How does this change things?
In all seriousness, though...
• Easily see aggregate graphs
• Easily build graphs on-the-fly
• Full granularity forever
• API request for raw data
• Cluster-wide Nagios checks with check_tsd
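To illustrate how raw API data can drive an alert, here is a small threshold check in the spirit of a Nagios plugin. It is not the real check_tsd and does not mirror its options. It assumes the text output has one `metric timestamp value tags` line per data point, and it uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```bash
#!/usr/bin/env bash
# Hypothetical check (NOT the real check_tsd interface): reads data points
# on stdin and compares the largest value against two thresholds.
# Arguments: warning threshold, critical threshold.
check_threshold() {
  local warn=$1 crit=$2
  awk -v warn="$warn" -v crit="$crit" '
    NF >= 3 { v = $3 + 0; if (!seen || v > max) max = v; seen = 1 }
    END {
      if (!seen)           { print "UNKNOWN: no data points"; exit 3 }
      if (max >= crit + 0) { print "CRITICAL: max=" max; exit 2 }
      if (max >= warn + 0) { print "WARNING: max=" max; exit 1 }
      print "OK: max=" max; exit 0
    }'
}
```

Fed with the output of a text query for a cluster-wide aggregate, a call such as `check_threshold 20 40` would alert on the whole cluster rather than on one server at a time.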
Challenges Switching
• Aggregates are the default
• Mouse-zooming (patched!)
• Auto-suggest for metrics
• “The graphs aren’t pretty”
• Migrating from proof of concept
• Plan for 3+ machines
• Data pruning may be required
Some Quick Numbers: OpenTSDB @ Box
• 21,294 metrics
• 72 tag keys
• 5,145,745 tag values
• 90% of interactive graphs return in <300ms
Next Steps
Enjoy #PerconaLive 2013
We’re hiring!
https://www.box.com/about-us/careers/
geoff@box.com
Editor's Notes

  • #2: Will be talking about OpenTSDB. How OpenTSDB changed monitoring at Box. How we leverage its abilities for day-to-day management of MySQL DBs.
  • #5: You probably have the Percona Cacti graphs and monitoring plugins.
  • #6: You add some other Nagios checks for fun edge cases.
  • #7: And you use different tools from the Percona Toolkit like: Stalk, Poor man’s profiler (PMP), Query Digest.
  • #8: Suddenly finding problems and correlating issues is difficult. Maybe you don’t have a NOC yet. Maybe you do, and they need better graphs.
  • #11: IT’S BIGGER ON THE INSIDE – just kidding. Fast! Easy to build graphs on the fly. Hella easy to scale – just add nodes (HBase or TSDs). Very easy to put data into it – NEXT SLIDES TALK ABOUT THIS YO
  • #18: Running threads follows the CPU spikes PERFECTLY. Box has a “long query” killer that gets more aggressive as more threads stack up. Should get a look at queries on the server.
  • #19: Zoom in to get the exact time interval
  • #20: Know the exact time of a high stack-up. Go to check Box Anemometer to see what query is there.
  • #21: This is the URL for that. Can easily paste this to anyone to see the same interactive graph.
  • #22: If you prefer text, that’s also an option via API. You can build cool tools using the API. Week over week graphs. Simplifies anomaly detection. URL is pretty simple. Effectively just use “q?” and add “&ascii”.
  • #24: Get audit log: logins, types of statements issued, etc.
  • #25: Get performance information about: row and index change activity, row read activity.
  • #26: Generate daily reports of: Are auto-increment columns nearing a boundary on a table? Number of records in a table. Size of a datafile for a table.
  • #27: Using pt-tcp-model. Allows us to identify when server stops doing work. 5min interval.
  • #31: Aggregate graphs are the default. Drill down only when problems in aggregate.
  • #32:
    Aggregates are the default – shift in thinking from looking at specific important servers.
    Zooming in on a time slice was painfully manual – I wrote up a patch to add mouse-zooming and upstreamed it. This cemented OpenTSDB as a powerful monitoring tool for Box, overnight.
    Auto-suggest for metrics is spotty – we wrote a quick cron job that dumps the full metric list into JSON.
    “Graphs aren’t pretty” – a few changes to the base Gnuplot options solved this. There’s also a “Smooth” option in the interface now.
    Migrating from POC – we had a single-node setup for the longest time until that fell over... a lot.
    Plan for 3+ machines – it’s enough to run all the needed bits for a lightweight distributed HBase and TSD setup.
    Data pruning – ~4 bytes per metric before HDFS replication adds up quickly. mysql_tcollector: 370 metrics, so ~1.5k per server per collection; at a 30s interval that is ~4.2MB/day. Either have a plan to prune old data or build out extra capacity and predict storage needs per server/metric added.
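The storage estimate in the last note can be reproduced with a little arithmetic. The sketch below takes the note's figures as inputs (370 metrics, roughly 4 bytes per data point, one collection every 30 seconds) and counts bytes before HDFS replication. The function name `bytes_per_day` is illustrative.

```bash
#!/usr/bin/env bash
# Rough storage estimate per server per day, before HDFS replication.
# Arguments: number of metrics, bytes per data point, collection interval (s).
bytes_per_day() {
  local metrics=$1 bytes_per_point=$2 interval=$3
  # 86400 seconds per day divided by the interval gives collections per day.
  echo $(( metrics * bytes_per_point * (86400 / interval) ))
}

bytes_per_day 370 4 30   # 370 * 4 * 2880 = 4262400 bytes, i.e. about 4.2MB/day
```

Multiplying the result by the number of servers and the HDFS replication factor gives a first-order figure for capacity planning.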