The Ganglia Monitoring Framework
Vladimir Vuksan, June 2011
http://deanspot.org/content/ganglia-references
About this talk
- Ganglia architecture
- How to get metrics in
- How to get metrics out
Getting Ganglia
Build from source:
    tar xzvf ganglia-3.1.7.tar.gz
    cd ganglia-3.1.7
    ./configure --with-gmetad
    make
    make install
Or use binary packages, e.g.:
    Ubuntu/Debian: apt-get install ganglia-monitor gmetad
    Fedora: yum install ganglia-gmond ganglia-gmetad
Ganglia Architecture
- 2 daemons: gmond and gmetad.
- gmond collects or receives metric data on each node.
- 1 gmetad per grid. It polls 1 gmond per cluster for data.
- A node belongs to a cluster. A cluster belongs to a grid.
- The web UI is a separate item: use it or lose it.
Demo
Custom Graphs
    {
      "report_name" : "network_report",
      "report_type" : "standard",
      "title" : "Network",
      "vertical_label" : "Bytes/sec",
      "series" : [
        { "metric": "bytes_in", "color": "33cc33", "label": "In", "line_width": "2", "type": "line" },
        { "metric": "bytes_out", "color": "5555cc", "label": "Out", "line_width": "2", "type": "line" }
      ]
    }
Replaces this:
    DEF:'a0'='/var/lib/ganglia/g1/rrds/c-1-2/__SummaryInfo__/bytes_in.rrd':'sum':AVERAGE
    LINE2:'a0'#33cc33:'In '
    VDEF:a0_last=a0,LAST
    VDEF:a0_min=a0,MINIMUM
    VDEF:a0_avg=a0,AVERAGE
    VDEF:a0_max=a0,MAXIMUM
    GPRINT:'a0_last':'Now\\:%5.1lf%s'
    GPRINT:'a0_min':'Min\\:%5.1lf%s'
    GPRINT:'a0_avg':'Avg\\:%5.1lf%s'
    GPRINT:'a0_max':'Max\\:%5.1lf%s\\l'
    DEF:'a1'='/var/lib/ganglia/g1/rrds/c-1-2/__SummaryInfo__/bytes_out.rrd':'sum':AVERAGE
    LINE2:'a1'#5555cc:'Out'
    VDEF:a1_last=a1,LAST
    VDEF:a1_min=a1,MINIMUM
    VDEF:a1_avg=a1,AVERAGE
    VDEF:a1_max=a1,MAXIMUM
    GPRINT:'a1_last':'Now\\:%5.1lf%s'
    GPRINT:'a1_min':'Min\\:%5.1lf%s'
    GPRINT:'a1_avg':'Avg\\:%5.1lf%s'
    GPRINT:'a1_max':'Max\\:%5.1lf%s\\l'
A quick word about RRD.
gmetad creates 1 RRD file for each metric. The default retention schedule is defined in gmetad.conf.

    Store an avg every    For this long
    15 sec                60 min
    6 min                 1 day
    42 min                1 week
    168 min               30 days
    1 day                 1 year
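As a rough sanity check on the schedule above, the number of stored rows can be worked out by dividing each retention period by its averaging interval. The sketch below assumes one data source per file and 8 bytes per stored value (RRD stores values as doubles); it ignores the file header, so it is only an approximation of the on-disk size, not a description of how gmetad sizes its files.

```python
# Rough size estimate for the retention schedule on the slide.
# Each entry is (averaging interval in seconds, retention in seconds).
SCHEDULE = [
    (15, 60 * 60),                    # 15 sec averages kept for 60 min
    (6 * 60, 24 * 3600),              # 6 min averages kept for 1 day
    (42 * 60, 7 * 24 * 3600),         # 42 min averages kept for 1 week
    (168 * 60, 30 * 24 * 3600),       # 168 min averages kept for 30 days
    (24 * 3600, 365 * 24 * 3600),     # 1 day averages kept for 1 year
]

BYTES_PER_VALUE = 8  # assumption: one double per row, one data source


def rows_per_archive(schedule):
    """Return the number of whole rows each archive holds."""
    return [retention // step for step, retention in schedule]


def estimated_data_bytes(schedule, bytes_per_value=BYTES_PER_VALUE):
    """Return the approximate bytes of stored values, excluding the header."""
    return sum(rows_per_archive(schedule)) * bytes_per_value


if __name__ == "__main__":
    print(rows_per_archive(SCHEDULE))
    print(estimated_data_bytes(SCHEDULE))
```

Under these assumptions the stored values come to a little over 10 KB, which is in line with the roughly 12K files the deck reports once header overhead is added.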
Default schedule fits into 12K per metric
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 bytes_in.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 bytes_out.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_aidle.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_idle.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_nice.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_num.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_speed.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_system.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_user.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_wio.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 disk_free.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 disk_total.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 load_fifteen.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 load_five.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 load_one.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 mem_buffers.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 proc_run.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 proc_total.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 swap_free.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 swap_total.rrd
If you need more resolution, adjust it in gmetad.conf.
- Getting started with RRD: http://oss.oetiker.ch/rrdtool/tut/rrd-beginners.en.html
- More help defining RRD files: http://www.cuddletech.com/articles/rrd/ar01s02.html
- A little more about how RRD works with Ganglia: http://vuksan.com/blog/2010/12/14/misconceptions-about-rrd-storage/
Getting data in
- Via gmond modules, written in C or Python.
- Via gmetric or libraries that implement the gmetric protocol.
- Via other daemons designed to feed metrics to Ganglia (e.g. sFlow).
Zero configuration. Just start sending new metrics. gmetad will create a new RRD file for any new metric it sees. The web UI will draw a basic graph for every metric. You can create nice colored graphs later if you want them.
gmond module
    modules {
      module {
        name = "net_module"
        path = "modnet.so"
      }
    }
    collection_group {
      collect_every = 40
      time_threshold = 300
      metric {
        name = "bytes_out"
        value_threshold = 4096
        title = "Bytes Sent"
      }
      metric {
        name = "bytes_in"
        value_threshold = 4096
        title = "Bytes Received"
      }
    }
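The configuration above loads a C module. Since gmond modules can also be written in Python, here is a minimal sketch of what one could look like. gmond calls metric_init once with the module's parameters, expects a list of metric descriptors back, and then invokes each descriptor's call_back to collect a value. The metric name dir_entries and the path parameter are made up for illustration and do not come from the deck.

```python
import os

# Directory whose entries are counted; overridden by the hypothetical
# "path" parameter passed to metric_init.
_watch_path = "/tmp"


def count_entries(name):
    """Callback: return the current value for the metric called `name`."""
    try:
        return len(os.listdir(_watch_path))
    except OSError:
        return 0


def metric_init(params):
    """Called once by gmond; returns the list of metric descriptors."""
    global _watch_path
    _watch_path = params.get("path", _watch_path)
    return [{
        "name": "dir_entries",          # hypothetical metric name
        "call_back": count_entries,
        "time_max": 90,
        "value_type": "uint",
        "units": "entries",
        "slope": "both",
        "format": "%u",
        "description": "Number of entries in a directory",
        "groups": "example",
    }]


def metric_cleanup():
    """Called once by gmond on shutdown; nothing to release here."""
    pass


if __name__ == "__main__":
    # Allows the module to be exercised outside gmond.
    for descriptor in metric_init({"path": "."}):
        print(descriptor["name"], descriptor["call_back"](descriptor["name"]))
```

Running the file directly prints the metric once, which is a convenient way to debug a module before wiring it into a collection_group.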
gmetric
Interfaces: CLI, Ruby
    $ gmetric -c /etc/ganglia/gmond.conf --name=foo \
        --value=512 --units=foos --type=uint16 \
        --dmax=60
What kind of metrics can I collect?
- Load time of your home page?
- Number of active trouble tickets?
- LOC in your application?
- rcov coverage statistics?
- Execution time of your test suite?
- Number of user logins?
- Memory usage by a particular process?
Many other metric plugins are available at http://github.com/ganglia
Log parsing apps
Eat log files and make metrics.
- Apache Traffic Stats: http://vuksan.com/linux/ganglia/#Apache_Traffic_Stats
- ganglia-logtailer: https://bitbucket.org/maplebed/ganglia-logtailer
- logster: https://github.com/etsy/logster
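To make "eat log files and make metrics" concrete, here is a small sketch that turns Apache-style access log lines into per-status-class counts, which could then be submitted as metrics. It is an illustration only and is not how ganglia-logtailer or logster are implemented; the regular expression assumes the status code follows the quoted request string, as in the common log format.

```python
import re
from collections import Counter

# Matches the 3-digit status code that follows the quoted request,
# e.g. ... "GET / HTTP/1.1" 200 512
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def count_status_classes(lines):
    """Count log lines per HTTP status class (2xx, 4xx, ...)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)[0] + "xx"] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - [16/Jun/2011:04:07:00 +0000] "GET / HTTP/1.1" 200 512',
        '127.0.0.1 - - [16/Jun/2011:04:07:01 +0000] "GET /x HTTP/1.1" 404 128',
    ]
    print(count_status_classes(sample))
```

Each resulting count maps naturally onto one metric name, for example a counter per status class.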
But I thought Ganglia is only useful for host metrics?
No. You can create "non-existent" hosts by spoofing.
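One way to spoof a host is gmetric's --spoof option, which takes an IP address and host name separated by a colon. The sketch below only builds the argument list for such a call; the helper names and the example host are hypothetical, and actually sending the metric requires gmetric to be installed and a reachable gmond.

```python
import subprocess


def gmetric_argv(name, value, type_, units="", dmax=0,
                 spoof_ip=None, spoof_host=None,
                 conf="/etc/ganglia/gmond.conf"):
    """Build the argument list for a gmetric call (hypothetical helper)."""
    argv = [
        "gmetric",
        "--conf=" + conf,
        "--name=" + name,
        "--value=" + str(value),
        "--type=" + type_,
    ]
    if units:
        argv.append("--units=" + units)
    if dmax:
        argv.append("--dmax=%d" % dmax)
    if spoof_ip and spoof_host:
        # Report the metric on behalf of a host that runs no gmond.
        argv.append("--spoof=%s:%s" % (spoof_ip, spoof_host))
    return argv


def send_metric(**kwargs):
    """Run gmetric with the built arguments; raises if gmetric fails."""
    subprocess.check_call(gmetric_argv(**kwargs))


if __name__ == "__main__":
    print(" ".join(gmetric_argv("open_tickets", 17, "uint16",
                                units="tickets",
                                spoof_ip="10.0.0.250",
                                spoof_host="helpdesk")))
```

With this approach something like a ticketing system can appear in the web UI as its own "host" even though it is not a machine being monitored.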
Custom Metric Demo
Integrating Ganglia & Nagios
    $ check_ganglia_metric.py --gmetad_host=gmetad-server.example.com \
        --metric_host=host.example.com --metric_name=cpu_idle --critical=99
    Status Critical, CPU Idle = 99.6 %|cpu_idle=99.6%;;99;;
- https://github.com/mconigliaro/check_ganglia_metric
- http://vuksan.com/blog/2011/04/19/use-your-trending-data-for-alerting/
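The output line above follows the usual Nagios plugin convention: human-readable status text, a pipe, then performance data in the form label=value[unit];warn;crit;min;max, with the exit code carrying the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The following sketch shows that convention with a simple "greater than" threshold. It is not the check_ganglia_metric implementation, and real Nagios threshold ranges are richer than this.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_threshold(label, title, value, unit="", warning=None, critical=None):
    """Return (exit_code, output_line) for a simple upper-bound check."""
    if critical is not None and value > critical:
        code, status = CRITICAL, "Critical"
    elif warning is not None and value > warning:
        code, status = WARNING, "Warning"
    else:
        code, status = OK, "OK"
    # Performance data: label=value[unit];warn;crit;min;max
    perfdata = "%s=%s%s;%s;%s;;" % (
        label, value, unit,
        "" if warning is None else warning,
        "" if critical is None else critical,
    )
    line = "Status %s, %s = %s %s|%s" % (status, title, value, unit, perfdata)
    return code, line


if __name__ == "__main__":
    code, line = check_threshold("cpu_idle", "CPU Idle", 99.6, "%", critical=99)
    print(line)
    raise SystemExit(code)
```

A wrapper script would fetch the metric value from gmetad first and then exit with the returned code so that Nagios can interpret the state.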
Getting Data Out
- Web UI
- JSON & CSV export
- gmond XML
- gmetad XML
- gmetad interactive
gmond
    $ telnet localhost 8649
    <GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">
      <CLUSTER NAME="Debug" LOCALTIME="1258504343" OWNER="Alex">
        <HOST NAME="10.0.3.128" IP="10.0.3.128" REPORTED="1258504325"
              TN="18" TMAX="20" DMAX="0">
          <METRIC NAME="load_five" VAL="0.15" TYPE="float" UNITS=" "
                  TN="30" TMAX="325" DMAX="0" SLOPE="both">
            <EXTRA_DATA>
              <EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
              <EXTRA_ELEMENT NAME="DESC" VAL="Five minute load average"/>
              <EXTRA_ELEMENT NAME="TITLE" VAL="Five Minute Load Average"/>
            </EXTRA_DATA>
          </METRIC>
        </HOST>
      </CLUSTER>
    </GANGLIA_XML>
gmond (2)
Same output as the previous slide, with the nesting highlighted: GANGLIA_XML contains CLUSTER, which contains HOST.
gmetad (non-interactive)
    $ telnet localhost 8651
    <GANGLIA_XML VERSION="3.1.2" SOURCE="gmetad">
      <GRID NAME="unspecified" AUTHORITY="http://localhost/~alex/ganglia/"
            LOCALTIME="1258504287">
        <CLUSTER NAME="Debug" LOCALTIME="1258504282" OWNER="Alex"
                 LATLONG="unspecified" URL="unspecified">
          <HOST NAME="10.0.3.128" IP="10.0.3.128" REPORTED="1258504265"
                TN="21" TMAX="20" DMAX="0" LOCATION="unspecified"
                GMOND_STARTED="1258477064">
            <METRIC NAME="load_five" VAL="0.15" TYPE="float" UNITS=" "
                    TN="30" TMAX="325" DMAX="0" SLOPE="both">
              <EXTRA_DATA>
                <EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
                <EXTRA_ELEMENT NAME="DESC" VAL="Five minute load average"/>
                <EXTRA_ELEMENT NAME="TITLE" VAL="Five Minute Load Average"/>
              </EXTRA_DATA>
            </METRIC>
          </HOST>
        </CLUSTER>
      </GRID>
    </GANGLIA_XML>
gmetad (2)
Same output as the previous slide, with the nesting highlighted: GANGLIA_XML contains GRID, which contains CLUSTER, which contains HOST.
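The XML dumps from gmond (port 8649) and gmetad (port 8651) can be consumed by any XML parser. Below is a minimal sketch using only the Python standard library: one function reads the full dump from the socket, the other collects metric values per host. The function names are made up for illustration, and the defaults simply mirror the ports used in the telnet examples.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host="localhost", port=8649, timeout=5.0):
    """Read the complete XML dump; the daemon closes the socket when done."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_bytes):
    """Return {host_name: {metric_name: value_string}} from a Ganglia dump."""
    root = ET.fromstring(xml_bytes)
    result = {}
    # iter() finds HOST at any depth, so the same code handles gmond
    # output (CLUSTER/HOST) and gmetad output (GRID/CLUSTER/HOST).
    for host in root.iter("HOST"):
        metrics = {}
        for metric in host.findall("METRIC"):
            metrics[metric.get("NAME")] = metric.get("VAL")
        result[host.get("NAME")] = metrics
    return result


if __name__ == "__main__":
    for host_name, metrics in parse_metrics(fetch_xml()).items():
        print(host_name, metrics.get("load_five"))
```

Values are returned as strings because the TYPE attribute, not the parser, determines how each metric should be interpreted.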
gmetad (interactive)
    $ telnet localhost 8652
    Connected to localhost.
    Escape character is '^]'.
    /cluster_name/host_name/load_five/
    ... receive same XML format as normal gmetad port, but limited only to the metric you request ...
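The interactive port can be scripted the same way as the telnet session: connect, send a path followed by a newline, and read until gmetad closes the connection. This is a minimal sketch under that assumption; the function names are hypothetical, and the path layout (including the trailing slash) simply mirrors the query typed on the slide.

```python
import socket


def build_query(cluster, host, metric):
    """Build the request line for one metric, as typed in the telnet session."""
    return ("/%s/%s/%s/\n" % (cluster, host, metric)).encode("ascii")


def query_gmetad(cluster, host, metric,
                 gmetad_host="localhost", port=8652, timeout=5.0):
    """Send one query to the interactive port and return the XML reply."""
    chunks = []
    with socket.create_connection((gmetad_host, port), timeout=timeout) as sock:
        sock.sendall(build_query(cluster, host, metric))
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    print(query_gmetad("cluster_name", "host_name", "load_five").decode(
        "utf-8", errors="replace"))
```

Because the reply contains only the requested subtree, this is much cheaper than pulling the full dump from port 8651 when a single value is needed.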
Is it scalable?
Issues
- It uses reverse DNS lookups to determine the hostname, which may cause issues in a cloud (workarounds are needed).
- It doesn't allow an arbitrary metric hierarchy (at this time :-)).
What's next
- Support for arbitrary grouping beyond clusters, e.g. using tags
- Better Nagios integration
- New visualizations, e.g. heatmaps
- Logstash integration
Questions
Twitter: @vvuksan