Ganglia Overview-v2

The Ganglia Monitoring Framework
Vladimir Vuksan, June 2011
http://deanspot.org/content/ganglia-references

About this talk
- Ganglia architecture
- How to get metrics in
- How to get metrics out

Getting Ganglia
Build from source:
    tar xzvf ganglia-3.1.7.tar.gz
    cd ganglia-3.1.7
    ./configure --with-gmetad
    make
    make install
Or use binary packages, e.g.:
    Ubuntu/Debian: apt-get install ganglia-monitor gmetad
    Fedora:        yum install ganglia-gmond ganglia-gmetad

Ganglia Architecture
- 2 daemons: gmond & gmetad
- gmond collects or receives metric data on each node
- 1 gmetad per grid; it polls 1 gmond per cluster for data
- A node belongs to a cluster; a cluster belongs to a grid
- The web UI is a separate item: use it or lose it

Custom Graphs
    {
      "report_name" : "network_report",
      "report_type" : "standard",
      "title" : "Network",
      "vertical_label" : "Bytes/sec",
      "series" : [
        { "metric": "bytes_in", "color": "33cc33", "label": "In", "line_width": "2", "type": "line" },
        { "metric": "bytes_out", "color": "5555cc", "label": "Out", "line_width": "2", "type": "line" }
      ]
    }

Replaces this:
    DEF:'a0'='/var/lib/ganglia/g1/rrds/c-1-2/__SummaryInfo__/bytes_in.rrd':'sum':AVERAGE
    LINE2:'a0'#33cc33:'In '
    VDEF:a0_last=a0,LAST
    VDEF:a0_min=a0,MINIMUM
    VDEF:a0_avg=a0,AVERAGE
    VDEF:a0_max=a0,MAXIMUM
    GPRINT:'a0_last':'Now\\:%5.1lf%s'
    GPRINT:'a0_min':'Min\\:%5.1lf%s'
    GPRINT:'a0_avg':'Avg\\:%5.1lf%s'
    GPRINT:'a0_max':'Max\\:%5.1lf%s\\l'
    DEF:'a1'='/var/lib/ganglia/g1/rrds/c-1-2/__SummaryInfo__/bytes_out.rrd':'sum':AVERAGE
    LINE2:'a1'#5555cc:'Out'
    VDEF:a1_last=a1,LAST
    VDEF:a1_min=a1,MINIMUM
    VDEF:a1_avg=a1,AVERAGE
    VDEF:a1_max=a1,MAXIMUM
    GPRINT:'a1_last':'Now\\:%5.1lf%s'
    GPRINT:'a1_min':'Min\\:%5.1lf%s'
    GPRINT:'a1_avg':'Avg\\:%5.1lf%s'
    GPRINT:'a1_max':'Max\\:%5.1lf%s\\l'
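To illustrate why the JSON form is easier to maintain, here is a minimal Python sketch that expands a report definition into the DEF and LINE arguments of an rrdtool graph command. It is not Ganglia's own implementation (the web UI does this itself); the function name `rrd_args` is hypothetical, and the VDEF/GPRINT legend lines are left out for brevity.

```python
import json

# The network_report definition from the Custom Graphs slide.
REPORT_JSON = """
{
  "report_name": "network_report",
  "report_type": "standard",
  "title": "Network",
  "vertical_label": "Bytes/sec",
  "series": [
    {"metric": "bytes_in", "color": "33cc33", "label": "In",
     "line_width": "2", "type": "line"},
    {"metric": "bytes_out", "color": "5555cc", "label": "Out",
     "line_width": "2", "type": "line"}
  ]
}
"""


def rrd_args(report, rrd_dir):
    """Expand each series into one DEF and one LINE argument (hypothetical helper)."""
    args = []
    for index, series in enumerate(report["series"]):
        var = "a%d" % index
        path = "%s/%s.rrd" % (rrd_dir, series["metric"])
        args.append("DEF:'%s'='%s':'sum':AVERAGE" % (var, path))
        args.append("LINE%s:'%s'#%s:'%s'" % (
            series["line_width"], var, series["color"], series["label"]))
    return args


report = json.loads(REPORT_JSON)
args = rrd_args(report, "/var/lib/ganglia/g1/rrds/c-1-2/__SummaryInfo__")
for arg in args:
    print(arg)
```

Adding a third series to the graph then means adding one JSON object rather than ten more rrdtool arguments.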

gmetad creates 1 RRD file for each metric. The default retention schedule is defined in gmetad.conf:
    Store an avg every    For this long
    15 sec                60 min
    6 min                 1 day
    42 min                1 week
    168 min               30 days
    1 day                 1 year

Default schedule fits into 12K per metric
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 bytes_in.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 bytes_out.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_aidle.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_idle.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_nice.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_num.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_speed.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_system.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_user.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 cpu_wio.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 disk_free.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 disk_total.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 load_fifteen.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 load_five.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 load_one.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 mem_buffers.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 proc_run.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 proc_total.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 swap_free.rrd
    -rw-rw-rw- 1 nobody root 12224 Jun 16 04:07 swap_total.rrd
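A rough back-of-the-envelope check of the 12K figure, using only the retention table from the previous slide. The sketch assumes each stored average takes 8 bytes (one double); that size is an assumption about the RRD file layout, and the remaining bytes are presumably header and bookkeeping data.

```python
# (averaging interval, retention period), both in seconds, from the retention table.
MINUTE, HOUR, DAY = 60, 60 * 60, 24 * 60 * 60
SCHEDULE = [
    (15, 60 * MINUTE),
    (6 * MINUTE, 1 * DAY),
    (42 * MINUTE, 7 * DAY),
    (168 * MINUTE, 30 * DAY),
    (1 * DAY, 365 * DAY),
]

BYTES_PER_VALUE = 8  # assumption: one 8-byte double per stored average

# Number of rows each archive needs to cover its retention period.
rows_per_archive = [period // interval for interval, period in SCHEDULE]
total_rows = sum(rows_per_archive)
data_bytes = total_rows * BYTES_PER_VALUE

print("rows per archive:", rows_per_archive)
print("total rows:", total_rows)
print("approx. data bytes:", data_bytes)
```

Under these assumptions the data rows account for most of the 12224 bytes seen in the listing, and the file size does not grow over time because old rows are overwritten.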

If you need more resolution, adjust it in gmetad.conf.
- Getting started with RRD: http://oss.oetiker.ch/rrdtool/tut/rrd-beginners.en.html
- More help defining RRD files: http://www.cuddletech.com/articles/rrd/ar01s02.html
- A little more about how RRD works with Ganglia: http://vuksan.com/blog/2010/12/14/misconceptions-about-rrd-storage/

Getting data in
- Via gmond modules, written in C or Python
- Via gmetric or libraries that implement the gmetric protocol
- Via other daemons designed to feed metrics to Ganglia (e.g. sFlow)

Zero configuration. Just start sending new metrics. gmetad will create a new RRD file for any new metric it sees. The web UI will draw a basic graph for every metric. You can create nice colored graphs later if you want them.

gmond module
    modules {
      module {
        name = "net_module"
        path = "modnet.so"
      }
    }
    collection_group {
      collect_every = 40
      time_threshold = 300
      metric {
        name = "bytes_out"
        value_threshold = 4096
        title = "Bytes Sent"
      }
      metric {
        name = "bytes_in"
        value_threshold = 4096
        title = "Bytes Received"
      }
    }
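The configuration above loads a C module. For comparison, here is a minimal sketch of what a Python gmond module might look like, based on the commonly documented layout (a `metric_init` function returning metric descriptors, a callback per metric, and `metric_cleanup`). The metric name `example_seconds_since_init` and its values are hypothetical; check the descriptor keys against the Ganglia version you run.

```python
import time

_started = time.time()


def seconds_since_init(name):
    """Callback gmond would invoke on each collection; returns the metric value."""
    return int(time.time() - _started)


def metric_init(params):
    """Called once at load time; returns the list of metric descriptors."""
    return [{
        "name": "example_seconds_since_init",   # hypothetical metric name
        "call_back": seconds_since_init,
        "time_max": 90,
        "value_type": "uint",
        "units": "seconds",
        "slope": "both",
        "format": "%u",
        "description": "Seconds since this module was initialized (example)",
        "groups": "example",
    }]


def metric_cleanup():
    """Called when gmond shuts down; nothing to release in this sketch."""
    pass


if __name__ == "__main__":
    # Lets you exercise the module from the command line without gmond.
    for descriptor in metric_init({}):
        print(descriptor["name"], descriptor["call_back"](descriptor["name"]))
```

Running the file directly is a quick way to debug the callback before wiring it into a collection_group.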

gmetric
Available from the CLI or from Ruby. CLI example:
    $ gmetric -c /etc/ganglia/gmond.conf --name=foo \
        --value=512 --units=foos --type=uint16 --dmax=60
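If you would rather send metrics from a script, one simple approach is to shell out to the gmetric binary. The sketch below builds the same command line as the CLI example; the helper names `gmetric_command` and `send_metric` are hypothetical, and `send_metric` assumes gmetric is installed and a gmond is reachable.

```python
import subprocess


def gmetric_command(name, value, metric_type, units="", dmax=0,
                    conf="/etc/ganglia/gmond.conf"):
    """Build the gmetric argument list (hypothetical helper)."""
    cmd = ["gmetric", "-c", conf,
           "--name=%s" % name,
           "--value=%s" % value,
           "--type=%s" % metric_type]
    if units:
        cmd.append("--units=%s" % units)
    if dmax:
        cmd.append("--dmax=%d" % dmax)
    return cmd


def send_metric(*args, **kwargs):
    """Run gmetric; raises CalledProcessError if it exits non-zero."""
    subprocess.check_call(gmetric_command(*args, **kwargs))


# uint16 here, since a value of 512 exceeds the uint8 range (0-255).
cmd = gmetric_command("foo", 512, "uint16", units="foos", dmax=60)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when metric names or units contain spaces.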

What kind of metrics can I collect?
- Load time of your home page?
- Number of active trouble tickets?
- LOC in your application?
- rcov coverage statistics?
- Execution time of your test suite?
- Number of user logins?
- Memory usage by a particular process?
Many other metric plugins are available at http://github.com/ganglia

Log parsing apps
Eat log files and make metrics:
- Apache traffic stats: http://vuksan.com/linux/ganglia/#Apache_Traffic_Stats
- ganglia-logtailer: https://bitbucket.org/maplebed/ganglia-logtailer
- logster: https://github.com/etsy/logster

But I thought Ganglia is only useful for host metrics?
No. You can create “non-existent” hosts by spoofing.
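A sketch of what spoofing can look like in practice, assuming gmetric's `--spoof` option, which takes an "IP:hostname" pair (verify with `gmetric --help` on your version). The host name `helpdesk`, the address, and the metric are made up for illustration.

```python
def spoofed_gmetric_command(name, value, metric_type, spoof_ip, spoof_host):
    """Build a gmetric command that reports a metric for a host that need not exist."""
    return ["gmetric",
            "--name=%s" % name,
            "--value=%s" % value,
            "--type=%s" % metric_type,
            # assumption: --spoof expects "IP:hostname", colon separated
            "--spoof=%s:%s" % (spoof_ip, spoof_host)]


# Hypothetical example: trouble-ticket count reported under a virtual "helpdesk" host.
cmd = spoofed_gmetric_command("active_tickets", 17, "uint16",
                              "192.0.2.10", "helpdesk")
print(" ".join(cmd))
```

The virtual host then shows up in the web UI next to real nodes, which is a convenient home for application or business metrics.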

Integrating Ganglia & Nagios
    $ check_ganglia_metric.py --gmetad_host=gmetad-server.example.com \
        --metric_host=host.example.com --metric_name=cpu_idle --critical=99
    Status Critical, CPU Idle = 99.6 %|cpu_idle=99.6%;;99;;
- https://github.com/mconigliaro/check_ganglia_metric
- http://vuksan.com/blog/2011/04/19/use-your-trending-data-for-alerting/
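The idea behind such a check can be sketched in a few lines: compare a metric value to thresholds, then emit a status line with performance data and a Nagios exit code (0 OK, 1 warning, 2 critical). This is not the code of check_ganglia_metric; the function `check_metric` is hypothetical and uses a plain "value >= threshold" comparison, which is simpler than full Nagios threshold ranges.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def check_metric(name, title, value, units="", warning=None, critical=None):
    """Return (exit_code, status_line) for one metric value (hypothetical helper)."""
    if critical is not None and value >= critical:
        code, status = CRITICAL, "Critical"
    elif warning is not None and value >= warning:
        code, status = WARNING, "Warning"
    else:
        code, status = OK, "OK"
    # Nagios perfdata layout: label=value[UOM];warn;crit;min;max
    perfdata = "%s=%s%s;%s;%s;;" % (
        name, value, units,
        "" if warning is None else warning,
        "" if critical is None else critical)
    line = "Status %s, %s = %s %s|%s" % (status, title, value, units, perfdata)
    return code, line


code, line = check_metric("cpu_idle", "CPU Idle", 99.6, "%", critical=99)
print(line)
```

A real plugin would fetch the value from gmetad first and finish with `sys.exit(code)` so Nagios can read the result.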

Getting Data Out
- Web UI JSON & CSV export
- gmond XML
- gmetad XML
- gmetad interactive

gmond
    $ telnet localhost 8649
    <GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">
      <CLUSTER NAME="Debug" LOCALTIME="1258504343" OWNER="Alex">
        <HOST NAME="10.0.3.128" IP="10.0.3.128" REPORTED="1258504325" TN="18" TMAX="20" DMAX="0">
          <METRIC NAME="load_five" VAL="0.15" TYPE="float" UNITS=" " TN="30" TMAX="325" DMAX="0" SLOPE="both">
            <EXTRA_DATA>
              <EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
              <EXTRA_ELEMENT NAME="DESC" VAL="Five minute load average"/>
              <EXTRA_ELEMENT NAME="TITLE" VAL="Five Minute Load Average"/>
            </EXTRA_DATA>
          </METRIC>
        </HOST>
      </CLUSTER>
    </GANGLIA_XML>
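Instead of telnet, a script can read the same XML over a plain TCP socket and pull out the values it needs. The sketch below uses only the Python standard library; the function names are hypothetical, and `fetch_xml` assumes gmond sends its dump and then closes the connection, as the telnet session suggests.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host="localhost", port=8649, timeout=5.0):
    """Read the whole XML dump as bytes until the daemon closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_data):
    """Map (host name, metric name) to the metric's VAL attribute (a string)."""
    root = ET.fromstring(xml_data)
    metrics = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            metrics[(host.get("NAME"), metric.get("NAME"))] = metric.get("VAL")
    return metrics


# Trimmed copy of the output shown on the slide, used here instead of a live gmond.
SAMPLE = """<GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">
<CLUSTER NAME="Debug" LOCALTIME="1258504343" OWNER="Alex">
<HOST NAME="10.0.3.128" IP="10.0.3.128" REPORTED="1258504325" TN="18" TMAX="20" DMAX="0">
<METRIC NAME="load_five" VAL="0.15" TYPE="float" UNITS=" " TN="30" TMAX="325" DMAX="0" SLOPE="both"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""

metrics = parse_metrics(SAMPLE)
print(metrics)
```

Against a live daemon you would call `parse_metrics(fetch_xml())`; the same parser should also work on gmetad's port 8651 output, since HOST and METRIC elements are nested the same way there.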


gmetad (non-interactive)
    $ telnet localhost 8651
    <GANGLIA_XML VERSION="3.1.2" SOURCE="gmetad">
      <GRID NAME="unspecified" AUTHORITY="http://localhost/~alex/ganglia/" LOCALTIME="1258504287">
        <CLUSTER NAME="Debug" LOCALTIME="1258504282" OWNER="Alex" LATLONG="unspecified" URL="unspecified">
          <HOST NAME="10.0.3.128" IP="10.0.3.128" REPORTED="1258504265" TN="21" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1258477064">
            <METRIC NAME="load_five" VAL="0.15" TYPE="float" UNITS=" " TN="30" TMAX="325" DMAX="0" SLOPE="both">
              <EXTRA_DATA>
                <EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
                <EXTRA_ELEMENT NAME="DESC" VAL="Five minute load average"/>
                <EXTRA_ELEMENT NAME="TITLE" VAL="Five Minute Load Average"/>
              </EXTRA_DATA>
            </METRIC>
          </HOST>
        </CLUSTER>
      </GRID>
    </GANGLIA_XML>


gmetad (interactive)
    $ telnet localhost 8652
    Connected to localhost.
    Escape character is '^]'.
    /cluster_name/host_name/load_five/
    ... receive same XML format as normal gmetad port, but limited only to the metric you request ...
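The interactive port can be scripted the same way: send a path, read the XML that comes back. This is a sketch under the assumption that gmetad expects a newline-terminated path like the one typed in the telnet session; `build_query` and `query_gmetad` are hypothetical helper names.

```python
import socket


def build_query(cluster=None, host=None, metric=None):
    """Build a path query such as /cluster_name/host_name/load_five/ plus newline."""
    parts = []
    for part in (cluster, host, metric):
        if part is None:
            break  # a deeper level only makes sense if its parent is given
        parts.append(part)
    if not parts:
        return "/\n"
    return "/" + "/".join(parts) + "/\n"


def query_gmetad(path_query, host="localhost", port=8652, timeout=5.0):
    """Send the query and return the raw XML reply as bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(path_query.encode("ascii"))
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


query = build_query("cluster_name", "host_name", "load_five")
print(repr(query))
```

Asking for a single metric keeps the reply small, which matters on grids where the full dump from port 8651 runs to many megabytes.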

Issues
- It uses reverse DNS lookups to determine hostnames, which may cause issues in a cloud (workarounds are needed)
- Doesn't allow arbitrary metric hierarchy (at this time :-))

What's next
- Support for arbitrary grouping beyond clusters, e.g. using tags
- Better Nagios integration
- New visualizations, e.g. heatmaps
- Logstash integration
