Monitoring pg with_graphite_grafana

Monitoring PostgreSQL using Time-Series
systems like Graphite and/or Grafana with
OpenCollector
Jan Wieck - OpenSCG

Overview
• What is Monitoring?
• Graphite & Carbon
• Grafana
• Why use Carbon?
• Why use Graphite AND Grafana?
• PostgreSQL Metric Data
• OpenCollector
• osinfofdw

What is Monitoring?
• Capture Time-Series data
• Metric-Name, Value, Timestamp
• Visualize Time-Series data
• Define alerts based on Time-Series data
• Statistical analysis of Time-Series data
• Getting an alert when your primary DB server is
down is covered by the above!

Graphite & Carbon
• Carbon is a server for collecting Time-Series
data
• Simple line based protocol on port 2003
• Python-pickle protocol on port 2004
• Graphite is a web-based GUI on top of Carbon
• Some Dashboard functionality
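As a minimal sketch of the line-based protocol on port 2003, the Python below builds one "metric value timestamp" line and sends it over TCP. The function names format_line and send_lines are hypothetical helpers made up for this illustration, not part of Carbon.

```python
import socket
import time


def format_line(metric, value, timestamp=None):
    """Build one plaintext-protocol line: '<metric> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{metric} {value} {int(timestamp)}\n"


def send_lines(lines, host, port=2003, timeout=5.0):
    """Open one TCP connection to Carbon and send all lines together."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

Calling send_lines([format_line("test.PI", 3.1415)], "graphite.host.name") would deliver a single data point, assuming a Carbon server is reachable under that host name.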

Grafana
• Grafana is more Dashboard focused
• Grafana can use many Time-Series data sources
• Graphite
• Elasticsearch
• CloudWatch
• InfluxDB
• OpenTSDB
• KairosDB
• Prometheus

Why use Carbon?
Carbon provides an extremely simple protocol to send
Time-Series data
#!/bin/sh
# Send one data point to Carbon: "<metric> <value> <unix timestamp>"
CHOST="graphite.host.name"
CPORT="2003"
METRIC="test.PI"
VALUE="3.1415"
echo "$METRIC $VALUE $(date +%s)" | nc "$CHOST" "$CPORT"

Why use Carbon?
Not a very useful metric, but consider capturing the
runtime of a shell-script-based cron job.
Carbon also provides a Python-pickle based protocol
on port 2004 that can be used to send hundreds of
metric points condensed into a single send(2) call.
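A sketch of how such a batch could be framed for the pickle protocol, following the format described in the Graphite documentation: a pickled list of (path, (timestamp, value)) tuples, preceded by a 4-byte big-endian length header. The helper name build_pickle_message is made up for this example.

```python
import pickle
import struct


def build_pickle_message(points):
    """Frame metric points for Carbon's pickle receiver.

    points: iterable of (metric_path, timestamp, value) tuples.
    Returns bytes: 4-byte length header followed by the pickled list.
    """
    data = [(path, (timestamp, value)) for path, timestamp, value in points]
    payload = pickle.dumps(data, protocol=2)
    header = struct.pack("!L", len(payload))
    return header + payload
```

The resulting bytes would be written to a TCP socket connected to port 2004, so that many points travel in one message instead of one line each.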

Why use Graphite AND Grafana?
• Grafana is more Dashboard focused
• Templating makes it easy to define one
Dashboard and use it for many hosts/databases
• Getting to a Dashboard is easier
• Can define Alerts
• Looks cool
• Graphite is better at ad-hoc graphing
• The metric tree is easier to navigate than clicking
through Grafana’s pull down system

However …
This isn’t a talk advertising Graphite or Grafana.
This is a talk about capturing monitoring data from
PostgreSQL and delivering it into a Time-Series data
system. Carbon/Graphite and Grafana are example
destinations.

PostgreSQL Metric Data
PostgreSQL produces quite a number of data points.
• On the table level
• about 30 metric points
• On the index level
• On the database level

Those per table/index numbers are not of concern
when you look at your typical benchmark database.
But what about a database with 1,800 tables and
13,000 indexes?
Now we are talking about 132,000 metric points every
time interval! Captured every minute, that is 7.9M per
hour, 190M per day, 17.1B per quarter. Don’t do that
with snapshots captured inside the DB.
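The arithmetic behind these numbers can be reproduced as below. The slides give about 30 metric points per table; the figure of 6 points per index is an assumption, chosen here only because it makes the total come out at 132,000. The quarter is assumed to have 90 days.

```python
TABLES = 1_800
INDEXES = 13_000
POINTS_PER_TABLE = 30  # "about 30" per the slides
POINTS_PER_INDEX = 6   # assumption: inferred so the total matches 132,000

per_interval = TABLES * POINTS_PER_TABLE + INDEXES * POINTS_PER_INDEX
per_hour = per_interval * 60    # one capture per minute
per_day = per_hour * 24
per_quarter = per_day * 90      # assuming a 90-day quarter

print(f"{per_interval:,} per interval")
print(f"{per_hour:,} per hour")
print(f"{per_day:,} per day")
print(f"{per_quarter:,} per quarter")
```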

That isn’t as exotic as it looks at first glance.
PostgreSQL system views like pg_stat_all_tables
will report every single metric point even if a table or
index hasn’t been used for the past 12 months.
How many dead tables (schemas) does your
database have?
A generic monitoring system can’t tell them apart.

But that isn’t all. Many metrics are presented as
continuously increasing counters, but the useful value
is actually their increase per second.
Examples:
• Tuples inserted, updated, deleted, fetched
• Index/Sequential scans
This is the same as for OS statistics like:
• Network operations
• Disk operations

While that is efficient inside of the PostgreSQL server
for collecting the data, it is rather inconvenient when
browsing it in a system like Graphite or Grafana.
Sure, they can apply a function like “perSecond()” and
it is only 20 mouse clicks away …
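The counter-to-rate conversion described above can also be done before the data is sent. A minimal sketch, assuming two consecutive (timestamp, counter) samples and treating a decreasing counter as a reset; per_second is a hypothetical helper name:

```python
def per_second(prev, curr):
    """Convert two counter samples into a per-second rate.

    prev and curr are (timestamp_seconds, counter_value) pairs.
    Returns None when no rate can be computed: no time elapsed,
    or the counter went backwards (for example after a stats reset).
    """
    (t0, v0), (t1, v1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0 or v1 < v0:
        return None
    return (v1 - v0) / elapsed
```

For example, a tuples-inserted counter that grows from 1000 to 1600 over 60 seconds yields a rate of 10.0 per second.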

OpenCollector
• OpenCollector is a PostgreSQL monitoring
daemon sponsored by OpenSCG
• It is designed to address the aforementioned
problems
• JSON configuration files define all the operation
• Target Carbon server
• Source Database(s)
• Queries to run and what metrics they return
• Sparse metric reporting
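One plausible reading of sparse metric reporting is to send only the metrics whose values changed since the previous collection run, so that dead tables and indexes stop producing points. The sketch below illustrates that idea only; it is an assumption and not taken from the OpenCollector source.

```python
def changed_metrics(previous, current):
    """Return the entries of `current` that are new or differ from `previous`.

    Both arguments map metric names to their latest raw values.
    """
    return {
        name: value
        for name, value in current.items()
        if name not in previous or previous[name] != value
    }
```

A collector built this way would keep the last snapshot in memory and forward only the returned subset to Carbon.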

OpenCollector
An example from the sample configs:
"name": "global_stats",
"prefix": "database:{datname}.global_stats",
"query": [
    "SELECT ",
    " datname, numbackends::float8, ",
    " xact_commit::float8, xact_rollback::float8, ",
    " blks_read::float8, blks_hit::float8, ",
    " pg_catalog.pg_database_size(datid)::float8, ",
    " pg_xlog_location_diff(pg_current_xlog_insert_location(),
    "FROM pg_catalog.pg_stat_database ",
    "WHERE datname = current_database() "
],
"result": [
    { "name": "datname", "type": "internal" },
    { "name": "numbackends", "type": "value" },
    { "name": "xact_commit", "type": "counter" },
    ...
]
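The slides do not spell out how the result types are interpreted. A sketch under the assumption that "internal" columns only fill placeholders in the prefix, while "value" and "counter" columns become metrics under that prefix; row_to_metrics is a hypothetical helper, not OpenCollector code:

```python
def row_to_metrics(prefix, result_spec, row):
    """Turn one query result row into (metric_name, value, type) tuples.

    prefix:      template such as "database:{datname}.global_stats"
    result_spec: list of {"name": ..., "type": ...} dicts, one per column
    row:         column values in the same order as result_spec
    """
    columns = {spec["name"]: value for spec, value in zip(result_spec, row)}
    base = prefix.format(**columns)  # "internal" columns fill the placeholders
    return [
        (f"{base}.{spec['name']}", columns[spec["name"]], spec["type"])
        for spec in result_spec
        if spec["type"] != "internal"
    ]
```

Columns typed "counter" would then presumably be converted to a per-second rate before being sent, while "value" columns go out unchanged.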

OpenCollector
• Since the queries are in config files, you can
customize them
• Additional WHERE clauses
• Change from pg_stat_all_ to pg_stat_user_
• Add your own, application specific queries
• OpenCollector is modular and allows other
things to be added
• OpenCollector is open source

osinfofdw
• osinfofdw is another open source project
sponsored by OpenSCG
• A Multicorn-based FDW around Python-psutil
• Access OS level statistics via SELECT
• CPU usage
• Memory usage
• Disk IO
• Network IO
• Filesystem information

Links
• https://bitbucket.org/openscg/opencollector
• https://bitbucket.org/openscg/osinfofdw
Questions?
