Monitoring the Hashistack
with Prometheus
Tom Wilkie @tom_wilkie

August 2018
❤
Prometheus
● A monitoring & alerting system.

● Inspired by Google’s BorgMon

● Originally built by SoundCloud in 2012

● Open Source, now part of the CNCF

● Simple text-based metrics format

● Multidimensional datamodel

● Rich, concise query language
Prometheus’ data model is very simple:

<identifier> → [ (t0, v0), (t1, v1), ... ]
Timestamps are millisecond int64, values are float64
https://www.slideshare.net/Docker/monitoring-the-prometheus-way-julius-voltz-prometheus
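The data model above can be sketched in a few lines of Python. This is only an illustration of the "identifier maps to a list of (timestamp, value) pairs" idea, not how Prometheus stores data; the `Series` class and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Series:
    # The identifier: a metric name plus a set of label key/value pairs.
    name: str
    labels: Dict[str, str]
    # Samples as (timestamp in milliseconds, float value), oldest first.
    samples: List[Tuple[int, float]] = field(default_factory=list)

    def append(self, t_ms: int, value: float) -> None:
        # Keep samples in time order by rejecting out-of-order timestamps.
        if self.samples and t_ms <= self.samples[-1][0]:
            raise ValueError("timestamps must be strictly increasing")
        self.samples.append((int(t_ms), float(value)))

    def identifier(self) -> str:
        # Render the identifier as name{label="value",...} with sorted labels.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
        return f"{self.name}{{{pairs}}}"
```

For example, `Series("http_requests_total", {"job": "nginx"})` followed by a few `append` calls models one time series.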
Prometheus identifiers

http_requests_total{job="nginx", instance="1.2.3.4:80", path="/home", status="200"}
http_requests_total{job="nginx", instance="1.2.3.4:80", path="/home", status="500"}
http_requests_total{job="nginx", instance="1.2.3.4:80", path="/settings", status="200"}
http_requests_total{job="nginx", instance="1.2.3.4:80", path="/settings", status="502"}
Prometheus series selector

http_requests_total{job="nginx", status=~"5.."}
Building queries usually starts with a selector

PromQL: http_requests_total{job="nginx", status=~"5.."}
{job="nginx", instance="1.2.3.4:80", path="/home", status="500"} 34
{job="nginx", instance="1.2.3.4:80", path="/settings", status="502"} 56
{job="nginx", instance="2.3.4.5:80", path="/home", status="500"} 76
{job="nginx", instance="2.3.4.5:80", path="/settings", status="502"} 96
...
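How a selector picks series can be sketched as a label-matching function. This is a simplified illustration, not Prometheus' implementation: the `matches` helper is hypothetical, and Python's `re` module stands in for the RE2 syntax Prometheus uses. Regex matchers are fully anchored, so `5..` matches `500` and `502` but not `5000`.

```python
import re
from typing import Dict, List, Tuple

# A matcher is (label, operator, value); operators follow PromQL: =, !=, =~, !~
Matcher = Tuple[str, str, str]


def matches(labels: Dict[str, str], matchers: List[Matcher]) -> bool:
    """Return True if a series' labels satisfy every matcher."""
    for label, op, value in matchers:
        # A missing label behaves like a label with an empty value.
        actual = labels.get(label, "")
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


# The metric name is itself stored as the label __name__.
selector = [
    ("__name__", "=", "http_requests_total"),
    ("job", "=", "nginx"),
    ("status", "=~", "5.."),
]
```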
Can select vectors of values…

PromQL: http_requests_total{job="nginx", status=~"5.."}[1m]
{job="nginx", instance="1.2.3.4:80", path="/home", status="500"} [30, 31, 32, 34]
{job="nginx", instance="1.2.3.4:80", path="/settings", status="502"} [4, 24, 56, 56]
{job="nginx", instance="2.3.4.5:80", path="/home", status="500"} [76, 76, 76, 76]
{job="nginx", instance="2.3.4.5:80", path="/settings", status="502"} [56, 106, 5, 96]
...
And apply functions…

PromQL: rate(http_requests_total{job="nginx", status=~"5.."}[1m])
{job="nginx", instance="1.2.3.4:80", path="/home", status="500"} 0.0666
{job="nginx", instance="1.2.3.4:80", path="/settings", status="502"} 0.866
{job="nginx", instance="2.3.4.5:80", path="/home", status="500"} 0.0
{job="nginx", instance="2.3.4.5:80", path="/settings", status="502"} 2.43
...
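The rates on this slide can be reproduced with a simplified calculation: add up the increases between consecutive samples, treat any drop as a counter reset, and divide by the window length. This sketch assumes the four samples span the whole one-minute window. The real `rate()` also extrapolates toward the window edges, so its output can differ slightly. `simple_rate` is a hypothetical helper name.

```python
from typing import List


def simple_rate(values: List[float], window_seconds: float) -> float:
    """Per-second increase of a counter over the window, tolerating resets."""
    increase = 0.0
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            # Counter reset: the counter restarted from zero,
            # so everything up to cur counts as new increase.
            increase += cur
        else:
            increase += cur - prev
    return increase / window_seconds


home = simple_rate([30, 31, 32, 34], 60)      # (34 - 30) / 60
settings = simple_rate([56, 106, 5, 96], 60)  # (50 + 5 + 91) / 60, one reset
```

The last series drops from 106 to 5, which is why its rate is not simply (96 - 56) / 60.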
And aggregate by a dimension…

PromQL: sum by (path) (rate(http_requests_total{job="nginx", status=~"5.."}[1m]))
{path="/home"} 0.0666
{path="/settings"} 3.3
...
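Aggregating by a dimension amounts to grouping the per-series values by one label and summing each group. A minimal sketch, with `sum_by` as a hypothetical helper and the rates taken from the previous slide:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]


def sum_by(label: str, vector: List[Sample]) -> Dict[str, float]:
    """Group samples by one label and sum the values in each group."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in vector:
        # All other labels (instance, status, ...) are dropped from the result.
        totals[labels.get(label, "")] += value
    return dict(totals)


rates: List[Sample] = [
    ({"instance": "1.2.3.4:80", "path": "/home"}, 0.0666),
    ({"instance": "1.2.3.4:80", "path": "/settings"}, 0.866),
    ({"instance": "2.3.4.5:80", "path": "/home"}, 0.0),
    ({"instance": "2.3.4.5:80", "path": "/settings"}, 2.43),
]
by_path = sum_by("path", rates)
```

Here `/settings` comes out as 0.866 + 2.43, which the slide rounds to 3.3.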
Do binary operations…

PromQL: sum by (path) (rate(http_requests_total{job="nginx", status=~"5.."}[1m]))
/
sum by (path) (rate(http_requests_total{job="nginx"}[1m]))
{path="/home"} 0.001
{path="/settings"} 1.0
...
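The division pairs up elements from both sides that carry the same labels (here just `path`) and drops elements with no partner. A rough sketch of that matching, with `divide` as a hypothetical helper operating on results already keyed by path:

```python
import math
from typing import Dict


def divide(lhs: Dict[str, float], rhs: Dict[str, float]) -> Dict[str, float]:
    """Divide values that share a key; keys missing on the right are dropped."""
    out: Dict[str, float] = {}
    for key, numerator in lhs.items():
        if key not in rhs:
            continue
        denominator = rhs[key]
        # PromQL follows floating-point rules here (NaN or Inf);
        # this sketch simply returns NaN for a zero denominator.
        out[key] = numerator / denominator if denominator != 0 else math.nan
    return out
```

With error rates on the left and total request rates on the right, the result is the fraction of requests per path that returned a 5xx status.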
Hashistack
● Consul: service discovery, K/V store, service mesh…
● Vault: secret management and automation
● Terraform: infrastructure config as code
• All HashiCorp products use github.com/armon/go-metrics for metrics.

• Exposes metrics in statsd, dogstatsd

• Prometheus support added in 2015

• …but not plumbed through in most products (until recently)
Consul
● Prometheus support exposed in 1.1.0 (hashicorp/consul#4014, hashicorp/consul#4016)

● Exposed metrics are being improved (hashicorp/consul#4042)
Consul (II)
● Alternatively use the statsd_exporter with a custom metrics mapping

● github.com/prometheus/statsd_exporter
https://github.com/kausalco/public/blob/master/consul-mixin/statsd-mapping.yaml
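The idea behind a metrics mapping can be sketched as follows: a dotted statsd name is matched against a glob, and the wildcard components become Prometheus labels. This is an illustration of the concept only, not the statsd_exporter code. The `apply_mapping` helper, the statsd name and the resulting metric and label names are all made up for the example.

```python
import re
from typing import Dict, Optional, Tuple


def apply_mapping(
    statsd_name: str, match: str, name: str, labels: Dict[str, str]
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map a dotted statsd name to (metric name, labels), or None if no match."""
    # Turn the glob into a regex: each "*" captures one dot-separated component.
    pattern = r"\.".join(
        "([^.]+)" if part == "*" else re.escape(part) for part in match.split(".")
    )
    m = re.fullmatch(pattern, statsd_name)
    if m is None:
        return None

    def expand(template: str) -> str:
        # Replace $1, $2, ... with the corresponding captured component.
        return re.sub(r"\$(\d+)", lambda ref: m.group(int(ref.group(1))), template)

    return expand(name), {key: expand(value) for key, value in labels.items()}


# Hypothetical statsd name and mapping, for illustration only.
mapped = apply_mapping(
    "consul.http.GET.v1.kv",
    match="consul.http.*.*.*",
    name="consul_http_request",
    labels={"method": "$1", "path": "/$2/$3"},
)
```

The point of the mapping is to move variable parts of the statsd name into labels, so that a single metric can be sliced by method or path in PromQL.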
Consul (III)
● Still don’t get the “operational” metrics we want.

● Use the consul_exporter: github.com/prometheus/consul_exporter
Consul Mixin
● Set of predefined alerts & dashboards for Prometheus / Grafana

● github.com/kausalco/public/consul-mixin

● Uses “Prometheus Mixin” format:

● Design Doc
Vault
● No Prometheus metrics yet (hashicorp/vault#2937)

● Again, use statsd_exporter

● BUT statsd metrics aren’t very “operational”

● Use vault_exporter

● github.com/grapeshot/vault_exporter
Vault Mixin
● As per the Consul mixin, a set of alerts and dashboards for Prometheus & Grafana

● github.com/grapeshot/vault_exporter/vault-mixin
Terraform
● Terraform is a CLI tool - what does it mean to monitor that?

● I don’t have enough confidence to run it in CI/CD...

● But I do want to know if someone changes something and doesn’t apply it.
Terradiff
• Runs on k8s

• Pod containing:

• git-sync (github.com/kubernetes/git-sync)

• prom-run (github.com/tomwilkie/prom-run)

terraform plan -detailed-exitcode
https://www.weave.works/blog/provisioning-lifecycle-production-ready-kubernetes-cluster/
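The drift check relies on the exit code of `terraform plan -detailed-exitcode`: 0 means no changes, 1 means an error, and 2 means the plan contains changes. A minimal sketch of interpreting that code and rendering it as a metric line follows. The helper names and the metric name `terraform_plan_exit_code` are assumptions for illustration, not what prom-run actually emits.

```python
import subprocess

# Meanings of the exit codes from `terraform plan -detailed-exitcode`.
PLAN_STATUS = {0: "in_sync", 1: "error", 2: "drift"}


def classify_plan(exit_code: int) -> str:
    """Translate a plan exit code into a short status string."""
    return PLAN_STATUS.get(exit_code, "unknown")


def exposition_line(exit_code: int, metric: str = "terraform_plan_exit_code") -> str:
    # Hypothetical metric name, rendered in the Prometheus text format.
    return f"{metric} {exit_code}"


def run_plan(workdir: str) -> int:
    """Run the plan in workdir (e.g. the git-sync checkout) and return its exit code."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
    )
    return result.returncode
```

An alert on the value staying at 2 would then flag changes that were committed but never applied, and a value of 1 would flag a broken configuration.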
Hashistack
● Consul: service discovery, K/V store, service mesh…
● Vault: secret management and automation
● Terraform: infrastructure config as code
Questions?
