Thank you
Hosted by Sponsored by
Today
● Agentless monitoring with Icinga and Prometheus by Diogo Machado
● Coffee break and networking
Agentless monitoring with Icinga and
Prometheus
Diogo Machado
dgm@eurotux.com
04/11/2019
DevOps Braga #15
Agenda
● From Icinga to Prometheus
● Prometheus Basic Concepts
● Prometheus Server Configuration
● Getting data into Prometheus
● Implement custom metrics
● How to integrate Icinga with Prometheus?
From Icinga to Prometheus - Introduction
Icinga:
● Open-source computer system and network monitoring application;
● Monitors the availability of hosts and services;
● Distributed monitoring;
● Agent-based monitoring;
● Notifications and downtimes;
● Written in PHP and C++;
Prometheus:
● Open-source systems monitoring and alerting toolkit;
● Monitoring of highly service-oriented architectures;
● Collects metrics with Exporters;
● Stores aggregated metrics;
● Written in Go;
From Icinga to Prometheus - Comparison
Icinga:
● Alerting based on the exit codes of checks;
● Host-based;
● No notion of labels or a query language;
● No storage per se, beyond the check state;
● All configuration is made via files;
● Monitoring of small and/or static systems where blackbox probing is sufficient.
Prometheus:
● Multi-dimensional data model with time series data;
● Rule-based alerts;
● Prometheus Query Language - PromQL;
● Centralized data store;
● Suitable for dynamic or cloud-based environments - whitebox monitoring;
From Icinga to Prometheus - Comparison
If both are for system monitoring…
Why not choose only one?
What does Prometheus have that Icinga doesn't have?
Why should we combine both?
From Icinga to Prometheus - Conclusion
So … Why should we combine both?
● Prometheus has:
○ Exporters for third-party systems and applications;
○ Centralized control and an HTTP API;
● Icinga has:
○ Easy configuration of hosts and services (Icinga Director);
○ A good alerting system with notifications and scheduled downtimes;
Combine to get the best of each one:
● Scrape metrics with Prometheus
● Configure and Alert with Icinga
Prometheus's main features are:
● multi-dimensional data model with time series data identified by metric name and key/value pairs;
● PromQL, a flexible query language to leverage this dimensionality;
● no reliance on distributed storage - single server nodes are autonomous;
● time series collection happens via a pull model over HTTP;
● pushing time series is supported via an intermediary gateway;
● targets are discovered via service discovery or static configuration;
● support for graph and dashboard mode.
Prometheus Basic Concepts
Prometheus Basic Concepts - Architecture
The Prometheus ecosystem consists of
multiple components:
● the main Prometheus server which
scrapes and stores time series data;
● a Pushgateway for supporting
short-lived jobs;
● specific exporters for services like
HAProxy, StatsD, Graphite, etc;
● an Alertmanager to handle alerts;
● data visualization tools like Grafana;
Prometheus Basic Concepts - Data Model
● Data is stored as time series: streams of
timestamped values belonging to the same
metric and the same set of labeled dimensions;
● Key Value Data Model:
○ Key - Metric name and a set of labels;
○ Value - Metric measurement;
Prometheus Basic Concepts - Metric Types
Counter
● Cumulative metric;
● Monotonically increasing
counter;
● Examples:
○ Nº requests served;
○ Nº tasks completed;
○ Number of errors;
Gauge
● Numerical value that can
arbitrarily go up or down;
● Examples:
○ Temperature;
○ Memory usage;
○ Nº concurrent requests;
Histogram
● Values are aggregated in
buckets;
● Expose total sum of all
observed values;
● Count of events that have
been observed;
● Example:
○ Request latency;
Summary
● Similar to a histogram;
● Calculates configurable
quantiles over a sliding
time window;
● Examples:
○ Request durations;
○ Response sizes;
requests_time_seconds_bucket{app="projectx",le="0.005"} 2343340162
… (buckets)
requests_time_seconds_sum{app="projectx"} 5.366133242442994e+07
requests_time_seconds_count{app="projectx"} 3973861256
go_gc_duration_seconds{quantile="0"} 4.274e-05
… (quantiles)
go_gc_duration_seconds_sum 0.467543895
go_gc_duration_seconds_count 92
Histogram Summary
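A histogram's quantiles can be estimated from its cumulative bucket counts, which is the idea behind PromQL's histogram_quantile() function. A minimal sketch of that interpolation (the bucket bounds and counts below are made-up values, not from the outputs above):

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs sorted by bound,
    mirroring the `le` label of a Prometheus histogram.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Hypothetical request-latency buckets: seconds -> cumulative observations.
buckets = [(0.005, 10), (0.01, 50), (0.05, 90), (0.1, 100)]
p50 = estimate_quantile(0.5, buckets)  # falls inside the (0.005, 0.01] bucket
```

This is why histograms are cheap to aggregate across instances, while summaries (which pre-compute quantiles client-side) are not.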
Prometheus Basic Concepts - Jobs and Instances
● An endpoint you can scrape is called an Instance;
● A collection of instances with the same purpose is called a Job;
● Prometheus scrapes a Target and attaches labels automatically: job name, instance host and port;
● Example of job with 3 instances:
job 1:
  instance 1: 128.0.0.1:9030
  instance 2: 128.0.0.2:9030
  instance 3: 128.0.0.3:9030
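The automatic label attachment can be pictured as merging target metadata into every scraped sample. An illustrative sketch (names are made up; the real behavior around conflicting labels is governed by honor_labels, which is simplified away here):

```python
def attach_target_labels(sample_labels, job, instance):
    """Merge the job/instance labels Prometheus attaches at scrape time.

    Simplification: labels already present on the scraped sample win;
    Prometheus's actual conflict handling depends on honor_labels.
    """
    merged = {"job": job, "instance": instance}
    merged.update(sample_labels)
    return merged

labels = attach_target_labels({"mode": "idle"}, job="node", instance="128.0.0.1:9030")
```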
Prometheus Basic Concepts - PromQL
Expression language data types:
● Instant vector - a set of time series containing a single sample for each time series;
○ Example: http_requests_total{environment=~"staging|development",method!="GET"}
● Range vector - a set of time series containing a range of data points for each time series;
○ Example: http_requests_total{job="prometheus"}[5m]
● Scalar - a simple numeric floating point value;
○ Example: -2.43
● String - a simple string value;
○ Example: 'these are unescaped: \n \\ \t'
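The label matchers inside an instant vector selector (=, !=, =~, !~) can be sketched as simple filters over series labels. A toy version (not the real PromQL engine; note that PromQL regex matchers are fully anchored, hence fullmatch):

```python
import re

def matches(labels, matcher):
    """Apply one PromQL-style label matcher to a series' label set."""
    name, op, value = matcher
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None  # anchored, like PromQL
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(op)

series = [
    {"__name__": "http_requests_total", "environment": "staging", "method": "POST"},
    {"__name__": "http_requests_total", "environment": "production", "method": "GET"},
]
# Mirrors: http_requests_total{environment=~"staging|development",method!="GET"}
selected = [s for s in series
            if matches(s, ("environment", "=~", "staging|development"))
            and matches(s, ("method", "!=", "GET"))]
```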
Prometheus Basic Concepts - REST API
● Response format is JSON;
● Methods: GET, POST;
● Endpoints:
○ /api/v1/query
○ /api/v1/query_range
○ /api/v1/series
○ /api/v1/labels
○ /api/v1/targets
○ /api/v1/rules
○ /api/v1/targets/metadata
○ /api/v1/status/config
○ /api/v1/status/flags
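The /api/v1/query endpoint returns a JSON document with a status field and a typed result. A sketch of parsing an instant-vector response; the payload below is hand-written in the documented response shape, not captured from a real server:

```python
import json

# Hand-written sample in the shape /api/v1/query returns for an instant vector.
payload = '''{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "job": "node", "instance": "128.0.0.1:9030"},
       "value": [1572864000, "1"]}
    ]
  }
}'''

def parse_instant_vector(text):
    """Extract (labels, timestamp, float value) triples from a query response."""
    doc = json.loads(text)
    assert doc["status"] == "success"
    assert doc["data"]["resultType"] == "vector"
    out = []
    for item in doc["data"]["result"]:
        ts, value = item["value"]
        out.append((item["metric"], ts, float(value)))  # sample values arrive as strings
    return out

samples = parse_instant_vector(payload)
```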
Prometheus Basic Concepts - Scrape Metrics
● Prometheus works essentially by pulling metrics from targets;
● Pulling over HTTP offers a number of advantages:
○ Easily tell if a target is down;
○ Manually inspect target health with a web browser.
● However, a push model can be implemented with the Pushgateway:
○ An intermediary service that allows pushing metrics from jobs that cannot be scraped;
○ Disadvantages:
■ The Pushgateway becomes both a single point of failure (SPOF) and a potential
bottleneck;
■ You lose Prometheus's automatic instance health monitoring via the up metric.
Prometheus Server - Installation
● Using pre-compiled binaries;
● From source (Makefile);
● Using Docker (Quay.io or Docker Hub)
● Using configuration management systems:
○ Ansible
○ Chef
○ Puppet
Docker command:
docker run -p 9090:9090 -v /prometheus-data \
prom/prometheus --config.file=prometheus.yml
Dockerfile:
FROM prom/prometheus
ADD prometheus.yml /etc/prometheus/
Build:
docker build -t my-prometheus .
docker run -p 9090:9090 my-prometheus
Prometheus Server - Configuration
● Prometheus is configured via command-line flags and
a configuration file: prometheus.yml;
● Prometheus default port is 9090;
● Prometheus can reload its configuration at runtime:
○ SIGHUP to the Prometheus process;
○ HTTP POST request to the “/-/reload” endpoint (when
the --web.enable-lifecycle flag is enabled).
● Recording rules and Alerting rules should be written in
a YAML file (rule_files);
● It’s possible to use service discovery mechanism to
automatically update scrape target list;
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    environment: tst

rule_files:
  - /opt/prometheus/rules/*.rules

scrape_configs:
  - job_name: node
    scrape_interval: 30s
    metrics_path: /metrics
    scheme: http
    ec2_sd_configs:
      - endpoint: ""
        region: eu-west-3
        refresh_interval: 1m
        port: 9100

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
      scheme: http
      timeout: 10s
      api_version: v1
Prometheus Server - Configuration
Consult the Prometheus configuration via Web browser
Prometheus Server - Targets and Rules
Targets and Rules defined in the Prometheus configuration
Getting data into Prometheus
● Data can be exported to Prometheus using the Pushgateway or via an Exporter;
● Prometheus has many Exporters, among which stand out:
○ Node/system metrics exporter;
○ MySQL server exporter;
○ Memcached exporter;
○ Kafka exporter;
○ HAProxy exporter;
○ Tomcat exporter;
○ AWS CloudWatch exporter;
● Each Exporter has a default allocated port and a route ('/metrics');
● Besides official and community Exporters, it’s possible to write custom Exporters;
● Most exporters can be run as a service or using Docker;
Getting data into Prometheus - Pushgateway
● Pushgateway isn't an event store but an intermediary service to push metrics from jobs that can't be
scraped;
● Deploy using a binary or the Docker image:
○ Example: docker run -d -p 9091:9091 prom/pushgateway
● To change the listen address, use the "--web.listen-address" flag;
● By default, Pushgateway doesn't persist metrics. To cache them, use the "--persistence.file" option;
● All pushes are done via HTTP. The interface is REST-like. Metrics are available on the '/metrics' route;
● Examples:
○ echo "some_metric 3.14" | curl --data-binary @- http://localhost:9091/metrics/job/job1
○ curl -X DELETE http://localhost:9091/metrics/job/job1
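A push is just an HTTP request whose URL path encodes the grouping labels and whose body is the text exposition format. A sketch that builds both, without contacting a real Pushgateway (function and parameter names are illustrative):

```python
def pushgateway_request(host, job, metrics, labels=None):
    """Build the URL and body for a Pushgateway push (no network I/O here).

    metrics: mapping of metric name -> value.
    labels: extra grouping labels, which become /<name>/<value> path segments.
    """
    path = f"/metrics/job/{job}"
    for name, value in (labels or {}).items():
        path += f"/{name}/{value}"
    # Text exposition format: one "name value" line per metric, newline-terminated.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return f"http://{host}{path}", body

url, body = pushgateway_request("localhost:9091", "job1", {"some_metric": 3.14})
```

This mirrors the curl example above: the same URL and the same one-line body.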
Getting data into Prometheus - Node Exporter
● Exposes hardware and operating-system metrics based on collectors, usually running on port 9100;
● By default, there is a specific set of collectors for each operating system: cpu, diskstats, filesystem,
netstat, nfs, textfile, …
● Others can be enabled with the corresponding collector flag: ntp, processes, systemd, …
○ Example: --collector.processes
● The exporter can be run using a third-party repository for RHEL/CentOS/Fedora (COPR), from the source
code or using Docker;
○ Example:
docker run -d --net="host" --pid="host" --cap-add=SYS_TIME -v "/:/host:ro" prom/node-exporter --path.rootfs=/host
--collector.ntp --collector.processes --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
Implement custom metrics on Node Exporter
● Custom metrics can be implemented with textfile collector;
● Textfile collector is similar to Pushgateway, in that it allows exporting of statistics from batch jobs;
● Pushgateway should be used for service-level metrics, while textfile should be used for machine metrics;
● The collector will parse all files in textfile directory matching the glob *.prom using the text format;
● Example to automatically push logged in users on machine:
echo node_users_logged_in $(who /host/var/run/utmp | wc -l) > /var/lib/node_exporter/textfile_collector/users.prom.$$
mv /var/lib/node_exporter/textfile_collector/users.prom.$$ /var/lib/node_exporter/textfile_collector/users.prom
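The write-then-rename in the shell example keeps the collector from ever scraping a half-written file. The same atomic update as a short sketch (the function name is illustrative; a throwaway directory stands in for the real textfile path):

```python
import os
import tempfile

def write_prom_file(directory, filename, metrics):
    """Atomically write metrics in text format for the textfile collector.

    Writing to a temp file and renaming mirrors the users.prom.$$ trick above:
    the collector only ever sees complete files.
    """
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        for name, value in metrics.items():
            f.write(f"{name} {value}\n")
    os.replace(tmp_path, os.path.join(directory, filename))  # atomic on POSIX

# Example, using a throwaway directory instead of the real collector path:
tmpdir = tempfile.mkdtemp()
write_prom_file(tmpdir, "users.prom", {"node_users_logged_in": 3})
```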
Getting data into Prometheus - Node Exporter
Example of metrics available at the "/metrics" endpoint
Prometheus query - Node Exporter Data
Example of a PromQL query to check CPU usage
How to integrate Icinga with Prometheus?
● Icinga can be integrated with Prometheus via Nagitheus (Claranet);
● Nagitheus is a Nagios plugin for querying Prometheus, written in Go;
● Nagitheus processes vector or scalar results and returns an exit code according to the warning/critical
values and the comparison method (ge, gt, le, lt);
● Supports basic authentication to Prometheus with username and password (-u and -pw options);
● Example:
/usr/lib64/nagios/plugins/nagitheus -H http://localhost:9090 -i 10.0.0.10 -p 9100 -l 'Check CPU' -d yes -q
'(avg by (mode) (irate(node_cpu_seconds_total{instance="", mode!="idle"}[5m])) * 100)' -m ge -w 70 -c 80
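The core of what such a plugin does is map a query result to a Nagios/Icinga exit code using the comparison method. A simplified sketch of that threshold logic (not Nagitheus itself, which is written in Go):

```python
import operator

# Nagios/Icinga exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
METHODS = {"ge": operator.ge, "gt": operator.gt,
           "le": operator.le, "lt": operator.lt}

def nagios_exit_code(value, method, warning, critical):
    """Compare a scalar query result against thresholds, Nagitheus-style.

    Critical is checked first, so a value crossing both thresholds is CRITICAL.
    """
    cmp = METHODS[method]
    if cmp(value, critical):
        return 2
    if cmp(value, warning):
        return 1
    return 0

# CPU usage of 75% with -m ge -w 70 -c 80 -> WARNING (exit code 1)
code = nagios_exit_code(75.0, "ge", warning=70, critical=80)
```

The le method inverts the comparison, which is what the disk check below relies on: low percentages of free space are the bad ones.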
How to integrate Icinga with Prometheus?
● Basic Linux Checks:
○ CPU
○ Disk
○ Load
○ Memory
○ Total procs
○ Ntp
● Steps to implement a check:
○ Identify metrics to use on query;
○ Create query considering PromQL operators and metric types;
○ Test the query on Prometheus and define the label, method and critical/warning values;
○ Implement the Icinga command using the query and the values specified for each option.
How to integrate Icinga with Prometheus?
● Check Disk (Percentage of disk free):
1. Metrics to use on check:
○ node_filesystem_free_bytes
○ node_filesystem_size_bytes
2. Data types: Instant vectors;
3. PromQL operators:
○ / (division)
○ * (multiplication)
4. Query:
(node_filesystem_free_bytes / node_filesystem_size_bytes{fstype!~"none|tmpfs|sysfs", mountpoint=~"/var/.*", instance="172.27.68.163:9030"} )* 100
5. Analyze the query result and define the method, critical and warning values
# HELP node_filesystem_free_bytes Filesystem free space in bytes.
# TYPE node_filesystem_free_bytes gauge
node_filesystem_free_bytes{device="shm",fstype="ext4",mountpoint="/var"} 1.01808578e+10
…
# HELP node_filesystem_size_bytes Filesystem size in bytes.
# TYPE node_filesystem_size_bytes gauge
node_filesystem_size_bytes{device="shm",fstype="ext4",mountpoint="/var"} 1.05017712e+10
…
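With the sample values above, the query result works out as follows (worked arithmetic, using the same numbers as the metric output):

```python
free_bytes = 1.01808578e+10   # node_filesystem_free_bytes for /var
size_bytes = 1.05017712e+10   # node_filesystem_size_bytes for /var

percent_free = free_bytes / size_bytes * 100
# Roughly 96.9% free: with method le, warning 20 and critical 10, the check is OK.
```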
How to integrate Icinga with Prometheus?
● Query Result
Method: le Warning: 20% Critical: 10%
● Icinga Command:
/usr/lib64/nagios/plugins/nagitheus -H http://localhost:9090 -i 172.27.68.163 -p 9030 -l 'Check disk' -d yes -m le -w 20 -c 10 -q
'(node_filesystem_free_bytes / node_filesystem_size_bytes{fstype!~"none|tmpfs|sysfs", mountpoint=~"/var/.*"} )* 100'
Questions?
  • 31. How to integrate Icinga with Prometheus? ● Query Result Method: le Warning: 20% Critical:10% ● Icinga Command: /usr/lib64/nagios/plugins/nagitheus -H http://localhost:9090 -i 172.27.68.163 -p 9030 -l 'Check disk' -d yes -m le -w 20 -c 10 -q '(node_filesystem_free_bytes / node_filesystem_size_bytes{fstype!~"none|tmpfs|sysfs", mountpoint=~"/var/.*"} )* 100' Warning (20) Critical (10)