Prometheus on Kubernetes
Christoph Petrausch 29.4.2019
Christoph Petrausch
Linux Systems Engineer
● Golang
● Kubernetes
Environment
https://www.flickr.com/photos/noaaphotolib/5578039998/
Volatile Infrastructure
https://www.flickr.com/photos/burgtender/4052169876/
Service Discovery
● API: Pods, Nodes, Services
● targets.yml
● API: VMs
● DNS: SRV, A, AAAA
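As a rough illustration of the targets.yml path, the following Python sketch writes a target list in the shape that Prometheus' file-based service discovery reads: a list of groups, each with `targets` and optional `labels`. The host names, the port, the label and the helper name `build_target_groups` are made up for this example. JSON is used because the standard library can produce it and file-based discovery accepts JSON files as well as YAML.

```python
import json


def build_target_groups(hosts, port, labels):
    """Return a single file_sd target group covering the given hosts."""
    targets = [f"{host}:{port}" for host in sorted(hosts)]
    return [{"targets": targets, "labels": dict(labels)}]


# Hypothetical VM inventory; in practice this might come from a cloud API.
groups = build_target_groups(
    hosts={"vm-b.example.org", "vm-a.example.org"},
    port=9100,
    labels={"datacenter": "dc1"},
)
rendered = json.dumps(groups, indent=2)
print(rendered)
```

Writing `rendered` to a file that the Prometheus configuration lists under file-based discovery would be the remaining step; that part is left out here.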
Simple
https://www.flickr.com/photos/nicokaiser/4667377944
Collecting and Storing Metrics
● Target: scraped via GET /metrics
● tsdb: storage
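To make the pull model concrete, here is a minimal sketch of a target that answers GET /metrics, together with a client that scrapes it twice. It uses only the Python standard library; the metric name `demo_scrapes_total` and the handler class are invented for illustration, and a real service would more likely use a Prometheus client library.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

SCRAPES = 0  # hypothetical counter kept by the application
LOCK = threading.Lock()


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global SCRAPES
        if self.path != "/metrics":
            self.send_error(404)
            return
        with LOCK:
            SCRAPES += 1
            count = SCRAPES
        body = (
            "# HELP demo_scrapes_total Number of times /metrics was scraped.\n"
            "# TYPE demo_scrapes_total counter\n"
            f"demo_scrapes_total {count}\n"
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def scrape(host, port):
    """Fetch /metrics once, the way a scraper would."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/metrics")
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
first = scrape("127.0.0.1", server.server_port)
second = scrape("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()
print(second)
```

The target only exposes its current state; keeping history is the job of the scraper and its time series database.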
High Availability
https://www.flickr.com/photos/rob-sinclair/2553517053
Two Is Better Than One
Target
Target
Target
...
Scalable
https://www.flickr.com/photos/rob-sinclair/2553517053
Sharding
Prometheus
Target
Target
Target
...
Target
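One common way to split targets across several Prometheus servers is to hash each target address and keep only the targets whose hash falls into the server's own shard. Prometheus offers a `hashmod` relabel action for this purpose. The Python sketch below mirrors only the idea and is not claimed to produce the same shard numbers as Prometheus; the function names and pod addresses are assumptions for illustration.

```python
import hashlib


def shard_for(target, shard_count):
    """Map a target address to a shard index in [0, shard_count)."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], "big") % shard_count


def targets_for_shard(targets, shard_index, shard_count):
    """Return the targets that the given shard is responsible for."""
    return [t for t in targets if shard_for(t, shard_count) == shard_index]


targets = [f"10.0.0.{i}:8080" for i in range(1, 21)]  # hypothetical pod addresses
shards = [targets_for_shard(targets, i, 3) for i in range(3)]
for index, members in enumerate(shards):
    print(index, len(members))
```

Because the assignment depends only on the target address, every server can compute its own share without coordinating with the others.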
Services
https://www.flickr.com/photos/bigshock/363611248
Instrumentation
https://www.flickr.com/photos/nasa_ice/15163970050
Exporter
● Operating System: node-exporter
● Kubelet, Kube API, Kube Scheduler, etc.
● Application
● Database: mysql-exporter
Metric Format
http_count{} 731321
Name: http_count
Value: 731321
Metric Format
# HELP http_count Total Number of HTTP Requests
# TYPE http_count counter
http_count{} 731321
Documentation: the # HELP and # TYPE lines
Metric Format
# HELP http_count Total Number of HTTP Requests
# TYPE http_count counter
http_count{handler="/ui/static",instance="website-jas1kg1d-adjkm1",job="pods",service="website"} 731321
Labels: handler, instance, job, service
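The three parts of the format (documentation lines, name with labels, value) can be reproduced with a few lines of Python. This is a sketch of the text layout only, not the code of any Prometheus client library; the function names are made up, and labels are sorted by key merely to get a stable output.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample line: name, optional {labels}, value."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


def render_metric(name, help_text, metric_type, samples):
    """Render the HELP and TYPE lines followed by all samples of one metric."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    lines += [render_sample(name, labels, value) for labels, value in samples]
    return "\n".join(lines) + "\n"


text = render_metric(
    "http_count",
    "Total Number of HTTP Requests",
    "counter",
    [({"handler": "/ui/static", "service": "website", "job": "pods",
       "instance": "website-jas1kg1d-adjkm1"}, 731321)],
)
print(text)
```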
Metric Format
# HELP node_disk_discard_time_seconds_total This is the total number of seconds spent by all discards.
# TYPE node_disk_discard_time_seconds_total counter
node_disk_discard_time_seconds_total{device="dm-0"} 0
node_disk_discard_time_seconds_total{device="dm-1"} 0
node_disk_discard_time_seconds_total{device="nvme0n1"} 0
node_disk_discard_time_seconds_total{device="sda"} 0
# HELP node_disk_discarded_sectors_total The total number of sectors discarded successfully.
# TYPE node_disk_discarded_sectors_total counter
node_disk_discarded_sectors_total{device="dm-0"} 0
node_disk_discarded_sectors_total{device="dm-1"} 0
node_disk_discarded_sectors_total{device="nvme0n1"} 0
node_disk_discarded_sectors_total{device="sda"} 0
# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 100327.11
node_cpu_seconds_total{cpu="0",mode="iowait"} 167.2
node_cpu_seconds_total{cpu="0",mode="irq"} 1211.28
node_cpu_seconds_total{cpu="0",mode="nice"} 5762.09
Counter
[Plot: counter value over time]
Gauge
[Plot: gauge value over time]
SLAs
● 99.9% of all requests complete in under 50 ms
● 99.9% availability
● 99.99% of all requests must succeed
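It can help to translate such percentages into absolute budgets. The short sketch below does the arithmetic for the availability and success-rate targets; the 30-day period and the figure of one million requests are assumptions chosen for the example.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Minutes of downtime permitted by an availability target."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - availability_percent) / 100


def allowed_failures(success_percent, total_requests):
    """Number of failed requests permitted by a success-rate target."""
    return total_requests * (100 - success_percent) / 100


print(round(allowed_downtime_minutes(99.9), 1))   # 43.2 minutes per 30 days
print(round(allowed_failures(99.99, 1_000_000)))  # 100 of one million requests
```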
Long Tail
[Plot: long-tail distribution with the average and the 99.9% mark]
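A small numeric example shows why the average hides the long tail. The latencies below are invented: most requests are fast and 0.2% are very slow. The percentile uses the simple nearest-rank definition, which is an assumption of this sketch and not how Prometheus estimates quantiles from histograms.

```python
import math
import statistics


def percentile(values, percent):
    """Nearest-rank percentile: the value at rank ceil(percent% of n)."""
    ordered = sorted(values)
    rank = math.ceil(percent / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Made-up latencies in milliseconds: 9,980 fast requests and 20 very slow ones.
latencies_ms = [20] * 9980 + [2000] * 20

mean_ms = statistics.mean(latencies_ms)
p999_ms = percentile(latencies_ms, 99.9)
print(mean_ms)  # 23.96
print(p999_ms)  # 2000
```

The average stays well below 50 ms, while the 99.9th percentile shows that a target of "99.9% under 50 ms" is missed.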
PromQL
● Loosely modeled on SQL
● Aggregation over labels
● Mathematical functions
● Range and offset selectors
PromQL
Aggregations
Query:
sum by (service) (http_count{handler="/ui/static"})
Example:
http_count{handler="/ui/static",service="a",i="a"} 5
http_count{handler="/ui/static",service="a",i="y"} 5
http_count{handler="/ui/static",service="b",i="z"} 5
Result:
{service="a"} = 10, {service="b"} = 5
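The grouping behind `sum by (service)` can be imitated in a few lines of Python. This is only a sketch of the semantics using the three example series; the representation of a sample as a `(labels, value)` pair and the helper name `sum_by` are assumptions.

```python
from collections import defaultdict


def sum_by(samples, label):
    """Group samples by one label and sum their values."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


samples = [
    ({"handler": "/ui/static", "service": "a", "i": "a"}, 5),
    ({"handler": "/ui/static", "service": "a", "i": "y"}, 5),
    ({"handler": "/ui/static", "service": "b", "i": "z"}, 5),
]
totals = sum_by(samples, "service")
print(totals)  # {'a': 10.0, 'b': 5.0}
```

All labels other than the grouping label disappear from the result, which is why the output carries only `service`.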
PromQL
Functions
Request rate per instance:
rate(http_count{handler="/ui/static"}[1m])
Aggregated per service:
sum by (service) (rate(http_count{handler="/ui/static"}[1m]))
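To show roughly what a rate over a counter means, the sketch below computes the per-second increase across the samples in one window and compensates for counter resets. It is deliberately simplified: PromQL's `rate()` additionally extrapolates towards the window boundaries, which is omitted here. The sample data and the name `simple_rate` are invented.

```python
def simple_rate(samples):
    """Per-second increase over (timestamp, value) samples of one counter.

    A value lower than its predecessor is treated as a restart from zero.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        increase += (current - previous) if current >= previous else current
    return increase / elapsed


# Made-up scrapes every 15 s; the process restarts before the fourth scrape.
window = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(simple_rate(window))  # about 1.67 requests per second
```

Taking the rate first and summing afterwards, as in the aggregated query, keeps resets of individual instances from distorting the total.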
Histogram
PromQL
Offset
Difference between now and yesterday:
http_count - (http_count offset 24h)
PromQL
Alert Rules
groups:
- name: website
  rules:
  - alert: HighErrorRate
    expr: sum by (service)(rate(http_error{handler="/ui/static"}[1m])) > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High error rate
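The `for: 10m` clause means the expression has to stay true for ten minutes before the alert fires; until then the alert is pending. The following sketch models that state handling for a list of rule evaluations. The evaluation timestamps and the function name `alert_states` are assumptions; it illustrates the semantics and is not Prometheus' implementation.

```python
def alert_states(evaluations, for_seconds):
    """Return (timestamp, state) for each (timestamp, condition_true) evaluation."""
    active_since = None
    states = []
    for timestamp, condition_true in evaluations:
        if not condition_true:
            active_since = None  # condition cleared, start over
            states.append((timestamp, "inactive"))
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append((timestamp, "firing" if held >= for_seconds else "pending"))
    return states


# Hypothetical evaluations every five minutes with a hold time of ten minutes.
evaluations = [(0, True), (300, True), (600, True), (900, False), (1200, True)]
states = alert_states(evaluations, for_seconds=600)
for timestamp, state in states:
    print(timestamp, state)
```

A single evaluation where the condition is false resets the timer, so short spikes never reach the firing state.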
PromQL
Usage
● Grafana Dashboards
● Alert Rules
● Ad Hoc Queries
○ UI
○ API
Prometheus Ecosystem
Grafana
Prometheus Ecosystem
Alertmanager
● Alert Routing
● Silences
● Inhibition
● Grouping
1000-Foot Architecture
● API: Pods, Nodes, Services
● Prometheus
● Targets: Target, Target, Target, Target, ...
● Grafana
● Alertmanager
Open Problems
● Long-term storage
● Aggregation across multiple Prometheus instances
● Deduplication when sharding
Thanos
[Diagram: two groups, each showing four targets (T) and two Sidecars; Blob Storage; Alertmanager; Thanos Query; Thanos Ruler; Thanos Compact; Grafana]
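The deduplication problem can be pictured as follows: two Prometheus replicas scrape the same targets, so a query layer sees every series twice, differing only in a replica label. The sketch below merges such series by ignoring that label and keeping one value per timestamp. It is a deliberately naive illustration and not Thanos' actual algorithm; the label name `replica`, the data and the function name are assumptions.

```python
def deduplicate(series_list, replica_label="replica"):
    """Merge series that differ only in `replica_label`, one value per timestamp."""
    merged = {}
    for labels, points in series_list:
        identity = tuple(
            sorted((k, v) for k, v in labels.items() if k != replica_label)
        )
        bucket = merged.setdefault(identity, {})
        for timestamp, value in points:
            bucket.setdefault(timestamp, value)  # first replica seen wins
    return {identity: sorted(points.items()) for identity, points in merged.items()}


series = [
    ({"__name__": "http_count", "service": "a", "replica": "0"}, [(0, 5), (15, 7)]),
    ({"__name__": "http_count", "service": "a", "replica": "1"}, [(15, 7), (30, 9)]),
]
result = deduplicate(series)
print(result)
```

In this example the gap of one replica at timestamp 30 is filled by the other, which is the benefit of running the pair in the first place.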
Thank You
Christoph Petrausch
Twitter: @hikhvar
Mail: christoph.petrausch@inovex.de
Sources
