Brian Brazil
Founder
Prometheus for
Monitoring Metrics
Who am I?
● One of the main developers of Prometheus
● Founder of Robust Perception
● Contributor to many open source projects
Why monitor?
● Know when things go wrong
○ To call in a human to prevent a business-level issue
● Be able to debug and gain insight
● Trending to see changes over time, and drive
technical/business decisions
● To feed into other systems/processes
Services have Internals
Monitor the Internals
Monitor as a Service, not as Machines
What is Prometheus?
Metrics monitoring system (not logs).
A time series database. A query language.
Client libraries. An Ecosystem.
A modern approach to monitoring services.
Architecture
Client Libraries
Instrument your code to capture the metrics that
matter to you.
If upstream libraries are instrumented, you get that
for free!
Also many exporters: cAdvisor, MySQL, MongoDB,
SNMP, JMX, HAProxy, Minecraft, Factorio...
Let’s Talk Code
pip install prometheus_client
from prometheus_client import Summary, start_http_server

# Track how long each request takes.
REQUEST_DURATION = Summary('request_duration_seconds',
                           'Request duration in seconds')

@REQUEST_DURATION.time()
def my_handler(request):
    pass  # Your code here

# Expose metrics over HTTP on port 8000.
start_http_server(8000)
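A minimal sketch of what a Summary like the one above records, assuming prometheus_client is installed. The private CollectorRegistry and the handle() function are illustrative assumptions, added only so the example runs on its own without touching the default registry.

```python
from prometheus_client import CollectorRegistry, Summary

# A private registry (an assumption for this sketch) keeps the example
# isolated from the process-wide default registry.
registry = CollectorRegistry()
REQUEST_DURATION = Summary('request_duration_seconds',
                           'Request duration in seconds',
                           registry=registry)


@REQUEST_DURATION.time()
def handle(request):
    """Hypothetical handler; the decorator times every call."""
    return 'ok'


for _ in range(3):
    handle(None)

# A Summary exposes the number of observations and the sum of observed values.
count = registry.get_sample_value('request_duration_seconds_count')
total = registry.get_sample_value('request_duration_seconds_sum')
print(count, total)
```

Dividing the rate of the sum by the rate of the count gives an average request duration, which is usually how these two series are used together.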
Multiple Dimensions
from prometheus_client import Counter

# One time series per value of the 'method' label.
REQUESTS = Counter('requests_total',
                   'Total requests', ['method'])

def my_handler(request):
    REQUESTS.labels(request.method).inc()
    pass  # Your code here
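To make the effect of the label concrete, here is a small sketch, assuming prometheus_client is installed. The private registry and the hard-coded list of methods are assumptions for illustration; in a real handler the value would come from the request.

```python
from prometheus_client import CollectorRegistry, Counter

# Private registry (an assumption) so the sketch does not clash with
# metrics registered elsewhere in the process.
registry = CollectorRegistry()
REQUESTS = Counter('requests_total', 'Total requests', ['method'],
                   registry=registry)

# Simulate three incoming requests.
for method in ['GET', 'GET', 'POST']:
    REQUESTS.labels(method).inc()

# Each distinct label value is tracked as its own time series.
get_count = registry.get_sample_value('requests_total', {'method': 'GET'})
post_count = registry.get_sample_value('requests_total', {'method': 'POST'})
print(get_count, post_count)
```

Because every label value creates a separate series, labels are best kept to values with a small, bounded set of possibilities.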
Exceptional Circumstances In Progress
from prometheus_client import Counter, Gauge

EXCEPTIONS = Counter('exceptions_total', 'Total exceptions')
IN_PROGRESS = Gauge('inprogress_requests', 'In progress')

@EXCEPTIONS.count_exceptions()   # Count exceptions raised by the handler.
@IN_PROGRESS.track_inprogress()  # Track requests currently being handled.
def my_handler(request):
    pass  # Your code here
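A short sketch of how the two decorators behave when a handler fails, assuming prometheus_client is installed. The private registry, the handle() function, and the 'bad' request value are hypothetical and exist only to trigger an exception.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge

# Private registry (an assumption) to keep the sketch self-contained.
registry = CollectorRegistry()
EXCEPTIONS = Counter('exceptions_total', 'Total exceptions',
                     registry=registry)
IN_PROGRESS = Gauge('inprogress_requests', 'In progress',
                    registry=registry)


@EXCEPTIONS.count_exceptions()
@IN_PROGRESS.track_inprogress()
def handle(request):
    """Hypothetical handler that fails for one particular input."""
    if request == 'bad':
        raise ValueError('hypothetical failure')
    return 'ok'


handle('good')
try:
    handle('bad')
except ValueError:
    pass  # The exception still propagates; the counter only observes it.

exceptions = registry.get_sample_value('exceptions_total')
in_progress = registry.get_sample_value('inprogress_requests')
print(exceptions, in_progress)
```

The gauge is decremented even when the handler raises, so it returns to zero once no requests are being handled.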
Getting Data Out
from prometheus_client import start_http_server

if __name__ == '__main__':
    start_http_server(8080)
Also possible with Django, Twisted etc.
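To see what a scrape returns without starting a server, the text exposition output can be rendered directly. This is a sketch assuming prometheus_client is installed; the demo_requests_total metric and the private registry are made up for illustration.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# Private registry and demo metric are assumptions for this sketch.
registry = CollectorRegistry()
DEMO = Counter('demo_requests_total', 'Total demo requests',
               registry=registry)
DEMO.inc()

# generate_latest() renders the registry in the Prometheus text format,
# which should match what the HTTP endpoint serves for the same registry.
output = generate_latest(registry).decode('utf-8')
print(output)
```

The same function can be called from a view in a web framework to serve metrics from an existing application, which is roughly how the other integrations work.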
The PromQL Query Language
Arbitrary aggregation, joins and slicing all possible.
Can calculate how close you'll be to your quota in 4
hours, or the 95th percentile latency across an entire
datacenter.
If you can graph it, you can alert on it!
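The quota example rests on linear extrapolation, which PromQL offers as predict_linear(). The following is a simplified pure-Python sketch of the idea, not Prometheus's implementation: it fits a least-squares line through hypothetical (timestamp, value) samples and evaluates it relative to the last sample, whereas Prometheus predicts relative to the query evaluation time. The sample data and quota value are invented.

```python
def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) samples and
    evaluate it seconds_ahead past the last sample."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + seconds_ahead)


# Hypothetical usage samples: (seconds, units used), growing steadily.
usage_samples = [(0, 10.0), (60, 20.0), (120, 30.0)]
quota = 2000.0  # Hypothetical quota.

predicted = predict_linear(usage_samples, 4 * 3600)
over_quota = predicted > quota
print(predicted, over_quota)
```

An alert built on this kind of prediction fires while there is still time to act, rather than when the quota has already been hit.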
Analytics: Top 5 Docker images by CPU
topk(5,
sum by (image)(
rate(container_cpu_usage_seconds_total{
id=~"/system.slice/docker.*"}[5m]
)
)
)
Heterogeneity
Not all VMs are equal.
Noisy neighbours mean different application
instances have different performance.
But PromQL can aggregate latency across
instances, allowing you to alert on overall end-user
visible latency rather than outliers.
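Aggregating latency across instances works because histogram bucket counts can simply be added together before a quantile is estimated. The sketch below is a simplified pure-Python model of that idea, similar in spirit to combining sum by (le) with histogram_quantile() in PromQL. The bucket bounds and counts are invented, and the helper names are hypothetical.

```python
import math


def merge_buckets(instances):
    """Sum cumulative bucket counts per upper bound across instances."""
    merged = {}
    for buckets in instances:
        for upper, count in buckets.items():
            merged[upper] = merged.get(upper, 0) + count
    return merged


def quantile_from_buckets(q, buckets):
    """Estimate quantile q by linear interpolation inside the bucket
    that contains the target rank."""
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]
    rank = q * total
    lower, below = 0.0, 0
    for upper in bounds:
        if buckets[upper] >= rank:
            in_bucket = buckets[upper] - below
            if math.isinf(upper) or in_bucket == 0:
                return lower  # Fall back to the last finite bound.
            return lower + (upper - lower) * (rank - below) / in_bucket
        lower, below = upper, buckets[upper]
    return lower


# Cumulative counts per latency bound in seconds (hypothetical data).
fast = {0.1: 90, 0.5: 99, 1.0: 100, math.inf: 100}
noisy = {0.1: 10, 0.5: 50, 1.0: 90, math.inf: 100}

fast_p90 = quantile_from_buckets(0.9, fast)
noisy_p90 = quantile_from_buckets(0.9, noisy)
overall_p90 = quantile_from_buckets(0.9, merge_buckets([fast, noisy]))
print(fast_p90, noisy_p90, overall_p90)
```

The overall estimate sits between the two per-instance values, which is the figure an end user actually experiences and the one worth alerting on.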
Alert management
Not every alert results in a page.
Group similar alerts together, route them to the right
team and throttle notifications.
Designed to work reliably during network partitions.
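Grouping can be pictured as bucketing alerts by a chosen set of labels and sending one notification per bucket. This is a conceptual pure-Python sketch only, not how the Alertmanager is implemented; the alert names, teams, and the group_alerts() helper are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the labels listed in group_by."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, '') for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical firing alerts, each described by its labels.
alerts = [
    {'alertname': 'HighLatency', 'team': 'web', 'instance': 'a'},
    {'alertname': 'HighLatency', 'team': 'web', 'instance': 'b'},
    {'alertname': 'DiskFull', 'team': 'db', 'instance': 'c'},
]

grouped = group_alerts(alerts, ['alertname', 'team'])

# One notification per group, routed by the 'team' label.
notifications = [
    {'alertname': key[0], 'team': key[1], 'count': len(members)}
    for key, members in grouped.items()
]
print(notifications)
```

Here three firing alerts collapse into two notifications, one per team, instead of one message per affected instance.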
Reliability is Key
Core Prometheus server is a single binary.
Each Prometheus server is independent.
No clustering or attempts to backfill "missing" data
when scrapes fail.
Optional remote storage for long-term retention.
Monitoring Approach
Service management went from manual to Chef to
Kubernetes. Need to do the same for monitoring.
Care about what matters to end users, such as
latency and error rates.
Distracting a human with alerts for everything that's
vaguely off only leads to burnout.
A Rich Community
Today there are 750+ contributors to the core
repositories, and 350+ 3rd party integrations.
There are 1000+ subscribers on our mailing lists,
600+ people in IRC and an estimated 10000+
companies using Prometheus in production.
Many companies funding Prometheus development.
Live Demo!
What is Prometheus?
Metrics monitoring system (not logs).
A time series database. A query language.
Client libraries. An Ecosystem.
A modern approach to monitoring services.
Prometheus: The Book
Coming in 2018!
Resources
Official Project Website: prometheus.io
User Mailing List: prometheus-users@googlegroups.com
Dev Mailing List: prometheus-developers@googlegroups.com
IRC: #prometheus on chat.freenode.net
Robust Perception Blog: www.robustperception.io/blog


Prometheus for Monitoring Metrics (Fermilab 2018)
