Scaling Prometheus to measure millions of things
Paul Traylor
LINE Developer meetup #43 in Kyoto
2018/8/22
Self-Introduction
Paul Traylor
LINE Fukuoka Development Department
Providing Prometheus monitoring as a service
How many people have used Prometheus?
https://prometheus.io/
Start with a single Prometheus instance
A single Prometheus instance can typically ingest over a million samples, scraped every 15 seconds, from one thousand or more targets
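As a rough sanity check on these figures, the arithmetic below converts them into an ingestion rate. It is a back-of-the-envelope sketch that assumes every sample belongs to a distinct series scraped exactly once per 15-second interval; the helper names are hypothetical and not part of any Prometheus API.

```python
def ingestion_rate(samples_per_cycle: int, scrape_interval_s: float) -> float:
    """Average samples appended per second when every series is scraped once per interval."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape interval must be positive")
    return samples_per_cycle / scrape_interval_s


def samples_per_target(samples_per_cycle: int, targets: int) -> float:
    """Average number of samples each target returns per scrape."""
    if targets <= 0:
        raise ValueError("need at least one target")
    return samples_per_cycle / targets


if __name__ == "__main__":
    samples = 1_000_000  # "over a million samples" (lower bound from the slide)
    interval = 15        # scrape interval in seconds
    targets = 1_000      # "one thousand or more targets"
    print(f"{ingestion_rate(samples, interval):,.0f} samples/s")  # 66,667 samples/s
    print(f"{samples_per_target(samples, targets):,.0f} samples per target per scrape")  # 1,000 samples per target per scrape
```

In other words, a million samples every 15 seconds is a sustained write load of roughly 66,700 samples per second, with each target contributing on the order of a thousand samples per scrape.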
Add one more for High Availability
Prometheus is designed to be simple and to share nothing. To increase resilience, we add a second server that scrapes the same targets
Alertmanager deduplicates the identical alerts coming from each HA pair and sends only a single notification
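A minimal model of why the HA pair does not produce double notifications: Alertmanager identifies an alert by its label set, so two replicas firing the same rule collapse into one alert as long as their labels match. This sketch assumes each replica carries a distinguishing external label, here a hypothetical `replica`, that is ignored when comparing alerts (in a real setup this is commonly done by dropping that label with `alert_relabel_configs`). It is a simplified illustration, not Alertmanager's actual implementation, which also handles grouping, inhibition and silencing.

```python
from typing import Dict, Iterable, List, Tuple

Labels = Dict[str, str]

# Hypothetical name of the external label that distinguishes the two HA replicas.
REPLICA_LABEL = "replica"


def alert_key(labels: Labels, drop: str = REPLICA_LABEL) -> Tuple[Tuple[str, str], ...]:
    """Identity of an alert: its sorted label pairs, ignoring the replica label."""
    return tuple(sorted((k, v) for k, v in labels.items() if k != drop))


def deduplicate(alerts: Iterable[Labels]) -> List[Labels]:
    """Keep one alert per identity, in order of first arrival."""
    seen = set()
    unique: List[Labels] = []
    for labels in alerts:
        key = alert_key(labels)
        if key not in seen:
            seen.add(key)
            unique.append(dict(key))  # the replica label is no longer present
    return unique


if __name__ == "__main__":
    incoming = [
        {"alertname": "HighLatency", "service": "api", "replica": "prom-a"},
        {"alertname": "HighLatency", "service": "api", "replica": "prom-b"},
        {"alertname": "DiskFull", "service": "db", "replica": "prom-a"},
    ]
    for alert in deduplicate(incoming):
        print(alert)
```

The key design point is that both servers must emit exactly the same labels for an alert; any label that differs between replicas defeats deduplication.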
Start sharding
After around 1.5 million scrapes or 5,000 exporters, I start looking at splitting up an instance
An unfortunate side effect is that Grafana now has to know about each of our shards
Alertmanager is shared across all servers
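One common way to split targets between shards is to hash each target and have every server keep only the targets whose hash falls on its own shard number. Prometheus supports this through a `hashmod` relabel action, typically followed by a `keep` action on the resulting label. The Python below is a sketch in that spirit: it hashes the target address with MD5 and takes the result modulo the shard count, but it is not guaranteed to reproduce Prometheus's exact shard assignment. The target names and shard count are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List


def shard_for(target: str, num_shards: int) -> int:
    """Pick a shard by hashing the target address (MD5, then modulo)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.md5(target.encode("utf-8")).digest()
    # Interpret the last 8 bytes of the digest as an unsigned 64-bit integer.
    value = int.from_bytes(digest[-8:], "big")
    return value % num_shards


def targets_for_shard(targets: List[str], shard: int, num_shards: int) -> List[str]:
    """The subset of targets that the server numbered `shard` would keep."""
    return [t for t in targets if shard_for(t, num_shards) == shard]


if __name__ == "__main__":
    # Hypothetical exporter addresses; 5,000 mirrors the threshold on the slide.
    targets = [f"node-{i:04d}.example:9100" for i in range(5_000)]
    num_shards = 3
    counts = Counter(shard_for(t, num_shards) for t in targets)
    for shard in range(num_shards):
        print(f"shard {shard}: {counts[shard]} targets")
```

Because the assignment depends only on the target and the shard count, each server can compute its own subset independently. The trade-off is that changing the shard count moves most targets to a different shard.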
Introduce Thanos
https://improbable.io/games/blog/thanos-prometheus-at-scale
Uses the same API as Prometheus, so Grafana thinks it is talking to Prometheus and sees a single data source instead of one per shard
Historical data is accessed infrequently, so it can live on object storage
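The sketch below models, in a very simplified way, what a Thanos-style query layer does: fan a query out to every backend whose time range overlaps the request, merge the results into one view, and collapse HA replicas by ignoring their replica label. The `Store` class, the `replica` label name, and the first-answer-wins merge are assumptions for illustration; the real Thanos querier talks to its backends over gRPC and uses a more careful deduplication strategy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LabelSet = Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs
Sample = Tuple[int, float]              # (timestamp in seconds, value)

REPLICA_LABEL = "replica"  # hypothetical replica label name


@dataclass
class Store:
    """One backend: a Prometheus shard (recent data) or object storage (history)."""
    name: str
    min_time: int
    max_time: int
    series: Dict[LabelSet, List[Sample]] = field(default_factory=dict)

    def overlaps(self, start: int, end: int) -> bool:
        return self.min_time <= end and start <= self.max_time


def strip_replica(labels: LabelSet) -> LabelSet:
    """Drop the replica label so both HA copies map to the same series key."""
    return tuple((k, v) for k, v in labels if k != REPLICA_LABEL)


def query(stores: List[Store], start: int, end: int) -> Dict[LabelSet, List[Sample]]:
    """Fan out to every store overlapping [start, end] and merge into one view."""
    merged: Dict[LabelSet, Dict[int, float]] = {}
    for store in stores:
        if not store.overlaps(start, end):
            continue  # e.g. skip object storage for a query over recent data only
        for labels, samples in store.series.items():
            points = merged.setdefault(strip_replica(labels), {})
            for ts, value in samples:
                if start <= ts <= end:
                    points.setdefault(ts, value)  # first store to answer wins
    return {labels: sorted(points.items()) for labels, points in merged.items()}


if __name__ == "__main__":
    # Two HA replicas of one Prometheus shard holding recent data.
    prom_a = Store("prom-a", 1000, 2000, {(("job", "api"), ("replica", "a")): [(1500, 1.0), (1515, 1.0)]})
    prom_b = Store("prom-b", 1000, 2000, {(("job", "api"), ("replica", "b")): [(1500, 1.0), (1530, 0.0)]})
    # Older data that has been moved to object storage.
    history = Store("object-storage", 0, 999, {(("job", "api"),): [(900, 1.0)]})
    for labels, samples in query([prom_a, prom_b, history], 1400, 2000).items():
        print(dict(labels), samples)
```

Selecting stores by time range is what lets rarely used history sit on cheaper object storage without slowing down queries over recent data.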
Continue scaling up