Scaling Prometheus to measure millions of things

Paul Traylor
LINE Developer meetup #43 in Kyoto
2018/8/22

Self-Introduction
Paul Traylor
LINE Fukuoka Development Department
Providing Prometheus monitoring as a service

How many people have used Prometheus?
https://prometheus.io/

Start with a single Prometheus Instance
A single Prometheus instance can typically ingest over a million samples
scraped every 15 seconds across one thousand or more targets
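To put the figure above in per-second terms, here is a minimal arithmetic sketch. It assumes that "a million samples" means one million active series, each sampled once per 15-second scrape interval; the function names are illustrative and not part of Prometheus.

```python
def ingest_rate(active_series: int, scrape_interval_s: float) -> float:
    """Average samples ingested per second, assuming every series
    is sampled exactly once per scrape interval."""
    return active_series / scrape_interval_s


def series_per_target(active_series: int, targets: int) -> float:
    """Average number of series exposed by each scrape target."""
    return active_series / targets


if __name__ == "__main__":
    # Numbers taken from the slide: 1,000,000 samples, 15 s interval, 1,000 targets.
    print(f"{ingest_rate(1_000_000, 15):,.0f} samples/s")
    print(f"{series_per_target(1_000_000, 1_000):,.0f} series per target")
```

Under these assumptions a single instance handles roughly 67,000 samples per second, with about 1,000 series per target on average.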

Add one more for High Availability
Prometheus is designed to be simple and share nothing. To increase
resilience, we add a second server that scrapes the same targets.
Alertmanager deduplicates the alerts coming from the HA pair and sends
only a single notification.
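The deduplication idea can be sketched as follows. This is an illustration only, not Alertmanager's actual implementation: it assumes two alerts count as "the same" when their label sets match once a per-server label is ignored. The label name `replica` and the helper names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Alert = Dict[str, str]  # an alert reduced to its label set


def fingerprint(labels: Alert, ignore: Tuple[str, ...] = ("replica",)) -> FrozenSet[Tuple[str, str]]:
    """Identity of an alert: its labels minus the ones that only
    distinguish which Prometheus server sent it."""
    return frozenset((k, v) for k, v in labels.items() if k not in ignore)


def deduplicate(alerts: Iterable[Alert]) -> List[Alert]:
    """Keep the first alert seen for each fingerprint."""
    seen: Dict[FrozenSet[Tuple[str, str]], Alert] = {}
    for alert in alerts:
        seen.setdefault(fingerprint(alert), alert)
    return list(seen.values())


if __name__ == "__main__":
    incoming = [
        {"alertname": "InstanceDown", "instance": "web-1", "replica": "a"},
        {"alertname": "InstanceDown", "instance": "web-1", "replica": "b"},
    ]
    print(len(deduplicate(incoming)))  # both servers fired, one alert remains
```

In practice both servers in the pair need to send alerts with identical label sets for the duplicates to collapse, which is why any server-specific label is typically dropped before the alert leaves Prometheus.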

Start sharding
After around 1.5 million scrapes or 5,000 exporters, I start looking to split
up an instance
An unfortunate side effect is that we now have to know about our shards
in Grafana
Alertmanager is shared across all servers
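The talk does not say how targets are divided between shards. One common approach is to hash each target address and take the result modulo the number of shards, which is the idea behind Prometheus's `hashmod` relabel action. The sketch below is a hypothetical illustration of that idea; it is not guaranteed to produce the same shard numbers as Prometheus itself, and the target names are made up.

```python
import hashlib
from typing import Dict, Iterable, List


def shard_for(target: str, shard_count: int) -> int:
    """Map a target address to a shard index using a stable hash."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % shard_count


def assign(targets: Iterable[str], shard_count: int) -> Dict[int, List[str]]:
    """Group targets by the shard that should scrape them."""
    shards: Dict[int, List[str]] = {i: [] for i in range(shard_count)}
    for target in targets:
        shards[shard_for(target, shard_count)].append(target)
    return shards


if __name__ == "__main__":
    # Hypothetical exporter addresses.
    targets = [f"exporter-{n}.example:9100" for n in range(10)]
    for shard, members in assign(targets, 3).items():
        print(shard, len(members))
```

Because the hash is stable, a given target always lands on the same shard, but dashboards still need to know which shard holds which target, which is the side effect noted above.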

Introduce Thanos
https://improbable.io/games/blog/thanos-prometheus-at-scale
Thanos exposes the same Prometheus API (Grafana thinks it's talking to
Prometheus), so all the shards look like a single data source
Historical data is not frequently accessed and can live on object storage
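To illustrate what "same API" means for a client, the sketch below builds an instant-query URL for the Prometheus HTTP API path `/api/v1/query`. Only the base URL changes between a single Prometheus server and a Thanos query endpoint. The host names are hypothetical, and no network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def instant_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for a Prometheus-compatible HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Hypothetical endpoints: one Prometheus shard and one Thanos query frontend.
    prometheus = instant_query_url("http://prometheus.example:9090", "up")
    thanos = instant_query_url("http://thanos-query.example:10902", "up")

    # Same path and same query string; only the host differs.
    print(urlsplit(prometheus).path == urlsplit(thanos).path)
    print(parse_qs(urlsplit(thanos).query))
```

A dashboard tool configured with the second base URL can keep issuing the same queries without knowing how many shards sit behind it.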
