Data Engineer's Lunch #23: Thanos/Cortex

Version 1.0
Prometheus at Scale: Thanos / Cortex / etc.
An Anant Corporation Story.
How Prometheus scales in global business platforms

Prometheus (recap)
● Multidimensional data model: time series identified by a metric name and key/value pairs
● PromQL, now a de facto standard query language
● Time series collection via pull, or push via a gateway
● Service discovery, either dynamic or via static configuration
● Separation of concerns in graphing / dashboarding
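
The data model in the recap can be sketched in a few lines. This is a toy illustration only, not how Prometheus stores data internally: `SeriesKey`, `series_key`, and `TinyStore` are hypothetical names, and `select` supports only equality matchers, roughly what a PromQL selector such as `http_requests_total{method="GET"}` expresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesKey:
    """Identity of one time series: a metric name plus key/value pairs."""
    name: str
    labels: frozenset  # frozenset of (label_name, label_value) tuples


def series_key(name, **labels):
    """Build a hashable series identity from a name and keyword labels."""
    return SeriesKey(name, frozenset(labels.items()))


class TinyStore:
    """Hypothetical in-memory store: one list of samples per series key."""

    def __init__(self):
        self._data = {}  # SeriesKey -> list of (timestamp, value)

    def append(self, key, timestamp, value):
        self._data.setdefault(key, []).append((timestamp, value))

    def select(self, name, **matchers):
        """Return every series with this name whose labels include all matchers."""
        wanted = set(matchers.items())
        return {
            key: samples
            for key, samples in self._data.items()
            if key.name == name and wanted <= key.labels
        }


store = TinyStore()
store.append(series_key("http_requests_total", method="GET", code="200"), 1000, 5.0)
store.append(series_key("http_requests_total", method="POST", code="200"), 1000, 2.0)
get_series = store.select("http_requests_total", method="GET")
```

Each distinct combination of label values is a separate series, which is why adding a label with many possible values multiplies the number of series a server has to hold.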

Prometheus at Scale
● Cortex
● Thanos
● M3DB (from Uber)
● Victoria Metrics
● Vulcan (from Digital Ocean)
https://sysdig.com/blog/challenges-scale-prometheus/

Prometheus at Scale Needs
● Global View - Queries across multiple Prometheus servers
● Multi-Replica / High Availability - No downtime, no data loss
● Long Term Storage - Keep data in cold storage for future use
● Global Scale - Millions of containers / pods / VMs
● Community Support - Many people using it
● Community Knowledge Online - Many people documenting

Cortex
● Global View - Centralized data
● Multi-Replica / High Availability - Dedupe at write
● Long Term Storage - NoSQL Index + Chunks
○ Index (Cassandra / DynamoDB / BigTable)
○ Chunks (Cassandra / DynamoDB / BigTable / S3 / GCS / Azure)
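
"Dedupe at write" can be pictured as follows: when two Prometheus replicas scrape the same targets and both push to the central store, samples from only one elected replica per cluster are accepted, and the other replica takes over if the elected one goes quiet. The sketch below is a simplified, assumption-laden model of that idea, not Cortex's implementation: the `HATracker` class, its method names, and the 30-second failover timeout are all hypothetical.

```python
import time


class HATracker:
    """Accept samples from one elected replica per cluster; drop the rest."""

    def __init__(self, failover_timeout=30.0, clock=time.monotonic):
        self.failover_timeout = failover_timeout  # assumed value, in seconds
        self.clock = clock
        self._elected = {}  # cluster -> (replica, time the replica was last seen)

    def accept(self, cluster, replica):
        """Return True if a write from this replica should be stored."""
        now = self.clock()
        current = self._elected.get(cluster)
        if current is None:
            # First replica seen for this cluster becomes the elected one.
            self._elected[cluster] = (replica, now)
            return True
        elected, last_seen = current
        if replica == elected:
            self._elected[cluster] = (replica, now)
            return True
        if now - last_seen > self.failover_timeout:
            # The elected replica has been silent too long: fail over.
            self._elected[cluster] = (replica, now)
            return True
        return False  # duplicate data from a non-elected replica


fake_now = [0.0]
tracker = HATracker(failover_timeout=30.0, clock=lambda: fake_now[0])
first = tracker.accept("prod", "replica-a")      # elected
duplicate = tracker.accept("prod", "replica-b")  # dropped
fake_now[0] = 45.0
failover = tracker.accept("prod", "replica-b")   # replica-a silent for 45 s
```

Because duplicates are dropped before they are stored, queries never see them; the trade-off is that the write path has to track which replica is currently elected.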

Thanos
● Global View - Federated Data / Fan out queries
● Multi-Replica / High Availability - Query time dedupe
● Long Term Storage - TSDB blocks in object store
○ GCS
○ S3 Compatible (Ceph / MinIO)
○ Azure Blob Storage
○ ….
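
The fan-out and query-time dedupe bullets above can be sketched as follows. The query layer asks every store for matching series, then merges series that differ only in the label marking the replica. This is a deliberately simplified model: Thanos's real deduplication logic is more involved than a union by timestamp, and the function names, the `replica` label name, and the callable "stores" here are hypothetical stand-ins.

```python
def fan_out(stores, metric_name):
    """Ask every store for matching series and concatenate the answers."""
    results = []
    for store in stores:
        results.extend(store(metric_name))
    return results


def dedupe_at_query_time(series_list, replica_label="replica"):
    """Merge series that are identical once the replica label is ignored."""
    merged = {}
    for labels, samples in series_list:
        key = frozenset((k, v) for k, v in labels.items() if k != replica_label)
        bucket = merged.setdefault(key, {})
        for timestamp, value in samples:
            # Keep the first value seen for each timestamp.
            bucket.setdefault(timestamp, value)
    return {key: sorted(bucket.items()) for key, bucket in merged.items()}


# Two hypothetical stores, each fed by one replica that missed a scrape.
def store_a(name):
    return [({"__name__": name, "job": "api", "replica": "a"}, [(1, 1.0), (2, 2.0)])]


def store_b(name):
    return [({"__name__": name, "job": "api", "replica": "b"}, [(2, 2.0), (3, 3.0)])]


deduped = dedupe_at_query_time(fan_out([store_a, store_b], "up"))
```

Here both replicas' data stays in storage and the gaps in one are filled from the other at read time, which is the opposite trade-off from deduplicating on the write path.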

Thanos - Basic Architecture

Resources
● Thanos - Scalable Prometheus (https://www.infoq.com/news/2018/06/thanos-scalable-prometheus)
● Cortex Architecture (https://cortexmetrics.io/docs/architecture/)
● Thanos (https://thanos.io/)
● Challenges of Prometheus at Scale (https://sysdig.com/blog/challenges-scale-prometheus/)
● Tutorial: Prometheus at Scale (https://epsagon.com/tools/thanos-tutorial-prometheus-at-scale/)
● GitHub / Cortex (https://github.com/cortexproject/cortex)

Strategy: Scalable Fast Data
Architecture: Cassandra, Spark, Kafka
Engineering: Node, Python, JVM, CLR
Operations: Cloud, Container
Rescue: Downtime!! I need help.
www.anant.us | solutions@anant.us | (855) 262-6826
3 Washington Circle, NW | Suite 301 | Washington, DC 20037
