Kubernetes Observability with Prometheus by Example

Thomas Riley

Today
● Introduction to Prometheus
● Deployment on Kubernetes
● Monitoring Kubernetes
● High Availability Prometheus
● Long Term Storage for Prometheus

What is Prometheus?
● Prometheus is a metrics-oriented monitoring solution (a time series database plus tooling)
● Developed at SoundCloud, starting in 2012
● The Prometheus project joined the Cloud Native Computing Foundation (CNCF) in 2016
● In 2018, it became the second CNCF project to graduate from incubation, after Kubernetes

What is Prometheus?
[Diagram: Prometheus architecture, with Prometheus, applications, service discovery, an application exporter, Alertmanager and Grafana]

Demo 1
Deploying Prometheus to Kubernetes
1. Install the Prometheus Operator on a Google Kubernetes Engine (GKE) cluster
2. Provision an instance of Prometheus via the Operator
3. Configure Prometheus to scrape a target

Summary
● Deployed Prometheus Operator using the community Helm Chart
● Launched an instance of Prometheus using the Custom Resource
Definition (CRD) & Controller from the Operator
● Accessed Prometheus using kubectl port forwarding
● Created a ServiceMonitor object that defined a target to monitor, in this
case Prometheus itself!
● Demonstrated how the Prometheus Operator updated the Prometheus
configuration automatically based on the deployed ServiceMonitor.
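The ServiceMonitor used in the demo is not reproduced on the slides. As a hedged sketch, the manifest below is built as a Python dict and printed as JSON, which kubectl also accepts. The names, labels and port name (`prometheus-self`, `team: observability`, `app: prometheus`, `web`) are hypothetical: the selector must match the labels of the Service exposing your target, and the ServiceMonitor's own labels must match the `serviceMonitorSelector` of your Prometheus resource.

```python
import json


def service_monitor(name, namespace, labels, match_labels, port):
    """Build a ServiceMonitor manifest (monitoring.coreos.com/v1) as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Must match the Prometheus resource's serviceMonitorSelector.
            "labels": labels,
        },
        "spec": {
            # Select the Service(s) whose endpoints should be scraped.
            "selector": {"matchLabels": match_labels},
            # Scrape the named Service port every 30 seconds on /metrics.
            "endpoints": [{"port": port, "interval": "30s", "path": "/metrics"}],
        },
    }


if __name__ == "__main__":
    # Hypothetical names and labels: adjust them to your own Service and Prometheus.
    manifest = service_monitor(
        name="prometheus-self",
        namespace="monitoring",
        labels={"team": "observability"},
        match_labels={"app": "prometheus"},
        port="web",
    )
    print(json.dumps(manifest, indent=2))
```

Saved as a script, its output can be piped straight into `kubectl apply -f -`; once applied, the Operator regenerates the Prometheus scrape configuration as shown in the demo.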

Demo 2
Monitoring Kubernetes
1. Monitoring Kubernetes with metrics using:
a. Node Exporter
b. kube-state-metrics
c. Kubelet / cAdvisor

Summary
● We used kube-state-metrics for collecting metrics about the state of
resources running within Kubernetes.
● We used the Node Exporter to collect metrics about the health of the
underlying worker nodes and operating systems.
● We collected metrics from Kubelet and cAdvisor that provided insight into
container resource usage.
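To see all three sources side by side, you can send PromQL to Prometheus's HTTP API (`/api/v1/query`). This is a minimal sketch assuming Prometheus is reachable locally, for example through kubectl port forwarding; the metric names are common ones from kube-state-metrics, the Node Exporter and cAdvisor, but label names such as `pod` differ between versions, so treat the queries as assumptions to adapt.

```python
import json
import urllib.parse
import urllib.request

# One example query per source (assumed metric and label names).
QUERIES = {
    "kube-state-metrics": "sum by (phase) (kube_pod_status_phase)",
    "node-exporter": "node_memory_MemAvailable_bytes",
    "cadvisor": "sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))",
}


def query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_vector(body):
    """Turn an instant-vector response into (labels, float value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result carries "value": [unix_timestamp, "value as a string"].
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


def instant_query(base_url, promql, timeout=10):
    """Run one query against a reachable Prometheus and parse the result."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(resp.read())
```

For example, `instant_query("http://localhost:9090", QUERIES["node-exporter"])` returns one (labels, value) pair per node, assuming port 9090 is forwarded to Prometheus.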

Highly Un-Available Prometheus
● In our working example we have a single instance of Prometheus, as shown in the diagram
● If the Kubernetes worker node that Prometheus is running on fails, the Pod becomes temporarily unavailable while it is evicted and launched elsewhere
[Diagram: a single Prometheus instance scraping several groups of targets]

Highly Available Prometheus
[Diagram: two Prometheus instances, each scraping the same targets]
Highly available! Scrape targets, twice!

Challenges:
• We have two sources of duplicate metrics!
• Well, so-called duplicates: each instance scrapes the targets at slightly different moments, so the metrics will vary slightly between the two!
• Which do we use?

Use a Load Balancer
[Diagram: a load balancer in front of both Prometheus instances]

Use a Service when running in Kubernetes
[Diagram: a Kubernetes Service in front of both Prometheus instances]


Not without its challenges:
• When you refresh a dashboard, the data may change, because each request can reach either instance and their metrics will potentially differ
• Use sticky load balancing or make the second instance a hot standby
• This solution is becoming complicated and does not scale with query load
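Sticky load balancing keeps each client on one Prometheus instance, so its dashboards stop flickering between replicas; Kubernetes Services offer this through `sessionAffinity: ClientIP`. The sketch below does not reproduce how kube-proxy implements affinity. It only illustrates the idea by hashing a client IP to one of two hypothetical replicas, contrasted with plain round robin.

```python
import hashlib

BACKENDS = ["prometheus-0", "prometheus-1"]  # hypothetical replica names


def round_robin(n_requests, backends=BACKENDS):
    """Without affinity: consecutive refreshes alternate between replicas."""
    return [backends[i % len(backends)] for i in range(n_requests)]


def pick_backend(client_ip, backends=BACKENDS):
    """Illustrative affinity: the same client IP always maps to the same replica."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

If the chosen replica fails, its clients move to the other one and may still see different data, which is part of why this approach becomes complicated.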

Prometheus HA with Thanos
“Thanos is a set of components
that can be composed into a highly
available metric system with
unlimited storage capacity”

Developed and open-sourced by engineers at London-based Improbable.
Today, 5 core maintainers from various organisations.
github.com/improbable-eng/thanos
914 commits, 3.8k GitHub stars, 117 contributors

[Diagram: Thanos Query in front of two Prometheus Pods, each with a Thanos sidecar]
1. A Thanos sidecar is deployed alongside Prometheus in each Kubernetes Pod using the Operator
2. Thanos Query makes gRPC calls to the Thanos sidecars for metrics and de-duplicates them
3. Thanos Query exposes the Prometheus HTTP API or gRPC
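De-duplication relies on each replica carrying an external label that identifies it; Thanos Query is told which label that is (its `--query.replica-label` flag) and treats series that differ only in that label as one. The sketch below is a deliberately simplified stand-in, not Thanos's actual algorithm: it keeps one replica's samples and only borrows a sample from another replica where the first has a gap. The label name `replica` and the 15s scrape interval are assumptions.

```python
from collections import defaultdict

REPLICA_LABEL = "replica"  # assumed name of the per-replica external label


def dedup_key(labels, replica_label=REPLICA_LABEL):
    """Series identity once the replica label is dropped."""
    return tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))


def deduplicate(series, scrape_interval=15.0, replica_label=REPLICA_LABEL):
    """Merge replica series: keep one replica, fill its gaps from the others.

    `series` is a list of (labels, [(timestamp, value), ...]) tuples.
    """
    groups = defaultdict(list)
    for labels, samples in series:
        groups[dedup_key(labels, replica_label)].append(sorted(samples))
    merged = {}
    for key, replicas in groups.items():
        primary, *others = replicas
        out = list(primary)
        for other in others:
            for ts, val in other:
                # Borrow a sample only if nothing kept is within one scrape interval.
                if all(abs(ts - kept_ts) >= scrape_interval for kept_ts, _ in out):
                    out.append((ts, val))
        merged[key] = sorted(out)
    return merged
```

In this toy version, replica A's samples win whenever both replicas scraped at roughly the same time, and replica B only fills in once A stops reporting.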

Summary
● Demonstrated the drawbacks of managing multiple instances of
Prometheus in an attempt to improve resilience.
● Showed how Prometheus can be made highly available using Thanos.

Long Term Storage
The Challenge:
You want to store months or even years' worth of metrics within Prometheus.
You still need to be able to query that data with good performance. Like, all the data!


Long Term Storage
• Prometheus was initially designed for short-term metrics retention: it was built for monitoring & alerting on what is happening ‘now’
• Local storage can be expensive, especially when using SSDs
• If you want to store years of metrics, will this scale efficiently with Prometheus?
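The Prometheus storage documentation gives a rough sizing rule: retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. A minimal worked sketch, using a hypothetical ingestion rate of 100,000 samples per second and the pessimistic 2 bytes per sample:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def needed_disk_bytes(retention_seconds, samples_per_second, bytes_per_sample=2.0):
    """Rough local TSDB size from the Prometheus storage sizing rule."""
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    rate = 100_000  # hypothetical samples ingested per second
    for label, days in [("24 hours", 1), ("1 year", 365)]:
        size = needed_disk_bytes(days * SECONDS_PER_DAY, rate)
        print(f"{label}: {size / 1e9:,.1f} GB")
```

At this rate, 24 hours needs about 17 GB while a year needs about 6,300 GB (6.3 TB) per instance, and twice that with two highly available replicas.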

Long Term Storage
• Prometheus has remote write and remote read APIs for integrating with external storage systems
• However, operating Elasticsearch or a similar system alongside Prometheus adds complexity that can feel somewhat overengineered

Long Term Storage with Thanos
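[Diagram: Thanos Sidecar, a storage bucket, Thanos Store and Thanos Query]
1. The Thanos Sidecar ships metrics to a storage bucket such as AWS S3 or Google Cloud Storage
2. Thanos Store makes the metrics in the bucket available via the Thanos Store API for Thanos Query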

How?
[Diagram: metrics scraped from targets into an in-memory block, which is later persisted as a disk block]
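Prometheus's TSDB keeps recent samples in an in-memory head block and periodically persists them to disk as immutable blocks, each covering 2 hours by default; the Thanos sidecar uploads those persisted blocks to the bucket. As a simplified sketch, assuming block boundaries aligned to multiples of the block range in milliseconds since the Unix epoch, this shows which 2-hour block a sample falls into.

```python
from collections import defaultdict

BLOCK_RANGE_MS = 2 * 60 * 60 * 1000  # default 2-hour block range, in milliseconds


def block_window(ts_ms, block_range_ms=BLOCK_RANGE_MS):
    """Half-open [start, end) window of the block containing ts_ms."""
    start = (ts_ms // block_range_ms) * block_range_ms
    return start, start + block_range_ms


def group_into_blocks(samples, block_range_ms=BLOCK_RANGE_MS):
    """Group (timestamp_ms, value) samples by the block window they belong to."""
    blocks = defaultdict(list)
    for ts, val in samples:
        blocks[block_window(ts, block_range_ms)].append((ts, val))
    return dict(blocks)
```

Because finished blocks never change, shipping each one to a bucket once it is written is enough to keep the bucket up to date.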

Long Term Storage with Thanos
• Significantly reduce the storage requirements of each Prometheus instance: each one only needs to store around 2 to 24 hours of metrics
• Significantly cheaper to store metrics in a bucket than to scale SSD storage
• Thanos Compact compacts the Prometheus TSDB blocks within the bucket and also downsamples data for querying over long time periods: it keeps raw, 5m and 1h resolution data
• Thanos Query automatically de-duplicates data across the Prometheus instances and the metrics stored in the storage bucket
• Thanos is built from Prometheus TSDB code: no reinventing the wheel
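Downsampling replaces many raw samples with one aggregate per time window, so a query over months reads far fewer points. Thanos stores several aggregates per window (such as count, sum, min and max); the sketch below only shows the core idea for a single series and is not Thanos's implementation. Timestamps are assumed to be in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    count: int
    total: float
    minimum: float
    maximum: float


def downsample(samples, resolution_ms=5 * 60 * 1000):
    """Aggregate (timestamp_ms, value) samples into fixed, aligned windows."""
    windows = {}
    for ts, val in sorted(samples):
        start = ts - ts % resolution_ms  # window the sample falls into
        agg = windows.get(start)
        if agg is None:
            windows[start] = Aggregate(1, val, val, val)
        else:
            agg.count += 1
            agg.total += val
            agg.minimum = min(agg.minimum, val)
            agg.maximum = max(agg.maximum, val)
    return windows
```

Keeping count and sum rather than a single average means an average over any larger range can still be computed exactly (total / count), while min and max preserve spikes that an average would hide.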

Conclusion
● Use the Prometheus Operator to make automating Prometheus on
Kubernetes easy!
● Collect time series metrics from everywhere in Kubernetes and start
building dashboards to enhance the Observability of your platform and
services!
● Use Thanos to add resilience and easy scalability to Prometheus
in Kubernetes. It is as easy as deploying a sidecar!

observability.thomasriley.co.uk
https://observability.thomasriley.co.uk

Questions?
Thank you for listening!
I have published a series of K8s Observability tutorials at:
https://observability.thomasriley.co.uk
Get in touch:
Mail: contact@thomasriley.co.uk
Slack: Riley @ kubernetes.slack.com
Twitter: @therealriley
