🔍 Day 19 of #100DaysOfCloud – Deep Dive into Prometheus Architecture for Cloud Native Monitoring

Monitoring modern cloud-native applications—especially on dynamic platforms like Kubernetes—requires scalable, flexible, and real-time observability tools. One of the most widely adopted tools in this space is Prometheus, known for its powerful time-series data handling, native Kubernetes support, and seamless integration with Grafana.

Today, I’m diving into Prometheus architecture, explaining how it collects, stores, and queries metrics, along with its core components like TSDB, exporters, service discovery, Push Gateway, and Alertmanager.


📊 Prometheus Architecture Overview

Here’s a high-level breakdown of how Prometheus functions as a robust monitoring system:

[Diagram: Prometheus Architecture]

🧠 1. Prometheus Server (Scraper)

At the heart of the architecture is the Prometheus server. It scrapes metrics from configured targets over HTTP, usually from the /metrics endpoint. This is a pull-based model, meaning Prometheus initiates the data collection at fixed intervals.

You define these scrape intervals and targets in the Prometheus configuration file (prometheus.yml). The targets could be anything from Kubernetes pods to Linux servers to external APIs.
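As a minimal sketch, a prometheus.yml with one scrape job might look like this (the job name and target address are placeholders):

global:
  scrape_interval: 15s              # how often Prometheus scrapes each target

scrape_configs:
  - job_name: "node"
    metrics_path: /metrics          # the default path, shown here for clarity
    static_configs:
      - targets: ["10.0.0.5:9100"]  # hypothetical Linux host exposing metrics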


🗂️ 2. Time Series Database (TSDB)

The scraped data is stored in Prometheus’s internal Time Series Database (TSDB). This database organizes the data in a time series format:

<metric_name>{<labels>} value [timestamp]
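A concrete sample in this format might look like the following (hypothetical values; the final field is a millisecond Unix timestamp):

http_requests_total{method="GET",code="200"} 1027 1700000000000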

Prometheus supports retention policies for TSDB to prevent storage overload:

  • Time-based retention: e.g., keep data for 15 days (default).
  • Size-based retention: e.g., set a max volume (like 50GB); older data is deleted once this limit is reached.
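Both policies are set with startup flags on the Prometheus server, for example:

prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB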


🔍 3. Service Discovery

Prometheus supports both static and dynamic service discovery:

  • Static config: For targets with fixed IPs or DNS (e.g., legacy VMs or APIs).
  • Kubernetes SD: Prometheus uses the Kubernetes API to auto-discover pods, services, or endpoints, adapting to changes caused by autoscaling and rolling updates.
  • File-based SD: You can also define targets in separate .json or .yaml files (e.g., using file_sd_configs).

This dynamic discovery is critical for cloud environments like AKS, where infrastructure is constantly changing.
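As a sketch, a pod-discovery job might look like this (the annotation-based filter is a common convention, not a requirement):

scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod                   # discover all pods via the Kubernetes API
    relabel_configs:
      # keep only pods that opt in with the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"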


📦 4. Exporters

Prometheus doesn’t always scrape metrics directly from services. Instead, it relies on exporters—lightweight agents that expose system or application metrics in Prometheus format.

Examples:

  • Node Exporter for Linux system metrics like CPU, memory, disk I/O.
  • Kube-State-Metrics for Kubernetes object states.
  • Custom exporters for proprietary applications.

Configuration is done in prometheus.yml by specifying the exporter endpoint.
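A sketch of two exporter jobs (hostnames are placeholders; 9100 and 8080 are the usual default ports for Node Exporter and kube-state-metrics):

scrape_configs:
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node1:9100", "node2:9100"]
  - job_name: "kube-state-metrics"
    static_configs:
      - targets: ["kube-state-metrics:8080"]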


📤 5. Push Gateway

Prometheus is fundamentally a pull-based system, which isn’t ideal for short-lived jobs like Kubernetes CronJobs or ephemeral batch jobs.

That’s where Push Gateway comes in.

  • These short-lived jobs push their metrics to the gateway via HTTP.
  • Prometheus then pulls metrics from the Push Gateway’s /metrics endpoint.

Use case: pushing backup success/failure metrics or batch file processing stats.
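On the Prometheus side, the gateway is scraped like any other target; honor_labels: true preserves the job and instance labels that the pushing job set (9091 is the gateway's default port, and the hostname is a placeholder):

scrape_configs:
  - job_name: "pushgateway"
    honor_labels: true              # keep labels attached by the pushing jobs
    static_configs:
      - targets: ["pushgateway:9091"]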


🚨 6. Alertmanager

Alertmanager handles alerts sent by Prometheus based on defined conditions.

  • Prometheus evaluates rules like:

groups:
  - name: cpu-alerts                # hypothetical group name
    rules:
      - alert: HighCPUUsage
        expr: cpu_usage > 80        # cpu_usage stands in for a real CPU metric
        for: 5m                     # condition must hold 5 minutes before firing

  • If the condition is true, it triggers an alert and sends it to Alertmanager.
  • Alertmanager then routes notifications via email, Slack, PagerDuty, etc., based on receiver configurations.

This decoupling ensures that Prometheus focuses on monitoring, while Alertmanager handles communication and deduplication.
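A minimal alertmanager.yml sketch, assuming a Slack receiver (the webhook URL and channel are placeholders):

route:
  receiver: "team-slack"
  group_by: ["alertname"]           # group duplicate alerts that share a name

receivers:
  - name: "team-slack"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/PLACEHOLDER"
        channel: "#alerts"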


🔎 7. PromQL – Prometheus Query Language

PromQL is a flexible, powerful query language used to retrieve and visualize time-series data. You can use it:

  • In Prometheus web UI (localhost:9090).
  • In Grafana dashboards (Prometheus as a data source).
  • From the command line (e.g., promtool query instant, or the HTTP API) for scripting or troubleshooting.

Sample query:

rate(http_requests_total[5m])        

This returns the per-second request rate, averaged over the last 5 minutes.
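Frequently used queries can also be precomputed with a recording rule; a minimal sketch, with hypothetical group and rule names:

groups:
  - name: http-rules
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)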


🧩 Summary

Prometheus is purpose-built for cloud-native monitoring and works exceptionally well with Kubernetes environments like AKS. Here’s a quick recap of how it works:

Prometheus pulls metrics from configured targets at defined intervals. The data is stored in a local time-series database, governed by retention policies. Targets are discovered either statically or dynamically via Kubernetes APIs. Exporters expose the metrics in Prometheus format, and batch jobs use Push Gateway to push their metrics. Alertmanager handles notifications, and PromQL helps query and visualize metrics.

🔚 Final Thoughts

In my work with AKS and Node.js microservices, Prometheus has helped us proactively monitor resource usage, identify performance bottlenecks, and set up real-time alerts for key metrics. It’s a foundational tool for observability in DevOps and SRE practices.

✅ If you’re deploying services in a distributed environment, integrating Prometheus + Grafana is a must-have step for visibility and peace of mind.


📌 Next up on Day 20: I’ll dive into Grafana integration with Prometheus and how to build real-time dashboards for cloud-native apps.

Let me know if you’ve used Prometheus in your projects—or if you want to see a hands-on setup guide!

#DevOps #Azure #100DaysOfCloud #Prometheus #CloudMonitoring #AKS #Observability #Grafana


