Time series denver an introduction to prometheus

An Introduction to Prometheus
Time Series Denver - May 30, 2018

Introduction
● CTO & Co-Founder - FreshTracks.io - A CA Accelerator Incubation
○ “Simplifying Kubernetes Visibility”
● bob@freshtracks.io
● @bob_cotton
● Father, Fly Fisher & Avid Homebrewer

Agenda
● What is a Cloud Native Application?
● Cloud Native Application Challenges
● The 5 Pillars of Monitoring
● An Introduction to Prometheus
● What FreshTracks Provides

What is a Cloud Native Application?

Cloud Native Application
● Follows 12 Factor Application Practices
● Packaged into containers
● Follows a micro-service architecture
● Managed by a container orchestrator
○ Kubernetes, Docker Swarm, Mesos
● Usually deployed on dynamic infrastructure
○ VMware
○ Cloud providers
● Application lifecycle allows for
○ Auto-provisioning
○ Auto-scaling
○ Auto-redundancy

Cloud Native Applications Challenges

Cloud Native Challenges
● Containers are ephemeral
○ Scheduled on any node in the cluster
○ Move frequently on restarts and deployments
● Kubernetes needs to be monitored
● Kubernetes brings additional complexities
○ Resource Quotas
○ Pod and Cluster Scaling
● Challenges traditional tools

The 5 Pillars of Monitoring
● Metrics and Alerting
● Log Analytics
● Distributed Tracing
● Application Performance Monitoring
● Real User Monitoring

Prometheus
● Started in 2012 at SoundCloud by ex-Google Engineers
○ Open Sourced in 2015
● Patterned after “Borgmon”, Google’s container monitoring system
● Second project accepted into the CNCF after Kubernetes
● Adoption surge is tracking Kubernetes
○ 63% of teams using Kubernetes use Prometheus

Prometheus Major Features
● Label/value based time series data model
● “Pull based” metrics collection
● Service discovery mechanism
● Simple metrics format with a rich set of “exporters”
● Extremely high-performance TSDB
● Extensive query language - PromQL
● Alert Manager
● Easily installable from Helm
○ Single, statically linked binary
● Open Source Grafana used for visualization

Time Series Data Model
<identifier> → [(t0, v0), (t1, v1), (t2, v2) …]
Identifier is a collection of label/value pairs
Timestamps stored as int64: milliseconds since the Unix epoch
Values stored as float64
Efficient storage on disk: 1.3 bytes/sample
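
The model above can be sketched in Python as a mapping from a label set to a list of (timestamp, value) samples. This only illustrates the shape of the data, not how the Prometheus TSDB stores it on disk. The names `series_id`, `append` and `store` are made up for this sketch.

```python
from collections import defaultdict

def series_id(name, /, **labels):
    """Build a hashable identifier from a metric name and its label/value pairs."""
    return frozenset({"__name__": name, **labels}.items())

# identifier -> [(t0, v0), (t1, v1), ...]
store = defaultdict(list)

def append(store, ident, t_ms, value):
    """Append one sample: integer milliseconds since the epoch, float value."""
    store[ident].append((int(t_ms), float(value)))

ident = series_id("http_requests_total", job="apache", status="200")
append(store, ident, 1527696000000, 5439)
append(store, ident, 1527696015000, 5451)
```

Because the identifier is a set of label/value pairs, the order in which labels are written does not matter.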

Label/Value Based Data Model
● Graphite/StatsD
○ apache.192-168-5-1.home.200.http_requests_total
○ apache.192-168-5-1.home.500.http_requests_total
○ apache.192-168-5-1.about.200.http_requests_total
● Prometheus
○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="200"}
○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="500"}
○ http_requests_total{job="apache", instance="192.168.5.1", path="/about", status="200"}
● Selecting Series
○ Graphite/StatsD: *.*.home.200.http_requests_total
○ Prometheus: http_requests_total{status="200", path="/home"}
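
Selecting by label can be sketched as a filter over label sets. This is a minimal illustration that handles equality matchers only (PromQL also has negative and regex matchers). The `select` helper is hypothetical.

```python
def select(series, name, **matchers):
    """Return the label sets with this metric name whose labels equal every matcher."""
    return [
        labels for labels in series
        if labels.get("__name__") == name
        and all(labels.get(key) == value for key, value in matchers.items())
    ]

base = {"__name__": "http_requests_total", "job": "apache", "instance": "192.168.5.1"}
series = [
    {**base, "path": "/home", "status": "200"},
    {**base, "path": "/home", "status": "500"},
    {**base, "path": "/about", "status": "200"},
]

hits = select(series, "http_requests_total", status="200", path="/home")
```

Unlike the dotted Graphite name, the matcher does not depend on the position of a field, so adding a new label does not break existing queries.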

Client Data Model
● Counters
○ Always go up or get reset to 0
● Gauge
○ Tracks a real value e.g. temperature
● Histogram and Summary
○ Used for percentiles
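
The difference between the first two types can be sketched with two small classes. These are illustrative stand-ins written for this deck, not the official Prometheus client library.

```python
class Counter:
    """Monotonic value: it only increases, and starts again at 0 after a restart."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Point-in-time value that may go up or down, e.g. a temperature."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


requests = Counter()
requests.inc()
requests.inc(2)

temperature = Gauge()
temperature.set(21.5)
temperature.set(19.0)
```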

Prometheus Service Discovery and Target Scrape
[Diagram: Prometheus uses service discovery against the K8s API server to find scrape targets. On each node it scrapes the Kubelet (cAdvisor), node_exporter, kube_state_metrics, app containers and other exporters, and stores the samples in its TSDB.]

Prometheus Exposition Format and Exporters
● The Prometheus exposition format: text over HTTP. Simple and human readable
● Supported by Sysdig and the TICK collector
○ Efforts to make it a standard
● Close to 100 exporters for various technologies
● The jmx_exporter can cover any Java/JMX application
● https://prometheus.io/docs/instrumenting/exporters/
Official Exporters:
● node_exporter
● jmx_exporter
● snmp_exporter
● haproxy_exporter
● cloudwatch_exporter
● collectd_exporter
● mysqld_exporter
● memcached_exporter
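
To show how simple the text format is, here is a minimal sketch that renders samples as exposition lines. The `render` function is hypothetical, and it skips the escaping of special characters in label values that a real exporter needs.

```python
def render(name, samples, help_text="", metric_type="counter"):
    """Render (labels, value) pairs as text exposition lines for one metric."""
    lines = []
    if help_text:
        lines.append(f"# HELP {name} {help_text}")
    lines.append(f"# TYPE {name} {metric_type}")
    for labels, value in samples:
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{pairs}}} {value}" if pairs else f"{name} {value}")
    return "\n".join(lines) + "\n"


text = render(
    "http_requests_total",
    [({"path": "/home", "status": "200"}, 5439)],
    help_text="Total HTTP requests.",
)
```

An exporter serves text like this over HTTP, usually at /metrics, and Prometheus pulls it on every scrape.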

Querying Series with PromQL
● PromQL is a functional query language, nothing like SQL
rate(http_requests_total[5m])
A rough SQL-style equivalent (pseudocode, not valid SQL):
SELECT job, instance, path, status, rate(value, 5m)
FROM http_requests_total;
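
What rate() computes for one series can be approximated as follows. This is a simplified sketch: it uses timestamps in seconds, treats any drop in value as a counter reset, and does not extrapolate to the window edges the way PromQL's rate() does. The name `simple_rate` is made up.

```python
def simple_rate(samples, window_s, now_s):
    """Per-second increase of a counter over the last window_s seconds.

    samples: list of (timestamp_seconds, value), oldest first.
    Returns None when fewer than two samples fall inside the window.
    """
    points = [(t, v) for t, v in samples if now_s - window_s <= t <= now_s]
    if len(points) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(points, points[1:]):
        # A lower value means the counter restarted from 0.
        increase += (cur - prev) if cur >= prev else cur
    return increase / (points[-1][0] - points[0][0])


samples = [(0, 100), (60, 160), (120, 220), (180, 10), (240, 70)]
per_second = simple_rate(samples, window_s=300, now_s=240)
```

In this made-up data the counter resets between the third and fourth sample, yet the computed rate stays positive.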

Querying Series with PromQL
Calculate the ratio of failed requests to total website hits, per path:
sum(rate(http_requests_total{status="500"}[5m])) by (path) /
sum(rate(http_requests_total[5m])) by (path)
{path="/home"} 0.014
{path="/about"} 0.027
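
The aggregation in this query can be sketched in Python, assuming the per-series rates have already been computed. The helper names and the rate values are invented for illustration.

```python
from collections import defaultdict

def sum_by(rates, label):
    """Sum per-series values grouped by one label, like sum(...) by (label)."""
    totals = defaultdict(float)
    for labels, value in rates:
        totals[labels[label]] += value
    return dict(totals)

def failure_ratio(rates):
    """Divide the summed status=500 rate by the summed total rate, per path."""
    failures = sum_by([(l, v) for l, v in rates if l["status"] == "500"], "path")
    total = sum_by(rates, "path")
    return {path: failures[path] / total[path] for path in failures if total.get(path)}

# Hypothetical per-second request rates for individual series.
rates = [
    ({"instance": "a", "path": "/home", "status": "200"}, 3.0),
    ({"instance": "b", "path": "/home", "status": "200"}, 4.0),
    ({"instance": "a", "path": "/home", "status": "500"}, 1.0),
    ({"instance": "a", "path": "/about", "status": "200"}, 3.0),
    ({"instance": "a", "path": "/about", "status": "500"}, 1.0),
]
ratios = failure_ratio(rates)
```

Summing by path collapses the instance and status labels, which is why both sides of the division end up with matching label sets.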

Labels, Re-Label and Recording Rules
Oh My...

Kubernetes Labels
● Kubernetes gives us labels on all the things
● Our scrape targets live in the context of the K8s labels
○ This comes from service discovery
● We want to enhance the scraped metric labels with K8s labels
● This is why we need relabel rules in Prometheus

[Diagram: Prometheus finds scrape targets through service discovery against the K8s API server, scrapes each target, and stores the samples in its TSDB.]

Labels discovered for one target, before relabeling:
0="{__address__ 100.96.17.41}"
1="{__meta_kubernetes_namespace default}"
2="{__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true}"
3="{__meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2}"
4="{__meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?}"
5="{__meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu request for container prometheus-configmap-reload; cpu request for container data-sidecar}"
6="{__meta_kubernetes_pod_annotation_prometheus_io_port 8077}"
7="{__meta_kubernetes_pod_annotation_prometheus_io_scrape false}"
8="{__meta_kubernetes_pod_container_name prometheus-configmap-reload}"
9="{__meta_kubernetes_pod_host_ip 172.20.42.119}"
10="{__meta_kubernetes_pod_ip 100.96.17.41}"
11="{__meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io}"
12="{__meta_kubernetes_pod_label_pod_template_hash 1636686694}"
13="{__meta_kubernetes_pod_label_run data-sidecar}"
14="{__meta_kubernetes_pod_name data-sidecar-1636686694-83crm}"
15="{__meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal}"
16="{__meta_kubernetes_pod_ready false}"
17="{__metrics_path__ /metrics}"
18="{__scheme__ http}"
19="{job ftio-data-sidecar-calc}"

Target labels after <relabel_config>:
{__address__ 100.96.17.41:8077}
{__scheme__ http}
{__metrics_path__ /metrics}
{job ftio-data-sidecar-calc}
{kubernetes_namespace default}
{container_name prometheus-configmap-reload}

Metric as exposed by the scrape target:
http_requests_total{region="us-east", az="us-east-1", instance_type="m2.xlarge", instance="i-3582k8", hostname="host1"} = 5439

Metric after the target labels are attached, at the point where <metric_relabel_config> applies (by default the scraped instance label conflicts with the target's and is renamed exported_instance):
http_requests_total{region="us-east",
az="us-east-1",
instance_type="m2.xlarge",
exported_instance="i-3582k8",
hostname="host1",
instance="100.96.17.41:8077",
job="ftio-data-sidecar-calc",
kubernetes_namespace="default",
container_name="prometheus-configmap-reload",
} = 5439
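
The relabel step can be sketched as a small function that applies "replace" rules and then drops the `__meta_` labels. This is a simplified model under stated assumptions: it supports only the replace action, joins source labels with ";", anchors the regex, and uses Python's `\1` group syntax where Prometheus configuration uses `$1`. The rules shown are hypothetical, and the values loosely follow the listing above.

```python
import re

def relabel(labels, rules):
    """Apply simplified 'replace' relabel rules, then drop '__meta_' labels."""
    out = dict(labels)
    for rule in rules:
        joined = ";".join(out.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule.get("regex", "(.*)"), joined)
        if match:  # a rule whose regex does not match leaves the labels untouched
            out[rule["target_label"]] = match.expand(rule.get("replacement", r"\1"))
    return {key: value for key, value in out.items() if not key.startswith("__meta_")}


discovered = {
    "__address__": "100.96.17.41",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "8077",
    "__meta_kubernetes_pod_container_name": "prometheus-configmap-reload",
    "job": "ftio-data-sidecar-calc",
}
rules = [
    # Append the port from the pod annotation to the scrape address.
    {"source_labels": ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
     "regex": r"([^:]+)(?::\d+)?;(\d+)", "replacement": r"\1:\2",
     "target_label": "__address__"},
    # Copy selected Kubernetes metadata into permanent labels.
    {"source_labels": ["__meta_kubernetes_namespace"], "target_label": "kubernetes_namespace"},
    {"source_labels": ["__meta_kubernetes_pod_container_name"], "target_label": "container_name"},
]
target = relabel(discovered, rules)
```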

Recording Rules - Derivative Series
● New series can be generated by querying existing series and storing them
path:request_failures_per_requests:ratio_rate5m =
sum(rate(http_requests_total{status="500"}[5m])) by (path) /
sum(rate(http_requests_total[5m])) by (path)

High Availability
[Diagram: two Prometheus servers running side by side]

Federation
[Diagram: several Prometheus servers, with a subset of metrics passed between them]

Long Term Storage and External Integrations
Prometheus
remote_write
● AppOptics: write
● Chronix: write
● Cortex: read and write
● CrateDB: read and write
● Elasticsearch: write
● Gnocchi: write
● Graphite: write
● InfluxDB: read and write
● OpenTSDB: write
● PostgreSQL/TimescaleDB: read and write
● SignalFx: write
remote_read

Alert Definition
ALERT <alert name>
EXPR <expression>
[ FOR <duration> ]
[ LABELS <label set> ]
[ ANNOTATIONS <label set> ]

Example:
ALERT: IngesterCrowding
EXPR: count by(ft_cluster, node) (cortex_ingester_ingested_samples_total) > 1
FOR: 30m
LABELS: severity: critical
ANNOTATIONS:
description: https://github.com/Fresh-Tracks/gke-configs/blob/master/docs/alerts.md#ingestercrowding
summary: Node {{ $labels.node }} is hosting {{ $value }} ingester pods
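
The effect of the FOR clause can be sketched as a small state function: an alert whose expression has just become true is pending, and it only fires once the expression has stayed true for the whole duration. The function name and its arguments are assumptions made for this sketch.

```python
def alert_state(active_since_s, now_s, for_s):
    """Return 'inactive', 'pending' or 'firing' for one alert.

    active_since_s: time in seconds at which the expression became true and
    has stayed true since, or None if it is currently false.
    """
    if active_since_s is None:
        return "inactive"
    return "firing" if now_s - active_since_s >= for_s else "pending"


thirty_minutes = 30 * 60
state_after_10m = alert_state(0, 10 * 60, thirty_minutes)
state_after_30m = alert_state(0, 30 * 60, thirty_minutes)
```

With FOR: 30m, a node that hosts too many ingester pods for only ten minutes never pages anyone.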

Alert Manager
● Deduplication
● Grouping
● Routing
● Suppression
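
Deduplication and grouping can be sketched as follows, assuming that two alerts with identical label sets are the same alert. The `group_alerts` helper is a made-up illustration, not Alert Manager code.

```python
from collections import defaultdict

def group_alerts(alerts, group_by):
    """Drop alerts with identical label sets, then group the rest by selected labels."""
    unique = {frozenset(alert.items()): alert for alert in alerts}.values()
    groups = defaultdict(list)
    for alert in unique:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"alertname": "IngesterCrowding", "node": "n1", "severity": "critical"},
    # The same alert, sent again by a second Prometheus server.
    {"alertname": "IngesterCrowding", "node": "n1", "severity": "critical"},
    {"alertname": "IngesterCrowding", "node": "n2", "severity": "critical"},
]
groups = group_alerts(alerts, ["alertname"])
```

Here three incoming alerts become one notification group with two distinct alerts in it.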

Alert Manager
[Diagram: two Prometheus servers send alerts to two Alert Manager instances, which notify PagerDuty, VictorOps and Slack]

FreshTracks.io
Simplifying Kubernetes Visibility

Filling the Gaps
● A small Kubernetes cluster generates > 500K unique samples
○ Which metrics are important?
● Viewing the performance of any one container is easy
○ How is the whole microservice behaving? Node? Cluster?
● Prometheus has no anomaly detection
● Dashboard creation is tedious, even if you know what to watch
● How is my service behaving in the context of the cluster?
○ How do node/container/application metrics correlate to each other?

Kubernetes Hierarchy Visibility
Namespace → Workload → Pod → Container
(Workload can be a deployment, replicaSet, statefulSet, daemonSet or similar)
