chronosphere.io
From Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
Paige Cruz, retired SRE
open source observability advocate
CFO looking at the
o11y bill
Cloud Native Observability
bills are outrageous
Cloud Native Data Growth
[Figure: business increase in scale vs. observability data increase in scale across three eras: On-Premises (data center), 1998 - 2008; Cloud (IaaS, VM-based), 2008 - 2018; Cloud Native (microservices and containers), 2018 - ?]
*Source: ESG Distributed Cloud Series: Observability, Feb 2022, Scott Sinclair and Rob Strechay
Most recently [vendor] was looked at to help monitor a small Kubernetes test cluster. 3 nodes.
Now the base rate of $18/mo is fine…except now they charge $1 per container per month past 10 containers per host.
Since K8s (depending on how you install it) runs a bunch of little containers handling various back end things, you might not deploy anything to the cluster and still be WAY over that 10 container limit.
In our case it came out to like $200/mo to monitor 3 nodes - that were nowhere fully loaded.
- Hacker News thread
Data volume
Experiment:
- Hello World app on 4 node Kubernetes cluster with Tracing, End User Metrics (EUM), Logs, Metrics (containers / nodes)
- 30 days == +450 GB
Mighty Metrics
“1 in 10 metrics are actually directly queried”
- ServiceNow
Contributing Factors to the Metrics Bill
- # of containers and infra components: how many things you’re monitoring
- Metric Granularity: how often each metric is scraped
- Retention Window: how long you keep the data
- Cardinality: how many unique combos of dimensions on metrics
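As a rough illustration of how these four factors compound, the sketch below estimates how many samples sit in storage at once. The function name and every number in it are hypothetical, and real billing models vary by vendor; it only shows that the factors multiply.

```python
def estimate_stored_samples(monitored_targets, series_per_target,
                            scrape_interval_s, retention_days):
    """Rough count of samples held in storage at any one time.

    series_per_target stands in for cardinality: the number of unique
    label combinations each monitored thing emits (an assumed average).
    """
    active_series = monitored_targets * series_per_target
    samples_per_series = (retention_days * 24 * 60 * 60) // scrape_interval_s
    return active_series * samples_per_series


# Hypothetical fleet: 100 targets, 500 series each.
baseline = estimate_stored_samples(100, 500, scrape_interval_s=10, retention_days=28)
tuned = estimate_stored_samples(100, 500, scrape_interval_s=30, retention_days=7)

print(baseline)          # 12096000000
print(tuned)             # 1008000000
print(baseline / tuned)  # 12.0
```

In this made-up case, a 3x coarser scrape interval combined with a 4x shorter retention window cuts stored samples by 12x without touching the number of targets or the cardinality.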
Cost of monitoring can be a factor in determining how quickly
to deprecate or sunset features/services/environments
# of containers and infra components
Emission time = adjust scrape_interval (from 10s samples ->
30s samples)
Ingest time = aggregate
Over time post-storage = downsampling
Metric Granularity
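A minimal sketch of the post-storage downsampling idea, assuming samples are plain (timestamp_seconds, value) pairs and that averaging is an acceptable summary. That suits gauges; counters would usually need a different rule, such as keeping the last value in each window. The function name is hypothetical.

```python
from collections import defaultdict


def downsample(samples, window_s):
    """Average (timestamp, value) samples into fixed windows of window_s seconds."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of the window it falls in.
        buckets[ts - ts % window_s].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Six samples taken every 10s become two 30s data points.
raw = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0), (40, 5.0), (50, 6.0)]
coarse = downsample(raw, 30)
print(coarse)  # [(0, 2.0), (30, 5.0)]
```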
Aggregation: Roll Up
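One way to picture a roll-up: sum series together after discarding a label nobody queries by, so many per-pod series collapse into one per-service series. This is a sketch under assumptions, not any particular product's aggregation engine; the label names and the roll_up helper are hypothetical.

```python
from collections import defaultdict


def roll_up(series, drop_labels):
    """Sum values of series that become identical once drop_labels are removed."""
    totals = defaultdict(float)
    for labels, value in series:
        # The remaining labels, in a stable order, identify the rolled-up series.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        totals[kept] += value
    return dict(totals)


series = [
    ({"service": "checkout", "pod": "a"}, 3.0),
    ({"service": "checkout", "pod": "b"}, 4.0),
    ({"service": "cart", "pod": "c"}, 5.0),
]
rolled = roll_up(series, {"pod"})
print(len(series), "->", len(rolled))  # 3 -> 2
```

Summing is the right combination for counts and rates; other metric types may call for max, min, or average instead.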
For operational metrics, most (99.9%) of queries do not look back
past 7 days, but average retention at original
granularity ranges from 2-4 weeks
Retention Window
Low value tags or entire metrics should be dropped
as early as possible
Dropping Data
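A sketch of dropping data before it is stored, assuming samples arrive as (metric_name, labels, value) tuples. The drop lists here are hypothetical examples; which metrics and labels count as low value depends on your own audit.

```python
def drop_low_value(samples, drop_metrics, drop_labels):
    """Discard whole metrics in drop_metrics and strip drop_labels from the rest."""
    kept = []
    for name, labels, value in samples:
        if name in drop_metrics:
            continue  # the entire metric is never stored
        trimmed = {k: v for k, v in labels.items() if k not in drop_labels}
        kept.append((name, trimmed, value))
    return kept


samples = [
    ("http_requests_total", {"service": "checkout", "instance": "10.0.0.1:9090"}, 5.0),
    ("go_gc_duration_seconds", {"instance": "10.0.0.1:9090"}, 0.2),
]
kept = drop_low_value(samples, drop_metrics={"go_gc_duration_seconds"},
                      drop_labels={"instance"})
print(kept)  # [('http_requests_total', {'service': 'checkout'}, 5.0)]
```

Series that become identical after a label is stripped would still need to be combined, which this sketch does not do.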
Cardinality
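Cardinality can be measured by counting the distinct label combinations seen for each metric name. The sketch below assumes series are available as (metric_name, labels) pairs; the metric and label names are hypothetical.

```python
from collections import defaultdict


def cardinality_by_metric(series):
    """Count unique label combinations per metric name."""
    combos = defaultdict(set)
    for name, labels in series:
        # A frozenset of (label, value) pairs identifies one unique series.
        combos[name].add(frozenset(labels.items()))
    return {name: len(seen) for name, seen in combos.items()}


series = [
    ("http_requests_total", {"method": m, "status": s})
    for m in ("GET", "POST")
    for s in ("200", "404", "500")
] + [("up", {})]

print(cardinality_by_metric(series))  # {'http_requests_total': 6, 'up': 1}
```

Two methods times three statuses already yields six series; adding a label with thousands of values, such as a user or request ID, multiplies that count by the number of values.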
Auditing Your
Metrics
“What is the value of this metric?”
- You, when auditing metrics
Auditing Your Metrics
● Scope what your team is responsible for
○ filter queries with team:YOURS
● Identify easy wins. Metrics that aren’t
○ In a monitor definition
○ Directly queried by end users
○ Powering charts for visited dashboards
● Identify labels that are unnecessary
○ e.g. the Prometheus instance label or instance_type
● Share your successes!
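The "easy wins" step above amounts to a set difference: everything your team emits, minus anything referenced by a monitor, queried directly, or charted on a dashboard people visit. A minimal sketch, assuming you can export those four lists of metric names from your own tooling (the function and the metric names are hypothetical):

```python
def unused_metrics(all_metrics, in_monitors, queried_directly, on_visited_dashboards):
    """Return metric names that no monitor, direct query, or visited dashboard uses."""
    in_use = set(in_monitors) | set(queried_directly) | set(on_visited_dashboards)
    return sorted(set(all_metrics) - in_use)


candidates = unused_metrics(
    all_metrics=["cpu_usage", "queue_depth", "legacy_cache_hits", "gc_pause_seconds"],
    in_monitors=["cpu_usage"],
    queried_directly=["queue_depth"],
    on_visited_dashboards=["cpu_usage"],
)
print(candidates)  # ['gc_pause_seconds', 'legacy_cache_hits']
```

The result is a list of candidates to review with their owners, not a list to delete automatically.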
CFO looking at the
cost efficiency of
metrics
Resources
- How Gloo uses the OTel Collector to drop metrics/labels
and provide the Minimum Metrics Set
- How to drop and delete metrics in Prometheus
- How can recording and data roll-up rules help your
metrics?
- Observability is Too Damn Expensive - DevOpsDays London
Catch up with me:
- Rescuing On-Call Engineers
(send your manager)
- KubeCon OTel 101: Let’s
Instrument! (tracing) workshop
- There’s No Place Like
Production Conf42 Incident
Management
paigerduty@
chronosphere.io
hachyderm.io
LinkedIn
Q&A

