Chapter 2: SLOs Made Simple

Poojitha A S

Senior DevOps/SRE | Kubernetes | AWS & Azure | CI/CD | Automation | 9+ Years | DevOps Content Creator on LinkedIn

Published May 12, 2025

✍️ By Poojitha A S, adapted and simplified from Google’s Site Reliability Workbook, with real-world flavor from Evernote and The Home Depot, plus lessons you can apply today

Why SLOs Actually Matter

SLOs (Service Level Objectives) aren’t just impressive-sounding figures you slap on a dashboard. They’re shared agreements that set clear expectations for how a system should behave. Think of them as an early-warning system: when something drifts out of bounds, you’ll know before users hit the panic button.

They help answer:

How fast should we fix this?
Should we hold off on that release?
Is it time to focus on reliability instead of adding new stuff?

But here's the thing. SLOs aren’t just about numbers or uptime percentages. They’re about how teams collaborate, make decisions, and balance risk. So let’s cut the jargon and get to the heart of it.

“The whole point of SLOs is to support the notion of gradual improvement.”

How to Build SLOs

Choose the right SLIs: These are your signals like latency, availability, error rates, and throughput.
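Set meaningful SLOs: Targets that reflect reality, such as 99.9% availability or requests served in under 300 ms.
Define your error budget: How much failure can you tolerate before it’s a problem? The budget is whatever the SLO leaves over, so a 99.9% target leaves 0.1% of requests (or time) for failure.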
Write a response plan: What happens when you run out of budget?
Get buy-in from everyone: Dev, Ops, and Product all need to care.
Measure, review, adjust: Use real data and update when things change.
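
To make the error budget and response plan steps concrete, here is a minimal Python sketch. It assumes a request-based SLI, and the function names, the 25% threshold, and the policy wording are all made up for illustration; your own response plan should come out of the Dev, Ops, and Product conversation, not from this snippet.

```python
def allowed_failures(slo_target: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates over the window."""
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    allowed = allowed_failures(slo_target, total_requests)
    if allowed == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed


def response_plan(remaining: float) -> str:
    """Hypothetical policy: the thresholds are placeholders, not a standard."""
    if remaining <= 0:
        return "freeze feature releases and prioritize reliability work"
    if remaining < 0.25:
        return "slow down: ship only low-risk changes"
    return "keep shipping"


if __name__ == "__main__":
    # Example numbers are invented: 1M requests in the window, 250 of them failed.
    remaining = budget_remaining(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"budget remaining: {remaining:.0%} -> {response_plan(remaining)}")
```

With a 99.9% target and one million requests, the budget is about 1,000 failed requests, so 250 failures leaves roughly three quarters of it unspent.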

“If you’re not going to take action when an SLO is violated, don’t bother setting it.”

A Look at Evernote’s SLO Journey

The issue? Constant tension between Dev and Ops. One side wanted to ship fast. The other was tired of cleaning up the mess. Everyone felt the friction.

What changed? They moved to the cloud and introduced SLOs.

Here’s what that looked like:

Picked a high-impact SLO: 99.95% uptime
Used external tools like Pingdom to verify availability
Set a simple rule: if probes failed from two separate regions, it counted as downtime
Focused on user impact, not just backend metrics
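
The two-region rule is easy to sketch in Python. This is an illustration only, not Evernote’s implementation and not Pingdom’s API: the `ProbeResult` record, the region names, the one-minute probe granularity, and the 30-day window are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class ProbeResult(NamedTuple):
    minute: int   # minute offset within the measurement window
    region: str   # where the external probe ran
    ok: bool      # True if the probe succeeded


def downtime_minutes(results: Iterable[ProbeResult], min_failed_regions: int = 2) -> Set[int]:
    """Minutes in which probes failed from at least `min_failed_regions` distinct regions."""
    failed_regions: Dict[int, Set[str]] = defaultdict(set)
    for result in results:
        if not result.ok:
            failed_regions[result.minute].add(result.region)
    return {m for m, regions in failed_regions.items() if len(regions) >= min_failed_regions}


def availability(down: Set[int], window_minutes: int) -> float:
    """Share of the window that was not counted as downtime."""
    return 1.0 - len(down) / window_minutes


WINDOW_MINUTES = 30 * 24 * 60  # assumed 30-day window
SLO_TARGET = 0.9995            # the 99.95% uptime target

# Invented probe data for illustration.
probes = [
    ProbeResult(10, "us-east", False),  # single-region blip: not downtime
    ProbeResult(10, "eu-west", True),
    ProbeResult(42, "us-east", False),  # two regions fail: counts as downtime
    ProbeResult(42, "eu-west", False),
]

down = downtime_minutes(probes)
budget_minutes = (1.0 - SLO_TARGET) * WINDOW_MINUTES  # about 21.6 minutes per 30 days
print(f"downtime minutes: {sorted(down)}, budget: {budget_minutes:.1f} min")
print("SLO met:", availability(down, WINDOW_MINUTES) >= SLO_TARGET)
```

Requiring agreement from two regions filters out failures that belong to a single probe location or network path, which keeps the SLI closer to what users actually experience.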

What came out of it?

Shared goals for Dev and Ops
Monthly SLO reviews with Google’s Customer Reliability Engineering team
Less downtime, fewer finger-pointing meetings

“Perfect is the enemy of good. Start simple, and evolve your SLOs.”

Inside The Home Depot’s VALET Framework

Massive company, complex systems, and everyone was using different metrics. That made it hard to align.

To fix that, they created the VALET framework:

Volume: Can the system handle what’s coming?
Availability: Is it up and running?
Latency: Is it fast enough?
Errors: Is it giving the right responses?
Tickets: Are people opening support tickets?
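
One way to picture a shared VALET view is a small record per dimension with an objective and a measured value. The sketch below is hypothetical: the class name, units, and every number are invented for illustration and are not The Home Depot’s actual schema or targets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValetObjective:
    dimension: str          # one of Volume, Availability, Latency, Errors, Tickets
    objective: float        # the agreed target
    measured: float         # what the service actually did in the window
    higher_is_better: bool  # True for Volume/Availability, False for the rest

    def met(self) -> bool:
        if self.higher_is_better:
            return self.measured >= self.objective
        return self.measured <= self.objective


def missed(objectives: List[ValetObjective]) -> List[str]:
    """Names of the dimensions that fell short of their objective."""
    return [o.dimension for o in objectives if not o.met()]


# Invented example values for a single service.
snapshot = [
    ValetObjective("Volume", objective=500, measured=620, higher_is_better=True),            # requests/sec handled
    ValetObjective("Availability", objective=99.9, measured=99.95, higher_is_better=True),   # percent
    ValetObjective("Latency", objective=300, measured=340, higher_is_better=False),          # milliseconds
    ValetObjective("Errors", objective=0.1, measured=0.05, higher_is_better=False),          # percent of requests
    ValetObjective("Tickets", objective=0, measured=2, higher_is_better=False),              # tickets opened
]

print("Missed objectives:", missed(snapshot))
```

Putting all five dimensions in one structure is what lets every team read the same report, whatever the service behind it.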

What started as a framework turned into a full culture shift:

Internal training and workshops
Custom dashboards and reports
Evangelism campaigns, even t-shirts
Chatbots and BigQuery pipelines to auto-report VALET stats

📉 What improved?

Root cause analysis got faster
Surprises dropped
Teams finally spoke the same language

🎧 Want to Go Deeper?

Podcasts to Bookmark:

Google SRE Prodcast: Engineers unpacking what keeps systems stable
Screaming in the Cloud: Where SLOs meet real-world team dynamics

Tools to Explore:

Nobl9: Great for visualizing error budgets
Pingdom and New Relic: Popular for tracking SLIs

Books Worth Reading:

The Art of Monitoring by James Turnbull
Reliable Machine Learning by Cathy Chen et al.

“The act of writing an SLO is more valuable than the number itself.”

TL;DR:

✅ Choose SLIs that reflect what your users actually care about

✅ Set SLOs that are ambitious but achievable

✅ Foster a culture of reliability, not finger-pointing

✅ Let the data guide you, but don’t forget the human side

Want a free SLO template in Notion and Excel? Drop “SLO READY” in the comments.

Credits

Inspired by “Implementing SLOs” and “SLO Engineering Case Studies” from Google’s The Site Reliability Workbook (CC BY-NC-ND 4.0)
Real-world examples from Evernote and The Home Depot
Simplified and written by Poojitha A S for the DevOps Made Simple community

📬 DevOps Made Simple – Weekly Drops

No jargon. No noise. Just practical DevOps strategies from the real world, delivered every week.


