Chapter 6: Simplicity Made Simple

Poojitha A S

Senior DevOps/SRE | Kubernetes | AWS & Azure | CI/CD | Automation | 9+ Years | DevOps Content Creator on LinkedIn

Published May 28, 2025

✍️ By Poojitha A S. Adapted and simplified from the Google SRE Book and lessons from Google’s Display Ads, Borg, Omega, and platform-wide SRE efforts.

Why Simplicity Is a Superpower in SRE

“A complex system that works is invariably found to have evolved from a simple system that worked.” — Gall’s Law

In SRE, simplicity = reliability.

Simple systems break less, recover faster, and are easier to maintain, test, and debug.

Simplicity isn’t just about clean code. It’s end-to-end: system design, tools, deployment pipelines, architecture diagrams, even onboarding and documentation.

Measuring Complexity: Easier Said Than Done

You can measure code complexity with metrics like cyclomatic complexity, but systems? Much harder.

Here are a few proxies SREs use:

✅ Training time: How long before a new engineer can go on-call?

✅ Explanation time: Can you whiteboard the system in 10 minutes?

✅ Configuration chaos: Are there 10 ways to set a flag?

✅ Number of unique binaries and configs: How many distinct variants are actually deployed?

✅ Age of the system: The older it gets, the more fragile it becomes (Hyrum’s Law strikes again: the longer a system runs, the more of its observable behavior someone depends on)
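To make the “configuration chaos” and “unique binaries” proxies concrete, here is a minimal Python sketch that counts how many distinct binary/config combinations each service is running. It assumes a hypothetical inventory export with one record per running instance; the `Deployment` fields and sample services are illustrative, not a real Google or Kubernetes API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """One running instance, as reported by a hypothetical inventory export."""
    service: str
    binary_version: str
    config_hash: str


def config_diversity(deployments: list[Deployment]) -> dict[str, int]:
    """Count distinct (binary_version, config_hash) pairs per service."""
    variants: dict[str, set[tuple[str, str]]] = {}
    for d in deployments:
        variants.setdefault(d.service, set()).add((d.binary_version, d.config_hash))
    return {service: len(pairs) for service, pairs in variants.items()}


# Illustrative fleet: identical instances collapse into one variant.
fleet = [
    Deployment("frontend", "v1.4", "a1"),
    Deployment("frontend", "v1.4", "a1"),
    Deployment("frontend", "v1.5", "a1"),
    Deployment("ads-mixer", "v2.0", "b7"),
    Deployment("ads-mixer", "v2.0", "c3"),
    Deployment("ads-mixer", "v2.1", "d9"),
]
print(config_diversity(fleet))  # {'frontend': 2, 'ads-mixer': 3}
```

A count that keeps climbing release after release is worth tracking alongside training time and explanation time.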

TLDR: Complexity grows unless someone fights it. That “someone” is often you.

Why SREs Are Simplicity Champions

Systems evolve. They grow feature by feature, team by team. Complexity creeps in through retries, new dependencies, and undocumented changes.

The result? A change in one service breaks another 10 steps downstream.

That’s where SREs come in. We don’t just support our systems; we understand the entire stack. We’re the connective tissue between services, teams, and environments.

Simplicity is everyone’s job. But SREs make it happen.

Case Study 1: When “Flexible” Becomes a Trap

A startup built its core APIs using flexible key/value bags. Everything was “simple”: no structured contracts.

Result?

❌ Poor documentation

❌ Breaking changes in every release

❌ Compatibility nightmares

✅ Lesson learned: Structured data types (like Protobufs or Thrift) force thoughtful design and documentation early, leading to simpler outcomes end-to-end.
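A small Python illustration of the difference, using a `dataclass` as a stand-in for a Protobuf or Thrift schema (it is neither; it just shows the same schema-first idea). The `Order` type and its fields are hypothetical. With a key/value bag, a misspelled field is silently accepted; with a declared type, the same mistake fails at construction time.

```python
from dataclasses import dataclass


# "Flexible" style: any key is accepted, so a typo quietly becomes a new field.
def make_order_bag(**fields: object) -> dict[str, object]:
    return dict(fields)


bag = make_order_bag(user_id="u42", ammount_cents=1999)  # misspelled key
print(bag.get("amount_cents"))  # None: the consumer just sees a missing value


# Structured style: the contract is declared once, in one place.
@dataclass(frozen=True)
class Order:
    """Hypothetical order contract; the field names are illustrative."""
    user_id: str
    amount_cents: int

    def __post_init__(self) -> None:
        # Validate at the boundary instead of deep inside some consumer.
        if self.amount_cents < 0:
            raise ValueError("amount_cents must be non-negative")


try:
    Order(user_id="u42", ammount_cents=1999)  # same typo, now rejected
except TypeError as err:
    print("rejected:", err)
```

The declared type also doubles as documentation: a reader can see every field without hunting through producers and consumers.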

Case Study 2: Rewriting Isn’t Always Simpler

Borg, Google’s internal cluster manager, grew complex. So the team began building Omega: a clean, principled replacement.

Reality check?

❌ Borg evolved faster than Omega

❌ Migration was near-impossible (thousands of services, millions of lines)

❌ Cost of dual-maintenance was too high

✅ What worked: Taking Omega’s ideas and feeding them back into Borg

✅ Bonus: Those same concepts helped launch Kubernetes

Don’t rewrite just to “start fresh.” Improve what you have. Make simplicity iterative.

Case Study 3: Taming the Display Ads Spiderweb

Ads SREs managed interconnected systems from DoubleClick, AdMob, AdSense, and more.

Problem:

Endless config permutations
Loops in query flows
Impossible-to-debug traffic paths

Solution:

✅ Unified standards

✅ One way to copy data, monitor, configure

✅ Gradual flag removal

✅ Consolidated servers
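Gradual flag removal can start from data. The Python sketch below assumes you can export each deployment’s explicit flag settings as a dictionary (the deployment names and flags are hypothetical). A flag set to the same value everywhere no longer encodes a real choice, so it is a candidate to hard-code and delete. Flags missing from some deployments are skipped, since their effective value there comes from a default the sketch cannot see.

```python
def removable_flags(deployed_flags: dict[str, dict[str, str]]) -> dict[str, str]:
    """Return flags that every deployment sets to the same value.

    deployed_flags maps a deployment name to its explicit flag settings.
    Flags missing from any deployment are skipped, because their effective
    value there depends on a default this sketch cannot see.
    """
    if not deployed_flags:
        return {}
    configs = list(deployed_flags.values())
    # Only flags set explicitly in every deployment can be uniform fleet-wide.
    common = set(configs[0]).intersection(*configs[1:])
    candidates: dict[str, str] = {}
    for flag in sorted(common):
        values = {cfg[flag] for cfg in configs}
        if len(values) == 1:
            candidates[flag] = values.pop()
    return candidates


# Hypothetical per-region flag exports.
fleet_flags = {
    "ads-us": {"use_new_cache": "true", "max_qps": "500", "log_level": "info"},
    "ads-eu": {"use_new_cache": "true", "max_qps": "800", "log_level": "info"},
    "ads-asia": {"use_new_cache": "true", "max_qps": "500"},
}
print(removable_flags(fleet_flags))  # {'use_new_cache': 'true'}
```

Each flag this surfaces is one fewer config permutation to test, document, and debug.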

“System smell” is real. If you’re rewriting requests to pass through multiple engines, you have a design problem.
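One way to spot that smell is to check request traces for engines that appear more than once in a single request’s path. A minimal Python sketch, assuming each trace has already been flattened into an ordered list of service names (the trace format and service names are hypothetical, not taken from any specific tracing system):

```python
def find_revisits(hops: list[str]) -> list[str]:
    """Return engines visited more than once in one request, in order of first revisit."""
    seen: set[str] = set()
    revisited: list[str] = []
    for engine in hops:
        if engine in seen and engine not in revisited:
            revisited.append(engine)
        seen.add(engine)
    return revisited


# Hypothetical flattened trace of a single ad query.
trace = ["frontend", "mixer", "ad-server", "mixer", "ranker", "ad-server"]
print(find_revisits(trace))  # ['mixer', 'ad-server']
```

Running a check like this over sampled traces turns “loops in query flows” from a hunch into a list of specific paths to untangle.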

Case Study 4: Microservices at Scale Without Chaos

Google’s social SRE teams were overwhelmed by every team having its own stack.

They built a shared platform:

One set of CI/CD tools
Unified release + monitoring experience
Tiered SRE engagement (from light to deep)

✅ Services gained reliability

✅ Engineers switched teams easily

✅ No SRE bottleneck

Standardization isn’t just cleaner, it makes scale manageable.

Case Study 5: pDNS Loops Back on Itself

Google’s production DNS (pDNS) depended on Svelte for lookup. But Svelte used pDNS. 😬

Cold-starting the system? Impossible.

Fix:

✅ Local IP list for Svelte

✅ Whitelisted service access

✅ Removed the circular dependency

Design like your system might go cold one day. Because it might.
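Python’s standard-library `graphlib` (3.9+) makes the cold-start question easy to check: a topological sort either yields a boot order or raises `CycleError`. The sketch models the case study as a dictionary mapping each service to the services it needs first; `local_ip_list` is a hypothetical stand-in for Svelte’s static bootstrap list, and the graph is deliberately tiny.

```python
from graphlib import CycleError, TopologicalSorter


def cold_start_order(deps: dict[str, set[str]]) -> list[str]:
    """Boot order in which every service starts after the services it depends on.

    Raises CycleError when no such order exists, i.e. a cold start is impossible.
    """
    return list(TopologicalSorter(deps).static_order())


# Before the fix: pDNS needs Svelte for lookups, and Svelte needs pDNS.
before = {"pdns": {"svelte"}, "svelte": {"pdns"}}

# After the fix: Svelte bootstraps from a local IP list instead of pDNS.
after = {"pdns": {"svelte"}, "svelte": {"local_ip_list"}}

try:
    cold_start_order(before)
except CycleError as err:
    print("cannot cold-start, cycle:", err.args[1])

print(cold_start_order(after))  # ['local_ip_list', 'svelte', 'pdns']
```

Run against a real dependency inventory, the same check can gate new dependencies before they quietly close a loop.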

Regaining Simplicity Is an Engineering Investment

Simplification usually means removing, not adding

🔁 Simplification often means replacing duplicate work with shared services

🏆 Celebrate it! Google literally gives “Zombie Code Slayer” badges for major code deletions
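To reward deletions, you first need to measure them. A minimal Python sketch that turns `git diff --numstat` output (tab-separated added lines, deleted lines, and path; binary files show `-` for both counts) into a net “lines removed” figure. The `numstat` helper and the sample paths are illustrative, and any badge threshold you apply on top is your own choice, not how Google awards its badges.

```python
import subprocess


def numstat(base: str, head: str) -> str:
    """Return `git diff --numstat` output for base..head (run inside a git checkout)."""
    result = subprocess.run(
        ["git", "diff", "--numstat", f"{base}..{head}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def net_lines_removed(numstat_output: str) -> int:
    """Deletions minus additions; a positive result means the change shrank the code."""
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t", 2)
        # Skip malformed lines and binary files, which report "-" for both counts.
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        total += int(parts[1]) - int(parts[0])
    return total


sample = "12\t340\tlegacy/zombie_handler.py\n3\t1\tserver/main.py\n-\t-\tassets/logo.png\n"
print(net_lines_removed(sample))  # 326
```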

What You Can Do as an SRE

✅ Encourage system diagramming — before going on-call

✅ Review every design doc for complexity impact

✅ Track and reward simplification projects like feature launches

✅ Allocate 10% of engineering time to simplicity work

✅ Create a rotating team with full-stack visibility

✅ Watch for:

Amplification: Error retries causing 10x RPCs
Cyclic dependencies: One cold start away from failure
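The amplification warning is plain multiplication: if each layer in a call chain retries independently, the worst case at the bottom is the product of the attempts at every layer. A short Python sketch with hypothetical layer counts:

```python
from math import prod


def worst_case_calls(attempts_per_layer: list[int]) -> int:
    """Calls reaching the bottom layer for one user request when every layer exhausts its attempts.

    attempts_per_layer[i] counts the original try plus retries at layer i;
    each attempt re-runs everything below it, so the counts multiply.
    """
    return prod(attempts_per_layer)


# Hypothetical three-layer stack: frontend, mid-tier, storage client.
print(worst_case_calls([3, 3, 3]))  # 27: every layer retries twice
print(worst_case_calls([3, 1, 1]))  # 3: only the frontend retries
```

This is why retrying at a single layer, or capping retries with a budget, keeps one struggling dependency from turning into a 10x (or 27x) traffic spike.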

TLDR

✅ Simplicity = reliability

✅ Complexity grows on its own. Simplicity requires effort.

✅ Rewrites aren’t always simpler. Improve what you’ve got.

✅ Celebrate code deletion as much as code creation.

✅ SREs must lead the push—no one else sees the system end-to-end

🎧 Want to Learn More?

Books

The Google SRE Book
Software Engineering at Google

Talks & Podcasts

Google Prodcast – Internal system design breakdowns
The Art of Software Simplicity – GOTO Conference talks

Tools That Help

Structurizr – Diagram-as-code for systems
SonarQube – Detect complexity in code
Protocol Buffers – Design once, scale forever

Credits

Based on Chapter 7 (Simplicity) of Google’s The Site Reliability Workbook, the companion to the Google SRE Book. Case studies adapted from Display Ads, Borg, Omega, and production DNS efforts.

📬 New drops every Monday, Wednesday, and Friday

👉 Subscribe now — No fluff, just field-tested DevOps wisdom
