Paperwork. The Tax You Pay for Bad Engineering

Dale Frohman

Director SRE / Executive leader having fun with SRE, DevOps and TechOps

Published Feb 26, 2025

It was a normal Tuesday, until it wasn’t.

At exactly 2:37 p.m., the first alert fired. Service latency spiked. A minute later, another one. Then another.

By 2:42 p.m., production was officially on fire. Dashboards turned red and engineers scrambled to contain the chaos.

After an hour of frantic debugging (involving way too many Slack threads and one very desperate command), service was finally restored. Customers stopped screaming, execs took a deep breath, and for a brief moment, everyone thought the nightmare was over.

Then came the real horror.

The paperwork.

The post-incident review. The Root Cause Analysis (RCA). The Five Whys (or more accurately, the Fifty Slacks). The cross-team debriefs, the corrective action plans, the documentation updates, the endless meetings.

And suddenly, the best engineer on the team, the one who actually fixed the issue, was buried in so much paperwork that they couldn’t work on anything meaningful for weeks.

Two months later, that engineer left for another company.

Coincidence? Probably not.

Paperwork is Death by a Thousand Paper Cuts

Jeff Bezos has a concept called paper cuts vs. big problems.

He argues that companies that obsess over tiny annoyances, things that don’t actually move the needle, end up losing sight of what really matters. Instead of focusing on big, strategic work, they spend all their time on bureaucratic nonsense.

And in engineering, paperwork is the ultimate paper cut.

Think about it:

An outage happens.
Teams spend weeks dissecting every detail instead of fixing systemic problems.
The engineers who actually did the work are punished with more meetings, more documentation, more process.

And then leadership wonders why their best people burn out and leave.

Here’s the thing: we don’t need less accountability. We need fewer preventable failures in the first place.

Because the best way to avoid paperwork isn’t to cut corners.

It’s to build a system that doesn’t break in the first place.

Paper is the Ultimate Motivator (Because No One Wants to Do It)

Paperwork has a way of making people really good at avoiding whatever triggers it.

Doctors and nurses follow detailed procedures because paperwork for a medical error is a career-ending nightmare.
Speeding tickets work because nobody wants to explain to their spouse why they blew $300 on an “accidental” highway sprint.
Expense reports? If your manager made you submit receipts in triplicate, you’d never expense another overpriced airport sandwich again.

In tech, paperwork exists for one reason: to make sure we learn from failures.

But what if we flipped the script? Instead of using paper as a punishment, what if we used it as a reason to get proactive?

How to Stop Death by Paperwork (and Actually Improve Reliability)

If you don’t want to spend your life filling out incident reports, start doing these three things today:

1 - Stop Letting Your Data Rot. Build an Observability Data Lake

Most companies have logs, metrics, and traces scattered across different tools. But if your data is trapped in silos, you’re flying blind.

What to do instead:

Aggregate all telemetry data into a centralized observability data lake.
Use ML-powered anomaly detection to spot problems before they turn into outages.
Correlate logs, traces, and metrics to understand the full impact of an issue in seconds, not hours.
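
As a rough illustration of the correlation step, here is a minimal Python sketch. It assumes every log line, trace span, and metric sample carries a shared trace_id (for metrics, that usually means something like an exemplar). The Telemetry record and both helper functions are hypothetical names invented for this example, not part of any particular observability product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Telemetry:
    kind: str       # "log", "trace", or "metric"
    trace_id: str   # assumed shared correlation key across all signals
    service: str
    detail: str


def correlate(records):
    """Group telemetry records by trace_id so one lookup shows every signal."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.trace_id].append(record)
    return dict(grouped)


def impacted_services(records, trace_id):
    """Return the sorted list of services touched by a single request."""
    return sorted({r.service for r in correlate(records).get(trace_id, [])})
```

With the data in one place and one key to join on, "which services did this failing request touch?" becomes a single lookup instead of a tour of five dashboards.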

Example: Instead of waiting for a database crash, detect slow query patterns early and trigger an automated optimization before performance degrades.
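
To make the slow-query example concrete, here is a minimal sketch of early detection. It is not ML: it swaps in a plain rolling z-score as the simplest stand-in for an anomaly detector. The class name, window size, and threshold are all assumptions chosen for illustration, and whatever optimization you trigger on a hit is left out.

```python
from collections import deque
from statistics import mean, stdev


class SlowQueryDetector:
    """Flag query latencies that sit far above the recent rolling baseline."""

    def __init__(self, window=30, threshold=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # recent "normal" latencies in ms
        self.threshold = threshold           # z-score above which we flag
        self.min_samples = min_samples       # warm-up before judging anything

    def observe(self, latency_ms):
        """Return True if latency_ms is anomalously slow versus recent history."""
        is_anomaly = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            # A perfectly flat baseline (sigma == 0) never flags in this sketch.
            if sigma > 0 and (latency_ms - mu) / sigma > self.threshold:
                is_anomaly = True
        # Learn only from normal samples so a slow burst does not become the baseline.
        if not is_anomaly:
            self.samples.append(latency_ms)
        return is_anomaly
```

Feed it the latency of each query as it completes, and treat a run of True results as the signal to act before the database falls over.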

2 - Make Incidents Smarter. Enrich Alerts with Context

The worst kind of alert? One that says “Service Unavailable” with zero useful details.

Engineers shouldn’t have to dig through five different dashboards just to understand what went wrong.

What to do instead:

Use context-aware alerting. When an incident fires, attach related logs, traces, and recent deployments.
Automate post-mortem tagging, so every alert includes links to similar past incidents.
Integrate with Slack, Jira, and runbooks so engineers can take action instantly.

Example: Instead of getting a vague “High CPU Usage” alert, your on-call engineer gets a Slack message that says:

“CPU usage on [service] is at 95%. Last deployment: 15 minutes ago. Related logs indicate an increase in garbage collection time. Here’s the rollback command: [rollback command]”

Now, instead of wasting an hour diagnosing the issue, they fix it in seconds.
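
Here is one way that kind of enriched message could be assembled, as a minimal sketch. It assumes you already have a deployment history, a few log-derived hints, and a rollback command on hand. The Deployment record, the enrich_alert function, and every parameter name are hypothetical, and actually posting the text to Slack is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    service: str
    version: str
    deployed_at: datetime


def enrich_alert(service, metric, value, now, deployments, log_hints, rollback_command):
    """Build a chat-ready alert message with deployment and log context attached."""
    lines = [f"{metric} on {service} is at {value}%."]
    # Only deployments of the alerting service that happened before the alert.
    recent = [d for d in deployments if d.service == service and d.deployed_at <= now]
    if recent:
        last = max(recent, key=lambda d: d.deployed_at)
        minutes = int((now - last.deployed_at).total_seconds() // 60)
        lines.append(f"Last deployment: {last.version}, {minutes} minutes ago.")
    for hint in log_hints:
        lines.append(f"Related logs: {hint}")
    if rollback_command:
        lines.append(f"Rollback command: {rollback_command}")
    return "\n".join(lines)


# Usage with made-up values; the service name and version are placeholders.
now = datetime(2025, 2, 25, 14, 37)
history = [Deployment("checkout", "v42", now - timedelta(minutes=15))]
message = enrich_alert("checkout", "CPU usage", 95, now, history,
                       ["garbage collection time increased"], "<rollback command>")
```

The point is that the lookup work happens once, in code, when the alert fires, rather than in the head of whoever is on call.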

3 - Restore Service Automatically. Use Auto-Remediation

Why wait for an engineer to manually react when your system can self-heal?

What to do instead:

Set up automated rollback mechanisms that trigger when a bad deployment is detected.
Use self-healing Kubernetes clusters that restart failing pods automatically.
Build automated fail-over strategies so services reroute traffic before customers even notice a problem.
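
For the rollback trigger, "a bad deployment is detected" has to become an actual rule. Here is a minimal sketch of one, comparing the error rate after a deploy with the rate before it. The function name and all three thresholds are assumptions for illustration, and the rollback itself is left to your deployment tooling.

```python
def should_roll_back(baseline_errors, baseline_requests,
                     current_errors, current_requests,
                     max_ratio=2.0, min_requests=100, min_error_rate=0.01):
    """Decide whether the post-deploy error rate justifies an automatic rollback."""
    if current_requests < min_requests:
        return False  # not enough traffic yet to judge the new version
    current_rate = current_errors / current_requests
    if current_rate < min_error_rate:
        return False  # errors are too rare to act on, whatever the ratio
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    # Roll back only when errors grew by more than max_ratio over the baseline.
    return current_rate > baseline_rate * max_ratio
```

The two guard clauses matter as much as the comparison: without a traffic floor and an absolute error floor, a single failed request on a quiet service would trigger a rollback.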

Example: Instead of waking up a human at 3 a.m. for a memory leak, the system automatically kills the offending process and spins up a fresh instance.

No alert. No human intervention. No paperwork.
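
A minimal sketch of that memory-leak remediation might look like the following. It assumes you sample the process's resident memory at a fixed interval and can supply a restart callback. Both function names, the "grew on every sample" heuristic, and the limit are hypothetical choices for illustration; in Kubernetes, a memory limit plus a restart policy covers much of the same ground.

```python
def leak_suspected(rss_samples_mb, limit_mb, min_samples=5):
    """True when memory grew on every recent sample and the latest exceeds limit_mb."""
    if len(rss_samples_mb) < min_samples:
        return False
    recent = rss_samples_mb[-min_samples:]
    steadily_growing = all(later > earlier for earlier, later in zip(recent, recent[1:]))
    return steadily_growing and recent[-1] > limit_mb


def remediate(rss_samples_mb, limit_mb, restart):
    """Call restart() when a leak is suspected; return whether action was taken."""
    if leak_suspected(rss_samples_mb, limit_mb):
        restart()  # caller-supplied: replace the process with a fresh instance
        return True
    return False
```

Requiring steady growth as well as a breached limit keeps a single spike from triggering a restart, which is the kind of judgment you would otherwise be waking a human up to make.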

Focus on Big Problems, Not Paper Cuts

Here’s the reality:

Nobody joins an engineering team to fill out incident reports.
Nobody wants to spend half their week in post-mortem meetings.
Nobody enjoys explaining to leadership why their entire system went down because someone fat-fingered a config change.

And yet, most engineering teams are drowning in reactive work instead of actually making their systems better.

If you want to build a team that’s excited to come to work, eliminate the paper cuts.

Invest in:

Proactive observability
Automated incident response
Self-healing infrastructure

Because the best engineers don’t want to spend their time on process, forms, and endless meetings.

They want to build cool things, solve big problems, and make systems that don’t break in the first place.

Let’s give them that.
