Building Systems That Last

In every system, scale eventually stops being a goal — and starts becoming a constraint. Complexity, growth, unpredictability, and edge cases test what begins as a clean architecture.


🔹 Clarity Before Complexity

The most complex problems aren't always technical — they hide in ambiguity: unclear goals, evolving requirements, and undefined success.

Before diving into tuning or scaling, I ask:

  • What exactly are we trying to improve?
  • Do we have metrics that reflect it?
  • What matters most — latency, cost, resilience, or dev velocity?

With that clarity, the tech work becomes far easier. Whether debugging queue backlogs or redesigning a scheduler, the goal stays the same: reduce uncertainty, define the levers, and make measurable progress.

Without that step? We risk shipping clever systems that solve the wrong problem.


🔹 Predictable Beats Fast

Speed is seductive. We chase lower latency, faster endpoints, and leaner workloads.

But the fastest systems often fail under pressure. The most predictable ones survive.

I've learned to value:

  • Tail latency over averages
  • Consistency over peak performance
  • Behavior under pressure, not just at rest

Fast systems break, while predictable systems bend. If I must choose, I choose predictable — it gives you trust, control, and space to recover.


🔹 Design for the 99th Percentile

It's easy to celebrate average performance. But real issues live in the tail.

That slowest 1% of requests is where you:

  • Hit retries
  • Break SLAs
  • Expose system limits

Users don't care how fast you are 99% of the time if the other 1% results in failure.

You're not seeing the whole picture if you're not looking at the 99th percentile.
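To make the gap between the average and the tail concrete, here is a minimal sketch. The latency numbers are invented for illustration, and `percentile` is a hypothetical helper using the nearest-rank method, not something taken from a particular monitoring tool.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer multiplication first keeps the rank exact for whole-number percentiles.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up workload: 98 requests take 20 ms, 2 requests take 2000 ms.
latencies_ms = [20] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

In this invented sample the mean is about 60 ms and the median is 20 ms, both of which look healthy, while the 99th percentile sits at 2000 ms. Only the tail number shows what the unlucky requests actually experienced.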


🔹 Queue Depth Tells the Truth

I watched CPU and memory graphs for a long time to gauge health.

But one metric told me what matters: queue depth.

It revealed:

  • Silent backlog buildup
  • Coordination delays
  • Processing lag under pressure

Queue depth tells the story that resource graphs miss.

It's where performance meets user experience.
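A toy simulation can show why queue depth exposes what resource graphs hide. This is a sketch under simplifying assumptions (fixed arrival and service rates per tick, a single queue); `simulate_queue_depth` and all the numbers are hypothetical.

```python
def simulate_queue_depth(arrivals_per_tick, capacity_per_tick, ticks):
    """Return the queue depth at the end of each tick for constant arrival and service rates."""
    depth = 0
    history = []
    for _ in range(ticks):
        depth += arrivals_per_tick                 # new work enters the queue
        processed = min(depth, capacity_per_tick)  # workers drain at most their capacity
        depth -= processed
        history.append(depth)
    return history


# Workers are 90% busy in one case and 100% busy in the other.
healthy = simulate_queue_depth(arrivals_per_tick=90, capacity_per_tick=100, ticks=60)
overloaded = simulate_queue_depth(arrivals_per_tick=105, capacity_per_tick=100, ticks=60)

# Rough wait for a newly arrived item: backlog divided by drain rate.
estimated_wait_ticks = overloaded[-1] / 100

print(f"healthy depth: {healthy[-1]}, overloaded depth: {overloaded[-1]}")
print(f"estimated wait for new work: {estimated_wait_ticks} ticks")
```

On a utilization graph, the two cases look similar: busy workers and flat lines. The queue tells them apart. One stays empty, while the other grows by 5 items every tick and never recovers on its own.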


🔹 Correct Doesn't Mean Scalable

I've worked on functionally correct systems. They passed tests, met specs, and worked in staging.

Then came real-world scale:

  • Locks contended
  • Retry loops cascaded
  • Tiny latencies added up to massive backlogs

The system wasn't wrong. It just didn't scale.

Correctness is a starting point. Scalability is a journey.
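The cascading retry loops mentioned above can be put into numbers. The sketch below is illustrative only: it assumes every layer in a call chain retries independently, and it shows capped exponential backoff with jitter as one common mitigation, not necessarily the one used in the systems described here. All function names are hypothetical.

```python
import random


def worst_case_attempts(attempts_per_layer, layers):
    """Calls reaching the bottom service if every layer exhausts all of its attempts."""
    return attempts_per_layer ** layers


def backoff_delays(base_s, cap_s, attempts, rng):
    """Full-jitter backoff: each delay is uniform in [0, min(cap_s, base_s * 2**n)]."""
    return [rng.uniform(0, min(cap_s, base_s * 2 ** n)) for n in range(attempts)]


# Three layers, each making up to 3 attempts, can multiply one request into 27 calls.
amplification = worst_case_attempts(attempts_per_layer=3, layers=3)

# Spreading retries out in time avoids synchronized retry storms.
rng = random.Random(42)  # seeded only so the sketch is repeatable
delays = backoff_delays(base_s=0.1, cap_s=2.0, attempts=6, rng=rng)

print(f"worst-case amplification: {amplification}x")
print("retry delays (s):", [round(d, 3) for d in delays])
```

Each retry policy here is individually correct, yet at scale the combination multiplies load exactly when the downstream service is least able to absorb it.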


Final Thoughts

Scaling isn't just about speed.

It's about building systems that are:

  • Transparent
  • Predictable
  • Fair
  • Survivable

Build for stress, observe the right signals, don't over-isolate what you can reuse, and never trade predictability for performance.

If you can do that — you're not just scaling. You're building systems that last.


#SystemDesign #Scalability #DistributedSystems #TechStrategy #BackendEngineering


Ankush Maheshwari

Principal Software Engineer at Atlassian

3mo

"Fast systems break, while predictable systems bend." Why can't a system be both fast and predictable? We often dichotomise aspects that can live together. I have seen many leaders speak in terms of "move fast and break things" or "Embrace the chaos. It means you're doing something meaningful." I have also seen companies that move fast without compromising quality in any way. This is how progress is made: you look beyond dichotomies and accept their co-existence. How about I say "move fast and do the right thing", and let's build a fast and predictable system.

Aman Singh

Bachelor @ SIET Prayagraj

3mo

Thanks for sharing
