Why Systems Slow Down — and What Smart Caching Teaches Us About Scalability
Modern applications usually start fast. But as traffic grows, so does the load on the backend — and somewhere along the way, things slow down.
Often, it’s not bad code or poor DB design — it’s the volume of repeated reads hitting your database like a DDoS. Caching becomes the first (and sometimes only) line of defense.
But caching isn’t just about speed. It’s about trade-offs — consistency, durability, and failure recovery.
So… what exactly is a cache?
A cache is memory that stores frequently accessed data, so you don’t have to hit your database or expensive downstream systems every time.
But in real systems, a cache is not just a faster version of your database. It’s a separate layer that has its own lifecycle, consistency rules, and edge cases.
Let’s start with the typical, unoptimized request flow:
1. A client sends a request.
2. The app server receives it and queries the database.
3. The database runs the query and returns the rows.
4. The app builds the response and sends it back to the client.
Repeat this for every user, every second, and your DB will cry for help.
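To make that concrete, here is a minimal sketch of the uncached path, assuming a hypothetical `db` client that runs parameterized SQL:

```python
# Unoptimized path: every request pays a full database round trip.
# `db` is a hypothetical client, used only for illustration.

def get_user_profile(db, user_id):
    # No cache in front: the same SELECT runs for every single request,
    # even if the row hasn't changed since the last call.
    return db.query("SELECT * FROM users WHERE id = %s", (user_id,))
```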
Choosing the Right Cache Strategy
1. Local (In-Process) Cache
With local caching, each server instance stores data in its own memory. It’s blazingly fast — there are no network hops, just RAM access.
But this comes at a cost. Since every instance has its own copy of the cache, data updates don’t automatically sync across them. This can lead to inconsistencies.
In setups with multiple services or containers, this quickly turns into a fanout problem — if a user updates their profile, you’d have to tell every node to refresh its local cache.
To make this work reliably, you'd need sharding, coordination, and sometimes even replication logic — adding operational complexity.
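A minimal sketch of an in-process cache, assuming a plain dict with a TTL (the key scheme and the 60-second TTL are illustrative):

```python
import time

# In-process cache: a plain dict with a TTL, local to this process.
# Fast (no network hop), but every instance keeps its own copy, so an update
# made on one node stays invisible to the others until their entries expire.
_local_cache = {}   # key -> (expires_at, value)
TTL_SECONDS = 60

def get_user(db, user_id):
    key = f"user:{user_id}"
    entry = _local_cache.get(key)
    if entry and entry[0] > time.time():        # hit, and not expired yet
        return entry[1]
    value = db.query("SELECT * FROM users WHERE id = %s", (user_id,))
    _local_cache[key] = (time.time() + TTL_SECONDS, value)
    return value
```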
2. Global (Centralized) Cache
This is where tools like Redis or Memcached shine. Instead of each node caching data independently, all instances talk to a shared in-memory store.
Now, if a value is updated, it’s immediately visible to all instances — solving the consistency problem.
The downside? Every cache access is a network call. Still fast, but not as instant as a local memory lookup. Plus, if Redis goes down and you haven’t set up clustering, your entire cache layer goes with it.
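Here is the same lookup sketched against a shared Redis instance using the redis-py client (the host, key scheme, and TTL are assumptions for illustration):

```python
import json
import redis  # pip install redis

# Shared cache: every app instance talks to the same Redis store,
# so a value written by one node is immediately visible to the rest.
r = redis.Redis(host="localhost", port=6379, db=0)

def get_user(db, user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)               # hit: one network call, no DB work
    user = db.query("SELECT * FROM users WHERE id = %s", (user_id,))
    r.set(key, json.dumps(user), ex=300)        # keep it for 5 minutes
    return user
```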
3. Distributed Cache with Sharding + Replication
This is what big distributed systems use — partitioning the cache across nodes (sharding), and replicating data across multiple machines for fault tolerance.
Here’s how reads and writes work in this setup:
- Each key is hashed to a shard, so the data (and the load) is split across nodes.
- Each shard is replicated to one or more other nodes, so losing a machine doesn’t lose the cached data.
- Reads and writes for a key are routed to that key’s shard and its replicas.
To maintain consistency, you typically use quorum logic:
If total nodes = 3, and you write to 2 (W=2), then you must read from at least 2 (R=2) to be safe, because R + W > N.
This is what systems like Cassandra and DynamoDB follow under the hood.
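A rough sketch of how a key gets placed on shards and replicas (the node names and replication factor are made up; real systems typically use consistent hashing so that adding or removing a node doesn’t remap every key):

```python
import hashlib

NODES = ["cache-0", "cache-1", "cache-2"]   # illustrative node names
REPLICATION_FACTOR = 2                      # each key lives on 2 of the 3 nodes

def replicas_for(key):
    # Hash the key to pick a primary shard, then replicate to the next
    # node(s) in the ring.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    primary = h % len(NODES)
    return [NODES[(primary + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

print(replicas_for("user:42"))   # e.g. ['cache-1', 'cache-2']
```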
Handling Writes: Where Things Start Getting Real
Reading from cache is easy. But when data changes — due to user updates or background jobs — your cache must reflect that. This is where most caching issues show up.
Write-Through Cache
In this approach, every write goes to both the cache and the database, synchronously.
This ensures data is always consistent between cache and DB. But since every write waits for the DB, it can get slow under high load.
It’s best used where correctness matters — user profiles, account data, payment flows.
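A minimal write-through sketch, assuming the same hypothetical `db` client and a local Redis instance:

```python
import json
import redis

r = redis.Redis()

def update_user(db, user_id, profile):
    # Write-through: the write is only acknowledged once BOTH the database
    # and the cache hold the new value, so reads never see them disagree.
    db.execute("UPDATE users SET profile = %s WHERE id = %s",
               (json.dumps(profile), user_id))     # 1. durable write
    r.set(f"user:{user_id}", json.dumps(profile))  # 2. cache updated in the same request
```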
Write-Back (Write-Behind) Cache
Here, the write is stored in cache and acknowledged immediately. The database is updated later, often asynchronously.
This gives lightning-fast writes — great for analytics or real-time counters — but if the cache dies before flushing, data is lost.
To make this safer, teams use write-ahead logs, persistent queues, or Redis with disk persistence (AOF).
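A simplified write-back sketch using an in-memory queue and a background flusher thread (in production that buffer would need to be persistent, otherwise a crash still loses the unflushed writes):

```python
import json
import queue
import threading
import redis

r = redis.Redis()
flush_queue = queue.Queue()   # in-memory buffer of pending DB writes

def record_stats(user_id, stats):
    # Write-back: acknowledge as soon as the cache and the queue have the value;
    # the database only catches up later, in the flusher below.
    r.set(f"stats:{user_id}", json.dumps(stats))
    flush_queue.put((user_id, stats))

def flusher(db):
    while True:
        user_id, stats = flush_queue.get()   # blocks until there is work
        db.execute("UPDATE stats SET data = %s WHERE user_id = %s",
                   (json.dumps(stats), user_id))

# threading.Thread(target=flusher, args=(db,), daemon=True).start()
```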
Write-Around Cache
This one skips the cache entirely for writes. You write directly to the database, and cache only comes into play later during reads.
This keeps the cache lean, since it isn’t filled with data that may never be read. But every new value leads to a cache miss the first time it’s requested.
It’s useful when writes are high and data is rarely reused immediately.
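A write-around sketch: the write path touches only the database, and the cache is populated lazily on the first read (the key scheme and TTL are illustrative):

```python
import json
import redis

r = redis.Redis()

def save_event(db, event_id, payload):
    # Write-around: the write skips the cache entirely.
    db.execute("INSERT INTO events (id, payload) VALUES (%s, %s)",
               (event_id, json.dumps(payload)))

def get_event(db, event_id):
    key = f"event:{event_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # The first read after a write is always a miss; only then is the cache filled.
    payload = db.query("SELECT payload FROM events WHERE id = %s", (event_id,))
    r.set(key, json.dumps(payload), ex=600)
    return payload
```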
Cache-Aside (Lazy Loading)
Here, the application handles cache reads and writes explicitly.
On reads: the application checks the cache first; on a miss, it loads the value from the database, writes it into the cache, and returns it.
On writes: the application updates the database, then invalidates (or overwrites) the cached entry.
It gives full control but requires discipline: if you forget to invalidate the cache after a write, users may see stale data.
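A cache-aside sketch showing both paths; the whole pattern hinges on that last `delete` call being part of every write (key scheme and TTL are illustrative):

```python
import json
import redis

r = redis.Redis()
TTL = 300   # illustrative

def get_product(db, product_id):
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:                      # read path: try the cache first
        return json.loads(cached)
    product = db.query("SELECT * FROM products WHERE id = %s", (product_id,))
    r.set(key, json.dumps(product), ex=TTL)     # lazy-load on a miss
    return product

def update_product(db, product_id, fields):
    db.execute("UPDATE products SET data = %s WHERE id = %s",
               (json.dumps(fields), product_id))
    r.delete(f"product:{product_id}")           # skip this and readers see stale data
```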
Ensuring Consistency with Quorum Reads
In a distributed cache, data is spread across nodes. To ensure you're not reading stale values, you need quorum logic.
Rule of thumb: R + W > N, where
R = number of nodes to read from
W = number of nodes to write to
N = total nodes in the system
If only one of the nodes got the latest value and you read from a different one, you’d get old data. Quorum ensures you hit at least one “up-to-date” node.
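A toy illustration of a quorum read: query R replicas and keep the copy with the highest version (the replies and version numbers are made up):

```python
def quorum_read(replies, r):
    # replies: one dict per replica that answered, e.g. {"version": 7, "value": ...}
    if len(replies) < r:
        raise RuntimeError("not enough replicas answered to satisfy the read quorum")
    freshest = max(replies[:r], key=lambda reply: reply["version"])
    return freshest["value"]

# With N=3, W=2, R=2, at least one of the two replicas we read overlaps with the
# two that accepted the last write, so the newest version is always among them.
print(quorum_read([{"version": 7, "value": "new"},
                   {"version": 6, "value": "old"}], r=2))   # -> "new"
```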
Cache Invalidation: The Real Headache
The hardest problem in caching isn’t writing — it’s knowing when a cached value should no longer be served.
What happens when the DB has new data but the cache still serves old values?
Here’s how teams usually deal with invalidation:
- TTLs (expiry times) on every key, so a stale value can only live so long.
- Explicit invalidation: delete or overwrite the cached entry as part of every write path.
- Event-driven invalidation: publish a message (for example over Redis pub/sub) so every node drops its copy.
- Versioned keys: write new data under a new key, so old entries simply stop being read.
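Here is a sketch of the pub/sub approach with redis-py, which also addresses the fanout problem from the local-cache section (the channel name and local dict are illustrative):

```python
import redis

r = redis.Redis()
_local_cache = {}   # each instance's in-process copy

def invalidate(key):
    # Publisher side: after a successful DB write, broadcast the key so that
    # every instance drops its local copy.
    r.publish("cache-invalidation", key)

def invalidation_listener():
    # Subscriber side: each instance runs this loop (typically in a background
    # thread) and evicts whatever keys other nodes announce.
    pubsub = r.pubsub()
    pubsub.subscribe("cache-invalidation")
    for message in pubsub.listen():
        if message["type"] == "message":
            _local_cache.pop(message["data"].decode(), None)
```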
Eviction Strategies: When Memory Runs Out
Your cache can't hold everything forever. So when space is tight, something gets evicted.
Common strategies:
- LRU (Least Recently Used): evict the entry that hasn’t been touched for the longest time.
- LFU (Least Frequently Used): evict the entry accessed least often.
- FIFO: evict the oldest entry, regardless of how often it’s used.
- TTL-based expiry: entries simply age out after a fixed time.
A good eviction strategy depends on your access patterns — not just the size of your cache.
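To make LRU concrete, here is a tiny sketch built on OrderedDict (the capacity is illustrative; production caches like Redis implement approximated versions of this):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # drop the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")        # "a" becomes the most recently used
cache.put("c", 3)     # evicts "b", not "a"
```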
Before You Cache Anything...
Ask these:
- Is this data read often enough (and changed rarely enough) to be worth caching?
- Can you tolerate serving a slightly stale value, and for how long?
- How will the cache be invalidated when the underlying data changes?
- What happens on a miss, or when the cache itself goes down?
Final Thoughts
Caching is not just a performance trick — it’s a system design decision.
Done right, it cuts infra costs, improves response time, and keeps your DB breathing. Done wrong, it introduces bugs you won’t notice until users start reporting stale prices, missing updates, or worse.
So don’t just "add Redis" and hope for the best. Design for failure, plan for eviction, and always respect the complexity of your cache.
#SystemDesign #Caching #Redis #BackendEngineering #Scalability #PostgreSQL #SoftwareArchitecture #Microservices