Why Systems Slow Down — and What Smart Caching Teaches Us About Scalability
Modern applications usually start fast. But as traffic grows, so does the load on the backend — and somewhere along the way, things slow down.
Often, it’s not bad code or poor DB design — it’s the volume of repeated reads hitting your database like a DDoS. Caching becomes the first (and sometimes only) line of defense.
But caching isn’t just about speed. It’s about trade-offs — consistency, durability, and failure recovery.
So… what exactly is a cache?
A cache is memory that stores frequently accessed data, so you don’t have to hit your database or expensive downstream systems every time.
But in real systems, a cache is not just a faster version of your database. It’s a separate layer that has its own lifecycle, consistency rules, and edge cases.
Let’s start with the typical, unoptimized request flow:
1. A client sends a request.
2. The app server receives it and queries the database.
3. The database runs the query and returns the rows.
4. The app builds the response and sends it back to the client.
Repeat this for every user, every second, and your DB will cry for help.
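To make that concrete, here is a minimal sketch of the uncached path, assuming a hypothetical `db` client that runs parameterized SQL:

```python
# Unoptimized path: every request pays a full database round trip.
# `db` is a hypothetical client, used only for illustration.

def get_user_profile(db, user_id):
    # No cache in front: the same SELECT runs for every single request,
    # even if the row hasn't changed since the last call.
    return db.query("SELECT * FROM users WHERE id = %s", (user_id,))
```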
Choosing the Right Cache Strategy
1. Local (In-Process) Cache
With local caching, each server instance stores data in its own memory. It’s blazingly fast — there are no network hops, just RAM access.
But this comes at a cost. Since every instance has its own copy of the cache, data updates don’t automatically sync across them. This can lead to inconsistencies.
In setups with multiple services or containers, this quickly turns into a fanout problem — if a user updates their profile, you’d have to tell every node to refresh its local cache.
To make this work reliably, you'd need sharding, coordination, and sometimes even replication logic — adding operational complexity.
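A minimal sketch of an in-process cache, assuming a plain dict with a TTL (the key scheme and the 60-second TTL are illustrative):

```python
import time

# In-process cache: a plain dict with a TTL, local to this process.
# Fast (no network hop), but every instance keeps its own copy, so an update
# made on one node stays invisible to the others until their entries expire.
_local_cache = {}   # key -> (expires_at, value)
TTL_SECONDS = 60

def get_user(db, user_id):
    key = f"user:{user_id}"
    entry = _local_cache.get(key)
    if entry and entry[0] > time.time():        # hit, and not expired yet
        return entry[1]
    value = db.query("SELECT * FROM users WHERE id = %s", (user_id,))
    _local_cache[key] = (time.time() + TTL_SECONDS, value)
    return value
```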
2. Global (Centralized) Cache
This is where tools like Redis or Memcached shine. Instead of each node caching data independently, all instances talk to a shared in-memory store.
Now, if a value is updated, it’s immediately visible to all instances — solving the consistency problem.
The downside? Every cache access is a network call. Still fast, but not as instant as a local memory lookup. Plus, if Redis goes down and you haven’t set up clustering, your entire cache layer goes with it.
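Here is the same lookup sketched against a shared Redis instance using the redis-py client (the host, key scheme, and TTL are assumptions for illustration):

```python
import json
import redis  # pip install redis

# Shared cache: every app instance talks to the same Redis store,
# so a value written by one node is immediately visible to the rest.
r = redis.Redis(host="localhost", port=6379, db=0)

def get_user(db, user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)               # hit: one network call, no DB work
    user = db.query("SELECT * FROM users WHERE id = %s", (user_id,))
    r.set(key, json.dumps(user), ex=300)        # keep it for 5 minutes
    return user
```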
3. Distributed Cache with Sharding + Replication
This is what big distributed systems use — partitioning the cache across nodes (sharding), and replicating data across multiple machines for fault tolerance.
Here’s how reads and writes work in this setup:
- Each key is hashed to a shard, so the data (and the load) is split across nodes.
- Each shard is replicated to one or more other nodes, so losing a machine doesn’t lose the cached data.
- Reads and writes for a key are routed to that key’s shard and its replicas.
To maintain consistency, you typically use quorum logic:
If total nodes = 3, and you write to 2 (W=2), then you must read from at least 2 (R=2) to be safe, because R + W > N.
This is what systems like Cassandra and DynamoDB follow under the hood.
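A rough sketch of how a key gets placed on shards and replicas (the node names and replication factor are made up; real systems typically use consistent hashing so that adding or removing a node doesn’t remap every key):

```python
import hashlib

NODES = ["cache-0", "cache-1", "cache-2"]   # illustrative node names
REPLICATION_FACTOR = 2                      # each key lives on 2 of the 3 nodes

def replicas_for(key):
    # Hash the key to pick a primary shard, then replicate to the next
    # node(s) in the ring.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    primary = h % len(NODES)
    return [NODES[(primary + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

print(replicas_for("user:42"))   # e.g. ['cache-1', 'cache-2']
```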
Handling Writes: Where Things Start Getting Real
Reading from cache is easy. But when data changes — due to user updates or background jobs — your cache must reflect that. This is where most caching issues show up.
Write-Through Cache
In this approach, every write goes to both the cache and the database, synchronously.
This ensures data is always consistent between cache and DB. But since every write waits for the DB, it can get slow under high load.
It’s best used where correctness matters — user profiles, account data, payment flows.
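A minimal write-through sketch, assuming the same hypothetical `db` client and a local Redis instance:

```python
import json
import redis

r = redis.Redis()

def update_user(db, user_id, profile):
    # Write-through: the write is only acknowledged once BOTH the database
    # and the cache hold the new value, so reads never see them disagree.
    db.execute("UPDATE users SET profile = %s WHERE id = %s",
               (json.dumps(profile), user_id))     # 1. durable write
    r.set(f"user:{user_id}", json.dumps(profile))  # 2. cache updated in the same request
```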
Write-Back (Write-Behind) Cache
Here, the write is stored in cache and acknowledged immediately. The database is updated later, often asynchronously.
This gives lightning-fast writes — great for analytics or real-time counters — but if the cache dies before flushing, data is lost.
To make this safer, teams use write-ahead logs, persistent queues, or Redis with disk persistence (AOF).
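A simplified write-back sketch using an in-memory queue and a background flusher thread (in production that buffer would need to be persistent, otherwise a crash still loses the unflushed writes):

```python
import json
import queue
import threading
import redis

r = redis.Redis()
flush_queue = queue.Queue()   # in-memory buffer of pending DB writes

def record_stats(user_id, stats):
    # Write-back: acknowledge as soon as the cache and the queue have the value;
    # the database only catches up later, in the flusher below.
    r.set(f"stats:{user_id}", json.dumps(stats))
    flush_queue.put((user_id, stats))

def flusher(db):
    while True:
        user_id, stats = flush_queue.get()   # blocks until there is work
        db.execute("UPDATE stats SET data = %s WHERE user_id = %s",
                   (json.dumps(stats), user_id))

# threading.Thread(target=flusher, args=(db,), daemon=True).start()
```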
Write-Around Cache
This one skips the cache entirely for writes. You write directly to the database, and cache only comes into play later during reads.
This keeps the cache lean, since it isn’t filled with data that may never be read. But every new value leads to a cache miss the first time it’s requested.
It’s useful when writes are high and data is rarely reused immediately.
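A write-around sketch: the write path touches only the database, and the cache is populated lazily on the first read (the key scheme and TTL are illustrative):

```python
import json
import redis

r = redis.Redis()

def save_event(db, event_id, payload):
    # Write-around: the write skips the cache entirely.
    db.execute("INSERT INTO events (id, payload) VALUES (%s, %s)",
               (event_id, json.dumps(payload)))

def get_event(db, event_id):
    key = f"event:{event_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # The first read after a write is always a miss; only then is the cache filled.
    payload = db.query("SELECT payload FROM events WHERE id = %s", (event_id,))
    r.set(key, json.dumps(payload), ex=600)
    return payload
```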
Cache-Aside (Lazy Loading)
Here, the application handles cache reads and writes explicitly.
On reads: the application checks the cache first; on a miss, it loads the value from the database, writes it into the cache, and returns it.
On writes: the application updates the database, then invalidates (or overwrites) the cached entry.
It gives full control but requires discipline: if you forget to invalidate the cache after a write, users may see stale data.
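A cache-aside sketch showing both paths; the whole pattern hinges on that last `delete` call being part of every write (key scheme and TTL are illustrative):

```python
import json
import redis

r = redis.Redis()
TTL = 300   # illustrative

def get_product(db, product_id):
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:                      # read path: try the cache first
        return json.loads(cached)
    product = db.query("SELECT * FROM products WHERE id = %s", (product_id,))
    r.set(key, json.dumps(product), ex=TTL)     # lazy-load on a miss
    return product

def update_product(db, product_id, fields):
    db.execute("UPDATE products SET data = %s WHERE id = %s",
               (json.dumps(fields), product_id))
    r.delete(f"product:{product_id}")           # skip this and readers see stale data
```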
Ensuring Consistency with Quorum Reads
In a distributed cache, data is spread across nodes. To ensure you're not reading stale values, you need quorum logic.
Rule of thumb: R + W > N, where
R = number of nodes to read from
W = number of nodes to write to
N = total nodes in the system
If only one of the nodes got the latest value and you read from a different one, you’d get old data. Quorum ensures you hit at least one “up-to-date” node.
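A toy illustration of a quorum read: query R replicas and keep the copy with the highest version (the replies and version numbers are made up):

```python
def quorum_read(replies, r):
    # replies: one dict per replica that answered, e.g. {"version": 7, "value": ...}
    if len(replies) < r:
        raise RuntimeError("not enough replicas answered to satisfy the read quorum")
    freshest = max(replies[:r], key=lambda reply: reply["version"])
    return freshest["value"]

# With N=3, W=2, R=2, at least one of the two replicas we read overlaps with the
# two that accepted the last write, so the newest version is always among them.
print(quorum_read([{"version": 7, "value": "new"},
                   {"version": 6, "value": "old"}], r=2))   # -> "new"
```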
Cache Invalidation: The Real Headache
The hardest problem in caching isn’t writing — it’s knowing when a cached value should no longer be served.
What happens when the DB has new data but the cache still serves old values?
Here’s how teams usually deal with invalidation:
- TTLs (expiry times) on every key, so a stale value can only live so long.
- Explicit invalidation: delete or overwrite the cached entry as part of every write path.
- Event-driven invalidation: publish a message (for example over Redis pub/sub) so every node drops its copy.
- Versioned keys: write new data under a new key, so old entries simply stop being read.
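Here is a sketch of the pub/sub approach with redis-py, which also addresses the fanout problem from the local-cache section (the channel name and local dict are illustrative):

```python
import redis

r = redis.Redis()
_local_cache = {}   # each instance's in-process copy

def invalidate(key):
    # Publisher side: after a successful DB write, broadcast the key so that
    # every instance drops its local copy.
    r.publish("cache-invalidation", key)

def invalidation_listener():
    # Subscriber side: each instance runs this loop (typically in a background
    # thread) and evicts whatever keys other nodes announce.
    pubsub = r.pubsub()
    pubsub.subscribe("cache-invalidation")
    for message in pubsub.listen():
        if message["type"] == "message":
            _local_cache.pop(message["data"].decode(), None)
```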
Eviction Strategies: When Memory Runs Out
Your cache can't hold everything forever. So when space is tight, something gets evicted.
Common strategies:
- LRU (Least Recently Used): evict the entry that hasn’t been touched for the longest time.
- LFU (Least Frequently Used): evict the entry accessed least often.
- FIFO: evict the oldest entry, regardless of how often it’s used.
- TTL-based expiry: entries simply age out after a fixed time.
A good eviction strategy depends on your access patterns — not just the size of your cache.
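To make LRU concrete, here is a tiny sketch built on OrderedDict (the capacity is illustrative; production caches like Redis implement approximated versions of this):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # drop the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")        # "a" becomes the most recently used
cache.put("c", 3)     # evicts "b", not "a"
```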
Before You Cache Anything...
Ask these:
- Is this data read often enough (and changed rarely enough) to be worth caching?
- Can you tolerate serving a slightly stale value, and for how long?
- How will the cache be invalidated when the underlying data changes?
- What happens on a miss, or when the cache itself goes down?
Final Thoughts
Caching is not just a performance trick — it’s a system design decision.
Done right, it cuts infra costs, improves response time, and keeps your DB breathing. Done wrong, it introduces bugs you won’t notice until users start reporting stale prices, missing updates, or worse.
So don’t just "add Redis" and hope for the best. Design for failure, plan for eviction, and always respect the complexity of your cache.
#SystemDesign #Caching #Redis #BackendEngineering #Scalability #PostgreSQL #SoftwareArchitecture #Microservices