Consistent Hashing Explained: The Algorithm That Doesn't Freak Out When You Scale
Have you ever stopped to wonder how Redis or Memcached — two of the most popular distributed caching systems — actually decide where your data goes?
Like… if you store a key, how does the system know exactly which server to stash it on? And when you need that value back, how does it fetch it from the same machine without broadcasting to all of them?
And here’s the real kicker: what happens when you add or remove a server? Does the whole thing just break? (Spoiler: yes, unless you’re smart about it.)
These were the questions that hijacked my brain one afternoon. I thought, “Eh, how hard can it be?”
Turns out… pretty hard. I had all sorts of ideas that felt right, but were hilariously wrong. But hey — I was curious, and that’s all it took to fall head-first into a rabbit hole of hash functions, server rings, and something magical called consistent hashing.
In this article, I’ll walk you through that journey — from the naive approaches I once believed in, to the elegant, battle-tested solutions that real systems use.
If you’ve ever thought, “Isn’t key % N enough?” — this one’s for you. 😄
🛠️ Let’s Build a Cache System… and Break It Immediately
So, as you already know, one fine day I thought 💭 — “How hard can it be to build a distributed cache?”
I mean, it’s just key-value storage, right? I’d built TODO apps that do full-blown CRUD on an entity. This should be… easier?
Spoiler: it wasn’t. 😢
Building a distributed cache is less like coding a TODO app and more like raising a toddler with trust issues during a network outage. (And yes, I can already hear you whispering, “This guy’s being dramatic.” Just hang tight.)
Let me show you the first idea I tried. And by “tried,” I mean “hoped would work until it broke in my face.”
✅ Step 1: Use a Single Server to Store Everything
💡 Pros:
Super simple to implement
Easy to deploy
No distributed anything = no distributed chaos
Seems like a clean MVP, right?
But here’s where the devil steps in 👺
❌ Cons:
It’s a single point of failure (one crash and poof — goodbye cache)
Under load, it becomes painfully slow
And of course… you can’t scale
To be fair, I wasn’t expecting this to scale to infinity. I just wanted to start small and “grow it later” (classic trap). So I thought — let’s go bigger. Let’s move to a cluster of 3 servers and figure out how to distribute the keys across them.
Sounds cool, right?
Now came the million-dollar question: “How do I decide which server stores what key?”
And here’s where I gift you a front-row seat to my glorious facepalm moment.
😩 Attempt 1: Random Assignment (a.k.a. Inky Pinky Ponky Load Balancing)
Yes, I genuinely thought this would work: “Just randomly assign the key to any server!”
💡 Pros:
Fast to save keys
Stupidly easy to implement
Low chance of one server being overloaded
I felt smart. For a moment.
Then came the reality check when I tried to retrieve or update a key.
❌ Cons:
You now have to ask every server, “Hey, do you have this key?”
Retrieval becomes slower than your boss replying to a leave request
Read operations? Load still hits all servers
At this point, I know you’re judging me — and honestly, fair. But hey, at least I was trying to improve, unlike certain managers we know who never admit they don’t understand half the tech in sprint planning 😂
By now, I was convinced there had to be a better way. I mean… surely someone has solved this problem already, right?
And then it hit me.
Wait. What if I use a hash function? Like… just take the key, hash it, and then mod the hash by N to find which server it goes to?
Sounds elegant. Almost too elegant.
And I tried this one too. 😂 And what followed was a performance horror story I’d rather not relive. But for your sake, I will.
💣 I Tried key % N and Accidentally Sabotaged My Cache
So here’s where I thought I was being clever.
“Let’s just take the key, do key % N, and boom—distributed caching solved.”
Modulo felt like the hero I deserved. It gave me a quick way to distribute keys across my 3-server cluster.
And it actually worked… until I tried scaling. 😅 Let me walk you through this tiny disaster.
🔢 Imagine We Have 10 Keys: 1 to 10
We’re starting simple, assigning each key to a server using key % 3. That means:
Picture it as a neat little table: server 0 gets keys 3, 6, 9; server 1 gets keys 1, 4, 7, 10; server 2 gets keys 2, 5, 8.
So far, so good. When a key comes in, we just do the same key % 3 operation and boom—we know exactly where to find it.
Life is beautiful.
…Until you need to scale.
🚨 Now Let’s Add a 4th Server
Let’s say traffic increases or you just got that sweet AWS credit and you decide to add a 4th server to the cluster.
Now your hashing logic becomes key % 4. Let’s look at what happens:
💥 Boom. With key % 4, server 0 now holds keys 4 and 8, server 1 holds 1, 5, 9, server 2 holds 2, 6, 10, and the new server 3 holds 3 and 7. Compare that with the layout above: 8 out of 10 keys have just changed servers.
You’re not just rebalancing a few keys… you’ve practically shuffled your entire dataset. The cache hit rate? Dead. The retrieval logic? Panicking. Your SRE? Probably rage-quitting.
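If you want to see the damage in numbers, here is a tiny Go sketch of the same toy example (keys 1 to 10, jumping from 3 servers to 4); nothing fancy, it just counts how many keys land somewhere new:

```go
package main

import "fmt"

func main() {
	moved := 0
	for key := 1; key <= 10; key++ {
		oldServer := key % 3 // where the key lived with 3 servers
		newServer := key % 4 // where it lands after adding a 4th
		if oldServer != newServer {
			moved++
		}
	}
	// prints: 8 of 10 keys changed servers
	fmt.Printf("%d of 10 keys changed servers\n", moved)
}
```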
😓 So What Did I Learn?
This was the moment I realized: Modulo is not scalable. It’s simple, sure. But the moment your cluster changes size (which it will), everything breaks loose.
So, naturally, I did what we all do in 100% of situations like this… I Googled the hell out of it. 🧠💻
And that’s when I stumbled into the world of hash functions. Like — real ones. With actual properties. Not just the cute % operator I was misusing.
Before we move ahead to the magical world of consistent hashing, I need to take you on a quick detour into something actually useful…
🧠 Hash Functions: What They Are and Why They Secretly Hate You
So after breaking my cache with key % N, I decided to get serious.
Enter: Hash Functions.
Now I’d heard of them before — mostly while copying login logic from Stack Overflow — but I never gave them the respect they deserved.
Let’s fix that.
🔍 What’s a Hash Function, Really?
At its core, a hash function takes a key and gives you back a number — the hash value. Think of it like a vending machine: you insert your key (like "user42") and out pops a number from a predefined range (say 0 to 2³² - 1).
But not all vending machines are created equal.
A good hash function should have some superpowers:
🦸‍♂️ Properties of a Good Hash Function
🔁 Deterministic: Same input → always gives the same output. (Otherwise, good luck finding your data again.)
🎯 Uniform Distribution: It should spread keys evenly across the hash space. Imagine all your keys piling onto one server — yeah, that’s what bad distribution does.
⚡ Fast and Efficient: You don’t want to spend 10ms just to hash a key. Hashing should feel instant.
💥 Collision Resistance: While we’re not doing cryptography here, it’s still important that two different keys rarely give the same hash value. Otherwise… boom — two people fighting over the same cache slot.
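To make those properties concrete, here is a minimal Go sketch using the standard library’s hash/fnv package (a fast, non-cryptographic hash; the key names are made up):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashKey maps any string key to a number in the range [0, 2^32).
func hashKey(key string) uint32 {
	h := fnv.New32a()    // FNV-1a: fast, deterministic, decent spread
	h.Write([]byte(key)) // hash the raw bytes of the key
	return h.Sum32()
}

func main() {
	// Deterministic: the same key always produces the same hash value.
	fmt.Println(hashKey("user42")) // same number on every run
	fmt.Println(hashKey("user42"))

	// Different keys should land far apart in the hash space.
	fmt.Println(hashKey("user43"))
}
```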
🤔 Where Else Have You Seen Hashing?
Ah yes, the good old password storage story.
Back in the day (aka yesterday for some legacy codebases), people used to store passwords as plain text in the database. Today, we hash them. Why?
Because even if someone breaks into your DB, they only get the hashed value, not the real password.
If your password was "ilovego123" (don't), and the attacker only finds something like 0xFADE123, they still have to reverse-engineer it, which is impractical if you use a proper password-hashing scheme like bcrypt with a salt (a single fast SHA-256 pass on its own isn't really enough for passwords).
🔒 Hashing = one-way street. Once it’s hashed, you can’t easily go back.
🔗 So What’s the Link to Distributed Caching?
In a caching system, we use hash functions to map keys (like "userID:1001") to servers.
The better your hash function, the more evenly the load is spread.
But here’s the kicker…
Even the world’s best hash function can’t help if your distribution logic is flawed — like key % N.
That’s where Consistent Hashing comes in. And my inner dev was finally ready for it.
🌀 Enter the Ring: Where Keys and Servers Live in Harmony
By this point in my rabbit hole, something clicked. I felt like I’d just unlocked a boss level in system design — the kind that makes you pause your code editor and say: “Wait… this actually makes sense!”
Don’t trust me yet. Read this and I bet you’ll be whispering “damn, that’s clever” by the time we’re done. (And just wait till you hit the next two sections — they’ll blow your mind 🤯).
🎯 So, What Is This Magical Thing?
Let’s imagine a virtual ring that has a huge number of slots — say 2³². Why 2³²? Because we want to map both our servers and keys into this giant circular hash space. Basically, think of it as a pizza. But instead of toppings, we’re placing servers and keys.
Now pick your favorite hash function — MD5 or SHA1 works fine.
Here’s what we do:
Hash the server identifier (IP address, domain name, whatever)
Hash the keys you want to store
Mod the result with the ring size (we’ll use 1024 in the demo to keep the numbers readable; real systems typically use a much bigger space like 2³²)
That gives you a position on the ring
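In code, that placement step is just hash-then-mod. A tiny sketch, assuming SHA-1 and the 1024-slot demo ring from the steps above (the identifiers are made up):

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

const ringSize = 1024 // small demo ring; real systems often use the full 2^32 space

// slotFor hashes any identifier (a server name or a cache key) onto the ring.
func slotFor(id string) uint32 {
	sum := sha1.Sum([]byte(id))                        // 20-byte SHA-1 digest
	return binary.BigEndian.Uint32(sum[:4]) % ringSize // first 4 bytes -> ring position
}

func main() {
	fmt.Println("node1   ->", slotFor("node1"))
	fmt.Println("user:42 ->", slotFor("user:42"))
}
```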
🔁 And Then the Magic Starts
Here’s how consistent hashing distributes data:
Each server is placed at a point on the ring based on its hash
Each key is also placed on the ring
To find where a key goes, we move clockwise around the ring until we find the first server
That server is responsible for the key
Example time: Let’s bring back our old setup: 3 nodes, keys 1–10. Say our servers are named node1, node2, and node3.
You:
Hash node1, node2, node3 → place them on the ring
Hash keys 1 through 10 → place them too
Now you have a visual where keys fall into “zones” owned by each server
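Here is a minimal, self-contained Go sketch of that ring, purely for illustration (the Ring type, the 1024-slot size, and the node and key names are all mine, not any real library’s API). A production ring would also add virtual nodes, which we’ll get to shortly:

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
)

const ringSize = 1024 // demo ring; real systems often use the full 2^32 space

// Ring keeps node positions sorted so a lookup is just a clockwise binary search.
type Ring struct {
	positions []uint32          // sorted positions of the nodes on the ring
	owners    map[uint32]string // position -> node name
}

func slotFor(id string) uint32 {
	sum := sha1.Sum([]byte(id))
	return binary.BigEndian.Uint32(sum[:4]) % ringSize
}

// AddNode hashes the node name onto the ring (ignoring position collisions for brevity).
func (r *Ring) AddNode(name string) {
	pos := slotFor(name)
	r.positions = append(r.positions, pos)
	sort.Slice(r.positions, func(i, j int) bool { return r.positions[i] < r.positions[j] })
	r.owners[pos] = name
}

// Get walks clockwise from the key's position to the first node it meets.
func (r *Ring) Get(key string) string {
	pos := slotFor(key)
	i := sort.Search(len(r.positions), func(i int) bool { return r.positions[i] >= pos })
	if i == len(r.positions) {
		i = 0 // walked past the last node: wrap around to the first one
	}
	return r.owners[r.positions[i]]
}

func main() {
	ring := &Ring{owners: map[uint32]string{}}
	for _, n := range []string{"node1", "node2", "node3"} {
		ring.AddNode(n)
	}
	for k := 1; k <= 10; k++ {
		key := fmt.Sprintf("key:%d", k)
		fmt.Printf("%s -> %s\n", key, ring.Get(key))
	}
}
```

Notice that a lookup never asks every server anything: it’s one hash plus a binary search.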
➕ Adding a New Server (Without Melting Everything)
Now, let’s say we introduce a 4th server — node4.
It gets placed between, say, node1 and node2.
So what happens?
Only the keys that would’ve previously landed on node2, but now fall before node4, get reassigned to node4. That’s it.
No dramatic reshuffling. No mass eviction of cache entries. Just a clean little rotation.
➖ What If a Server Dies?
Same deal — super graceful.
If node2 disappears, all the keys it owned now go to the next server clockwise, say node3.
Only those keys move. Your other servers stay completely untouched. It’s like… self-healing cache feng shui.
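To convince yourself that only a slice of the keyspace changes hands, here is a small self-contained sketch that recomputes the owner of ten keys before and after a membership change (the owner helper and all the names are illustrative, and it uses a linear scan instead of a binary search to keep it short):

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

const ringSize = 1024

func slotFor(id string) uint32 {
	sum := sha1.Sum([]byte(id))
	return binary.BigEndian.Uint32(sum[:4]) % ringSize
}

// owner returns the first node clockwise from the key's slot.
func owner(nodes []string, key string) string {
	k := slotFor(key)
	best, wrap := "", ""
	bestPos, wrapPos := uint32(ringSize), uint32(ringSize)
	for _, n := range nodes {
		p := slotFor(n)
		if p >= k && p < bestPos { // nearest node in the clockwise direction
			best, bestPos = n, p
		}
		if p < wrapPos { // lowest position overall, used when we wrap around
			wrap, wrapPos = n, p
		}
	}
	if best != "" {
		return best
	}
	return wrap
}

// countMoves checks how many of our ten demo keys end up on a different node.
func countMoves(before, after []string) int {
	moved := 0
	for i := 1; i <= 10; i++ {
		key := fmt.Sprintf("key:%d", i)
		if owner(before, key) != owner(after, key) {
			moved++
		}
	}
	return moved
}

func main() {
	base := []string{"node1", "node2", "node3"}
	withNode4 := []string{"node1", "node2", "node3", "node4"} // scaling out
	withoutNode2 := []string{"node1", "node3"}                // node2 just died

	fmt.Println("keys moved after adding node4:", countMoves(base, withNode4))
	fmt.Println("keys moved after losing node2:", countMoves(base, withoutNode2))
}
```

With plain modulo, both counts would be close to 10. Here, only the keys sitting in the affected arc of the ring move.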
🤯 Magical, Right?
Seriously, when I saw this in action, I felt like I’d discovered dev sorcery. Minimal rebalancing, clean lookups, and servers that don’t freak out every time the cluster changes?
Chef’s kiss. 👨‍🍳💻
But wait — I know what you’re thinking.
“This looks neat, but what if the servers aren’t evenly spaced around the ring? Couldn’t one server end up responsible for a giant chunk of keys and become a ‘hot node’?”
Exactly the same doubt I had.
And yes, this imbalance can lead to overload or even failure if one node ends up doing way more work than the others.
But guess what? The designers of this algorithm already saw that coming. And they gave us an elegant fix…
🎭 Virtual Nodes: Or How I Cloned My Servers Without Getting Yelled At
Turns out, the secret to load balancing isn’t more logic — it’s more identities.
Yup, servers having an identity crisis. Multiple ones, in fact.
Here’s the idea: instead of placing each physical server at just one location on the hash ring, we give it multiple virtual positions. These are called virtual nodes (or vnodes, if you want to sound cool in interviews).
🤔 Why Do We Need Virtual Nodes?
Let’s say we have 3 servers, and we assign 3 virtual nodes per server. That gives us 9 points on the ring instead of just 3 — meaning:
Less chance of any one server getting overloaded
Better spread of keys across the hash space
Fewer “hot spots” caused by uneven key distribution
Smoother scaling and failure recovery
So now, our algorithm thinks it has 9 servers. But behind the scenes, it’s just the same 3 physical nodes wearing different costumes.
It’s like the servers are saying,
“Hey, I’m node1… but I’m also node1#2 and node1#3 — nice to meet you 👋”
❓ Obvious Question: How Do We Assign These Virtual Positions?
Great question.
This is where the algorithm gets slick. And yes — this part blew my mind too.
You don’t need 3 different hash functions (although you can). You just take your server identifier — like "node1" — and make it unique per vnode:
"node1#1"
"node1#2"
"node1#3"
Then, pass each of these through your hash function (say SHA1), take mod ringSize, and boom — you now have 3 positions on the ring for the same server.
For keys, you still use one hash function — just like before.
Elegant, right? The algorithm stays simple, but gains so much flexibility.
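Here is what that looks like as a sketch, reusing the #1/#2/#3 suffix idea from the list above (SHA-1 and the 1024-slot ring are carried over from the earlier demo):

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

const ringSize = 1024

func slotFor(id string) uint32 {
	sum := sha1.Sum([]byte(id))
	return binary.BigEndian.Uint32(sum[:4]) % ringSize
}

func main() {
	servers := []string{"node1", "node2", "node3"}
	vnodesPerServer := 3 // real deployments often use dozens or hundreds per server

	for _, s := range servers {
		for i := 1; i <= vnodesPerServer; i++ {
			vnodeID := fmt.Sprintf("%s#%d", s, i) // "node1#1", "node1#2", ...
			fmt.Printf("%-8s -> slot %d\n", vnodeID, slotFor(vnodeID))
		}
	}
	// Keys are still hashed exactly as before: slotFor("userID:1001"), etc.
}
```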
🧠 And What Happens If a Server Goes Down?
Ah, you’re thinking ahead — I like that. You’re becoming dangerously close to a system designer now. 😎
So let’s say node2 dies.
Instead of chaos, only the vnodes of node2 disappear from the ring. The keys that were assigned to them will now go to the next vnode in the clockwise direction — just like before.
No full redistribution. No rehashing every key. Just a graceful shift of responsibility.
That’s the real power of consistent hashing with virtual nodes. It’s resilient, scalable, and clean.
And if you’re already thinking about fault tolerance, replication, or backups — 👊 Respect. You’re thinking like a real backend engineer now.
💥 My Server Crashed. But It’s Fine, I Have Backups. Right?
Here’s the thing… consistent hashing isn’t just about scaling.
It’s also about not freaking out when a server suddenly ghosts you like that one friend who says “on my way” and never shows up.
So what do you do when a server goes offline and takes your precious data with it?
You already know the answer. The age-old solution: replication.
🔁 Let’s Talk Replication
Just like in most distributed systems, we don’t store data on just one server. We store it on the node that owns the key, plus replicas on the next N-1 distinct servers in the clockwise direction on the ring.
So if your replication factor is 3, every key lives on 3 different nodes.
This means if one node dies, the next node has your back. And if that one dies too? You still have one more backup to keep things afloat.
It’s like data insurance. Because in distributed systems, failure isn’t an “if” — it’s a “when.”
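Here is a sketch of that replica walk, assuming the ring is stored as a sorted list of (position, physical server) entries; the hand-placed positions are made up so the output is easy to trace:

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one point on the ring: a vnode position plus the physical server behind it.
type entry struct {
	pos    uint32
	server string
}

// replicasFor walks clockwise from the key's position and collects the first
// rf distinct physical servers. The ring slice must be sorted by pos.
func replicasFor(ring []entry, keyPos uint32, rf int) []string {
	if len(ring) == 0 {
		return nil
	}
	start := sort.Search(len(ring), func(i int) bool { return ring[i].pos >= keyPos })
	seen := map[string]bool{}
	var replicas []string
	for i := 0; len(replicas) < rf && i < 2*len(ring); i++ {
		e := ring[(start+i)%len(ring)] // wrap around when we pass the end
		if !seen[e.server] {
			seen[e.server] = true
			replicas = append(replicas, e.server)
		}
	}
	return replicas
}

func main() {
	ring := []entry{
		{100, "node1"}, {300, "node2"}, {550, "node3"},
		{700, "node1"}, {900, "node2"},
	}
	// A key hashing to slot 560 is owned by the vnode at 700 (node1),
	// and its replicas are the next distinct servers clockwise.
	fmt.Println(replicasFor(ring, 560, 3)) // [node1 node2 node3]
}
```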
😎 Real Talk: Resilience > Perfection
The beauty of this model is that failure becomes boring.
A node crashes? No drama.
Keys automatically route to the next available replicas. Clients don’t even notice the outage.
That’s the kind of engineering confidence we all crave.
🤔 Quick Question For You
Let’s say we add a new node to the ring…
How many nodes should get impacted? Take a second and think about it. (No pressure. Unless you’re in an interview, then… yeah, pressure.)
🧊 Now I Can Add Servers Without Crying
Remember how adding a server used to feel like defusing a bomb?
You’d plug in a new node, and suddenly half your keys would decide they belong elsewhere. You’d lose cache hits. Users would stare at loading spinners. Your infra team would stare at you.
But not anymore.
Thanks to consistent hashing (and a few virtual clones), adding a new server is now… boring. Like, click-a-button-and-go-make-coffee boring.
Here’s what happens:
The new server gets a few virtual nodes assigned on the ring
Only the keys that fall between the previous vnode and the new vnode are reassigned
Everything else stays exactly where it is
Your cache hit rate stays high
You keep your job ✅
This is the kind of quiet engineering that wins battles before they even start.
So yeah — I can finally scale without needing a therapy session afterward.
🧠 Let’s Recap Before You Fall Into a Hash Hole Like I Did
Okay, let’s zoom out before your brain turns into ring-shaped spaghetti.
Here’s what we just covered:
1. The Problem:
Distributing keys using key % N is great... until N changes
Adding/removing servers breaks everything, and reassigns most of your keys
2. The Fix:
Consistent Hashing places servers & keys on a ring
We always go clockwise to find the right server for a key
When nodes are added/removed, only a few keys need to move
And replication saves us from unexpected server crashes
3. The Upgrade:
Virtual nodes help evenly distribute keys
Each server pretends to be multiple mini-servers on the ring
Result: fewer hot nodes, better balance, more peaceful scaling
Basically, consistent hashing is like that one senior dev who never panics — even when prod is on fire. And now, you know how they do it.
💻 Want to See the Code? Part 2 Is Coming 🔥
Now that we’ve wrapped our heads around the theory, it’s time to get our hands dirty.
In Part 2, I’ll show you exactly how to implement consistent hashing in Go (and yes, you can adapt the logic to any language).
We’ll cover:
Creating the hash ring
Adding & removing nodes
Using virtual nodes
Replicating keys
Writing a client that finds where a key belongs — instantly
All in simple, clean, Go code — with comments, visuals, and test cases that’ll make you feel like a hash wizard.
So go take a break. Maybe even share this article with your dev friends who still cry during deploys. And get ready — because Part 2 is gonna be 🔥.
Stay Connected!
💡 Follow me on LinkedIn: Archit Agarwal
🎥 Subscribe to my YouTube: The Exception Handler
📬 Sign up for my newsletter: The Weekly Golang Journal
✍️ Follow me on Medium: @architagr
👨‍💻 Join my subreddit: r/GolangJournal
💡 Follow me on X.com: @architagr