Implement Your Own Distributed Cache in Go: From a Basic Hash Ring to Hash Ring Redundancy Pro Max
⚙️ Ever wondered how distributed caches like Redis or Memcached stay rock solid — even when nodes scale up or vanish mid-flight?
The secret behind that stability? Consistent Hashing. And no, it’s not just a cooler version of hash(key) % N.
Let’s be honest: working at companies that build the systems we trust blindly in production, like Netflix, AWS, or Meta, feels like a dream, right? But to even sit at that table, you have to understand the problems they’ve already solved.
In my previous article, I walked you through how I finally understood the magic behind consistent hashing — breaking it down with simple analogies and real-world use cases.
Today, we’re taking that understanding and bringing it to life.
In this article, I’m not just going to explain consistent hashing. I’ll build it — from scratch. Starting from the simplest ring, upgrading it with virtual nodes, and finally making it fault-tolerant with data replication.
If you’re serious about backend engineering, buckle up. Let’s build something battle-tested.
🧱 Level 1: Building the Basic Hash Ring
🛠️ Rome wasn’t built in a day — and neither was Redis.
Consistent Hashing is a technique where both keys and nodes are hashed and plotted on a circular number line, often called the Hash Ring. Imagine it like this: Nodes are little shelters, and keys are travelers. Each key moves clockwise around the ring until it finds the next shelter.
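A tiny example to make that concrete: picture a ring numbered 0 to 99 with nodes sitting at positions 10, 50, and 90. A key that hashes to 63 walks clockwise and lands on the node at 90; a key that hashes to 95 walks past the end, wraps around, and lands on the node at 10. Add a new node at position 70, and only the keys hashing between 50 and 70 move (from the node at 90 to the newcomer); every other key stays exactly where it was. That last property is the whole point.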
Now, I know that dev-brain of yours is screaming: “Alright already, let me implement this!”
But not so fast. Let’s solve one problem at a time. Don’t think about keys yet — just focus on getting our nodes onto this circular ring.
🧠 Need some hints?
Got an idea forming? Great. Now compare it with the implementation below. And hey — if you think there’s a better way or spot something I missed, drop it in the comments. Let’s learn together.
⚠️ One thing to keep in mind: our hash function spits out uint64 values — massive numbers. So no, we’re not going to allocate a billion-element array to represent our ring.
Instead, I’ll keep a sorted slice of hash values (representing nodes) and a map that links those to actual node instances. Simple, efficient, and scalable.
💻 Code Time: Let’s Build the Simplest Hash Ring
Here’s the first implementation — a basic, no-frills hash ring. Just nodes, no virtual replicas, no redundancy… yet.
We’ve added log support via a config, but it’s turned off by default.
var (
ErrNoConnectedNodes = errors.New("no connected nodes available")
ErrNodeExists = errors.New("node already exists")
ErrNodeNotFound = errors.New("node not found")
ErrInHashingKey = errors.New("error in hashing the key")
)
type ICacheNode interface {
GetIdentifier() string
}
type hashRingConfig struct {
HashFunction func() hash.Hash64
EnableLogs bool
}
type HashRingConfigFn func(*hashRingConfig)
func SetHashFunction(f func() hash.Hash64) HashRingConfigFn {
return func(config *hashRingConfig) {
config.HashFunction = f
}
}
func EnableVerboseLogs(enabled bool) HashRingConfigFn {
return func(config *hashRingConfig) {
config.EnableLogs = enabled
}
}
type HashRing struct {
mu sync.RWMutex
config hashRingConfig
nodes sync.Map
sortedKeys []uint64
}
func InitHashRing(opts ...HashRingConfigFn) *HashRing {
config := &hashRingConfig{
HashFunction: fnv.New64a,
EnableLogs: false,
}
for _, opt := range opts {
opt(config)
}
return &HashRing{
config: *config,
sortedKeys: make([]uint64, 0),
}
}
func (ring *HashRing) AddNode(node ICacheNode) error {
ring.mu.Lock()
defer ring.mu.Unlock()
hashValue, err := ring.generateHash(node.GetIdentifier())
if err != nil {
return fmt.Errorf("%w: %s", ErrInHashingKey, node.GetIdentifier())
}
if _, exists := ring.nodes.Load(hashValue); exists {
return fmt.Errorf("%w: node %s", ErrNodeExists, node.GetIdentifier())
}
ring.nodes.Store(hashValue, node)
ring.sortedKeys = append(ring.sortedKeys, hashValue)
// Maintain sorted order of hash keys for binary search
sort.Slice(ring.sortedKeys, func(i, j int) bool { return ring.sortedKeys[i] < ring.sortedKeys[j] })
if ring.config.EnableLogs {
log.Printf("[HashRing] ➕ Added node: %s (hash: %d)", node.GetIdentifier(), hashValue)
}
return nil
}
func (ring *HashRing) RemoveNode(node ICacheNode) error {
ring.mu.Lock()
defer ring.mu.Unlock()
hashValue, err := ring.generateHash(node.GetIdentifier())
if err != nil {
return fmt.Errorf("%w: %s", ErrInHashingKey, node.GetIdentifier())
}
if _, found := ring.nodes.LoadAndDelete(hashValue); !found {
return fmt.Errorf("%w: %s", ErrNodeNotFound, node.GetIdentifier())
}
index, err := ring.search(hashValue)
if err != nil {
return err
}
ring.sortedKeys = append(ring.sortedKeys[:index], ring.sortedKeys[index+1:]...)
if ring.config.EnableLogs {
log.Printf("[HashRing] ❌ Removed node: %s (hash: %d)", node.GetIdentifier(), hashValue)
}
return nil
}
func (ring *HashRing) Get(key string) (ICacheNode, error) {
ring.mu.RLock()
defer ring.mu.RUnlock()
hashValue, err := ring.generateHash(key)
if err != nil {
return nil, fmt.Errorf("%w: %s", ErrInHashingKey, key)
}
index, err := ring.search(hashValue)
if err != nil {
return nil, err
}
nodeHash := ring.sortedKeys[index]
if node, ok := ring.nodes.Load(nodeHash); ok {
if ring.config.EnableLogs {
log.Printf("[HashRing] 🔍 Key '%s' (hash: %d) mapped to node (hash: %d)", key, hashValue, nodeHash)
}
return node.(ICacheNode), nil
}
return nil, fmt.Errorf("%w: no node found for key %s", ErrNodeNotFound, key)
}
func (ring *HashRing) search(key uint64) (int, error) {
if len(ring.sortedKeys) == 0 {
return -1, ErrNoConnectedNodes
}
index := sort.Search(len(ring.sortedKeys), func(i int) bool {
return ring.sortedKeys[i] >= key
})
// Wrap around the ring
if index == len(ring.sortedKeys) {
index = 0
}
return index, nil
}
// generateHash converts a string key to a uint64 hash using the configured hash function
func (ring *HashRing) generateHash(key string) (uint64, error) {
h := ring.config.HashFunction()
if _, err := h.Write([]byte(key)); err != nil {
return 0, err
}
return h.Sum64(), nil
}
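To sanity-check the ring before moving on, here’s a quick smoke test. The cacheNode type is a throwaway stand-in I’m using for illustration (any type that satisfies ICacheNode works), and the snippet assumes it lives in the same package as the ring code above.
// cacheNode is a hypothetical stand-in for a real cache node, used only for this demo.
type cacheNode struct{ id string }

func (c cacheNode) GetIdentifier() string { return c.id }

func main() {
	ring := InitHashRing(EnableVerboseLogs(true))

	// Register three nodes on the ring
	for _, id := range []string{"cache-a:6379", "cache-b:6379", "cache-c:6379"} {
		if err := ring.AddNode(cacheNode{id: id}); err != nil {
			log.Fatal(err)
		}
	}

	// Every key deterministically maps to exactly one of the three nodes
	for _, key := range []string{"user-42", "order-1001", "archit@something.com"} {
		node, err := ring.Get(key)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s → %s\n", key, node.GetIdentifier())
	}
}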
🔁 Level 2: Even Hashing Is Not Even Enough
So, here’s the thing — hash functions are smart, but not that smart.
The person who wrote fnv.New64a() didn’t know you’d be relying on it to balance traffic across your distributed cache nodes. And even if it accidentally distributes things evenly the first time, what happens after you add 20 nodes and remove 7? Things will skew.
And keys? Oh, don’t even talk about them. You can’t predict how many will start with “user-” or “order-” or “archit@something.com”.
Which means:
🔴 Some nodes will get overloaded.
🔴 Some nodes will sit idle.
🔴 And when one overloaded node goes down… 💥 Pain.
🤔 So How Do We Fix It?
Here’s a fun little riddle:
What if a single node could wear 10 different hats?
That’s right — instead of placing each node on the ring once, why not place it 10 times with different hash values? This technique is called Virtual Nodes — or vnodes — and it’s our next superpower.
Now you’re thinking:
“Okay cool, but why does that help?”
Let’s do some math:
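A rough back-of-the-envelope sketch (my own numbers, not from any benchmark): with 3 physical nodes and a single point each, nothing stops one unlucky node from owning 60–70% of the circle. Now give each node 100 virtual points, 300 in total. Each node’s share of the ring becomes the sum of 100 small arcs instead of one big one, so it concentrates tightly around 1/3; the spread shrinks roughly like 1/√100, i.e. about a tenth of the single-point case. More vnodes means a smoother distribution, at the cost of a larger sorted slice and slightly slower adds and removals.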
🛠 How Do You Actually Add Virtual Nodes?
It’s simple. Take the original node ID, and for each replica, add a suffix like 0, 1, 2, ... 9. Hash all those. Add them to the ring like individual nodes. When removing, just remove all those keys.
Let’s look at the code 👇
🧱 Add Configuration for Virtual Nodes
type hashRingConfig struct {
VirtualNodes int
HashFunction func() hash.Hash64
EnableLogs bool
}
func SetVirtualNodes(count int) HashRingConfigFn {
return func(cfg *hashRingConfig) {
cfg.VirtualNodes = count
}
}
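One thing the snippets below gloss over: AddNode and RemoveNode now talk to a hostMap (which physical nodes have joined) and a vNodeMap (virtual hash → node) instead of the single nodes map from Level 1. The exact struct lives in the repo; here’s a minimal sketch of what I’m assuming:
type HashRing struct {
	mu         sync.RWMutex
	config     hashRingConfig
	hostMap    sync.Map // node identifier → time the node joined
	vNodeMap   sync.Map // virtual node hash → ICacheNode
	sortedKeys []uint64 // sorted virtual node hashes, for binary search
}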
🧩 Add and Remove Nodes With Virtual Reps
func (ring *HashRing) AddNode(node ICacheNode) error {
ring.mu.Lock()
defer ring.mu.Unlock()
nodeID := node.GetIdentifier()
if _, exists := ring.hostMap.Load(nodeID); exists {
return fmt.Errorf("%w: %s", ErrNodeExists, nodeID)
}
virtualKeys := make([]uint64, 0, ring.config.VirtualNodes)
for i := 0; i < ring.config.VirtualNodes; i++ {
vNodeID := fmt.Sprintf("%s_%d", nodeID, i)
h, err := ring.generateHash(vNodeID)
if err != nil {
return fmt.Errorf("%w for virtual node %s", ErrInHashingKey, vNodeID)
}
ring.vNodeMap.Store(h, node)
virtualKeys = append(virtualKeys, h)
if ring.config.EnableLogs {
log.Printf("[HashRing] ➕ Added virtual node %s → hash %d", vNodeID, h)
}
}
ring.hostMap.Store(nodeID, time.Now().UTC())
ring.sortedKeys = append(ring.sortedKeys, virtualKeys...)
slices.Sort(ring.sortedKeys)
if ring.config.EnableLogs {
log.Printf("[HashRing] ✅ Node %s added with %d virtual nodes", nodeID, ring.config.VirtualNodes)
}
return nil
}
func (ring *HashRing) RemoveNode(node ICacheNode) error {
ring.mu.Lock()
defer ring.mu.Unlock()
nodeID := node.GetIdentifier()
if _, exists := ring.hostMap.Load(nodeID); !exists {
return fmt.Errorf("%w: %s", ErrNodeNotFound, nodeID)
}
for i := 0; i < ring.config.VirtualNodes; i++ {
vNodeID := fmt.Sprintf("%s_%d", nodeID, i)
h, err := ring.generateHash(vNodeID)
if err != nil {
return fmt.Errorf("%w for virtual node %s", ErrInHashingKey, vNodeID)
}
ring.vNodeMap.Delete(h)
// Remove from sortedKeys
index := slices.Index(ring.sortedKeys, h)
if index >= 0 {
ring.sortedKeys = append(ring.sortedKeys[:index], ring.sortedKeys[index+1:]...)
}
if ring.config.EnableLogs {
log.Printf("[HashRing] ❌ Removed virtual node %s (hash: %d)", vNodeID, h)
}
}
ring.hostMap.Delete(nodeID)
return nil
}
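Get barely changes: the hash we land on is now a virtual node’s hash, so we resolve it through vNodeMap instead of the old nodes map. A rough sketch:
func (ring *HashRing) Get(key string) (ICacheNode, error) {
	ring.mu.RLock()
	defer ring.mu.RUnlock()
	hashValue, err := ring.generateHash(key)
	if err != nil {
		return nil, fmt.Errorf("%w: %s", ErrInHashingKey, key)
	}
	index, err := ring.search(hashValue)
	if err != nil {
		return nil, err
	}
	// The virtual node hash resolves to the physical node that owns it
	if node, ok := ring.vNodeMap.Load(ring.sortedKeys[index]); ok {
		return node.(ICacheNode), nil
	}
	return nil, fmt.Errorf("%w: no node found for key %s", ErrNodeNotFound, key)
}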
🔁 Level 3: Add Replication (Real Fault Tolerance)
☠️ One Node Down = All Data Gone?
Alright, so far we’ve solved a pretty big problem: uneven distribution.
But wait… another potential disaster is lurking. Let’s say one of the nodes we carefully distributed keys to just… dies.
Poof. All data on that node? Gone. No bueno 😵.
That’s where data replication comes in — a fundamental concept in any production-grade distributed system.
Let’s walk through it.
🤔 What’s the Problem?
While our hash ring distributes load evenly (thanks to virtual nodes), it doesn’t yet protect the data.
That means if a node fails, any data it held is lost forever — and no one’s happy in that scenario.
So, what can we do?
Well, the obvious solution is to keep multiple copies of each piece of data — something known as a replication factor.
💸 But Wait — No Extra Nodes Allowed!
Let’s say the client or your infra team says:
“You can’t add more nodes. It’s too expensive.”
Okay, fair.
So we work smarter: reuse existing nodes as backups for one another.
Here’s how it works: for each key, instead of stopping at the first node we hit while walking clockwise around the ring, we keep walking and store copies on the next few distinct physical nodes as well.
We’re essentially assigning a replication factor to each key: the number of copies of it that live on the ring. 3 is a common value in real-world systems like Cassandra or DynamoDB.
🛠️ Let’s Code: Supporting Replication in the Hash Ring
Here’s how we start:
type hashRingConfig struct {
VirtualNodes int
ReplicationFactor int
HashFunction func() hash.Hash64
EnableLogs bool
}
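The matching functional option isn’t shown above, but presumably it mirrors SetVirtualNodes:
func SetReplicationFactor(count int) HashRingConfigFn {
	return func(cfg *hashRingConfig) {
		cfg.ReplicationFactor = count
	}
}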
Then we add a function to fetch N unique nodes:
func (ring *HashRing) GetNodesForKey(key string) ([]ICacheNode, error) {
	ring.mu.RLock()
	defer ring.mu.RUnlock()
	h, err := ring.generateHash(key)
	if err != nil {
		return nil, fmt.Errorf("%w: %s", ErrInHashingKey, key)
	}
	// search returns an error when the ring is empty, so handle both return values
	start, err := ring.search(h)
	if err != nil {
		return nil, err
	}
	seen := map[string]struct{}{}
	nodes := []ICacheNode{}
	// Walk clockwise from the key's position, collecting distinct physical nodes
	for i := start; len(nodes) < ring.config.ReplicationFactor && i < start+len(ring.sortedKeys); i++ {
		vHash := ring.sortedKeys[i%len(ring.sortedKeys)]
		node, ok := ring.vNodeMap.Load(vHash)
		if !ok {
			continue
		}
		n := node.(ICacheNode)
		if _, dup := seen[n.GetIdentifier()]; !dup {
			nodes = append(nodes, n)
			seen[n.GetIdentifier()] = struct{}{}
		}
	}
	return nodes, nil
}
💡 Now, when we Put(key, value) into the cache, we’ll write to all the nodes returned by GetNodesForKey.
If one fails? The others still have the data.
🧪 Integration: Plug It into the Caching Server
Now that our hash ring supports redundancy, it’s time to plug it into our caching server and see some real action.
Here’s what we changed:
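The exact diff is in the repo, but conceptually the server now fans every write out to all the replicas returned by GetNodesForKey and, on reads, falls back to the next replica when one node doesn’t answer. Here’s a rough sketch of that idea; the CachingServer shape and the storageNode interface are my assumptions for illustration, not the repo’s exact API:
// storageNode is a hypothetical contract for nodes that can actually store data;
// ICacheNode in this article only exposes GetIdentifier.
type storageNode interface {
	ICacheNode
	Set(key, value string) error
	Get(key string) (string, error)
}

// CachingServer is a hypothetical wrapper; the real caching server lives in the GitHub repo.
type CachingServer struct {
	ring *HashRing
}

func (s *CachingServer) Put(key, value string) error {
	nodes, err := s.ring.GetNodesForKey(key)
	if err != nil {
		return err
	}
	if len(nodes) == 0 {
		return ErrNoConnectedNodes
	}
	var lastErr error
	written := 0
	for _, n := range nodes {
		// Write to every replica; tolerate partial failures as long as one copy lands
		if err := n.(storageNode).Set(key, value); err != nil {
			lastErr = err
			continue
		}
		written++
	}
	if written == 0 {
		return lastErr
	}
	return nil
}

func (s *CachingServer) Get(key string) (string, error) {
	nodes, err := s.ring.GetNodesForKey(key)
	if err != nil {
		return "", err
	}
	if len(nodes) == 0 {
		return "", ErrNoConnectedNodes
	}
	var lastErr error
	for _, n := range nodes {
		// Read from the first healthy replica; fall through to the next on failure
		val, err := n.(storageNode).Get(key)
		if err == nil {
			return val, nil
		}
		lastErr = err
	}
	return "", lastErr
}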
👉 You can find the updated caching server code in this GitHub repo. Feel free to fork and play around with it!
🔥 Test Run: Break Things and Watch It Still Work
Let’s put all this theory to the test.
Here’s what we’re going to simulate: put a key into the cache, take down one of the nodes holding a copy, and then read the key back as if nothing happened.
Here’s a test snippet:
server := InitCachingServer(fnv.New64a, 5)
server.Put("token123", "archit-token")

// ... take down one of the nodes holding "token123" here ...

val, err := server.Get("token123")
if err != nil {
	log.Fatal(err)
}
fmt.Println("🔁 Value after node failure:", val)
And remember, this isn’t magic — it’s redundancy + consistent hashing.
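If you want to poke at the fault tolerance without going through the caching server, you can drive the ring directly. This reuses the hypothetical cacheNode type and the SetReplicationFactor option sketched earlier:
ring := InitHashRing(SetVirtualNodes(10), SetReplicationFactor(3))
for _, id := range []string{"cache-a:6379", "cache-b:6379", "cache-c:6379", "cache-d:6379"} {
	_ = ring.AddNode(cacheNode{id: id})
}

before, _ := ring.GetNodesForKey("token123")
fmt.Println("replicas before failure:", len(before)) // 3 distinct physical nodes

// Simulate a crash: the primary replica disappears from the ring
_ = ring.RemoveNode(before[0])

after, _ := ring.GetNodesForKey("token123")
fmt.Println("replicas after failure:", len(after)) // still 3, served by the survivors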
🧠 Final Thoughts: You Just Built a Fault-Tolerant Hash Ring
We’ve come a long way — from naive mod N key placement to building a redundant, fault-tolerant cache system using consistent hashing.
But this system isn’t perfect. Here are some open challenges:
🚀 I want you to pick up from here — implement these next steps, and share your code. Ping me on GitHub or Twitter. Let’s make the distributed systems world less mysterious, one key at a time.
🗃️ Code repo: https://github.com/architagr/The-Weekly-Golang-Journal/tree/main/consistent_hashing_distributed_cache
Stay Connected!