Implement Your Distributed Cache in Go: From Basic Hash Ring to Hash Ring Redundancy Pro Max
From naïve key distribution to a fault-tolerant hash ring — build your own scalable cache system with Go and zero regrets.


⚙️ Ever wondered how distributed caches like Redis or Memcached stay rock solid — even when nodes scale up or vanish mid-flight?

The secret behind that stability? Consistent Hashing. And no, it’s not just a cooler version of hash(key) % N.

Let’s be honest: working with companies that build systems we trust blindly in production — like Netflix, AWS, or Meta — feels like a dream, right? But to even sit at that table, you have to understand the problems they’ve already solved.

In my previous article, I walked you through how I finally understood the magic behind consistent hashing — breaking it down with simple analogies and real-world use cases.

Today, we’re taking that understanding and bringing it to life.

In this article, I’m not just going to explain consistent hashing. I’ll build it — from scratch. Starting from the simplest ring, upgrading it with virtual nodes, and finally making it fault-tolerant with data replication.

If you’re serious about backend engineering, buckle up. Let’s build something battle-tested.

🧱 Level 1: Building the Basic Hash Ring

🛠️ Rome wasn’t built in a day — and neither was Redis.

Consistent Hashing is a technique where both keys and nodes are hashed and plotted on a circular number line, often called the Hash Ring. Imagine it like this: Nodes are little shelters, and keys are travelers. Each key moves clockwise around the ring until it finds the next shelter.

Now, I know that dev-brain of yours is screaming: “Alright already, let me implement this!”

But not so fast. Let’s solve one problem at a time. Don’t think about keys yet — just focus on getting our nodes onto this circular ring.

🧠 Need some hints?

  • Hint 1: You need a data structure that’s sorted and allows for quick binary search.
  • Hint 2: Choose a hash function that’s deterministic, fast, and evenly distributes values.

Got an idea forming? Great. Now compare it with the implementation below. And hey — if you think there’s a better way or spot something I missed, drop it in the comments. Let’s learn together.

⚠️ One thing to keep in mind: our hash function spits out uint64 values, which means the ring has 2^64 possible positions. So no, we’re not going to allocate a gigantic array to represent our ring.

Instead, I’ll keep a sorted slice of hash values (representing nodes) and a map that links those to actual node instances. Simple, efficient, and scalable.

💻 Code Time: Let’s Build the Simplest Hash Ring



Here’s the first implementation — a basic, no-frills hash ring. Just nodes, no virtual replicas, no redundancy… yet.

We’ve added log support via a config, but it’s turned off by default.

package hashring

import (
 "errors"
 "fmt"
 "hash"
 "hash/fnv"
 "log"
 "sort"
 "sync"
)

var (
 ErrNoConnectedNodes = errors.New("no connected nodes available")
 ErrNodeExists       = errors.New("node already exists")
 ErrNodeNotFound     = errors.New("node not found")
 ErrInHashingKey     = errors.New("error in hashing the key")
)

type ICacheNode interface {
 GetIdentifier() string
}

type hashRingConfig struct {
 HashFunction func() hash.Hash64
 EnableLogs   bool
}

type HashRingConfigFn func(*hashRingConfig)

func SetHashFunction(f func() hash.Hash64) HashRingConfigFn {
 return func(config *hashRingConfig) {
  config.HashFunction = f
 }
}

func EnableVerboseLogs(enabled bool) HashRingConfigFn {
 return func(config *hashRingConfig) {
  config.EnableLogs = enabled
 }
}

type HashRing struct {
 mu         sync.RWMutex
 config     hashRingConfig
 nodes      sync.Map
 sortedKeys []uint64
}

func InitHashRing(opts ...HashRingConfigFn) *HashRing {
 config := &hashRingConfig{
  HashFunction: fnv.New64a,
  EnableLogs:   false,
 }
 for _, opt := range opts {
  opt(config)
 }
 return &HashRing{
  config:     *config,
  sortedKeys: make([]uint64, 0),
 }
}

func (ring *HashRing) AddNode(node ICacheNode) error {
 ring.mu.Lock()
 defer ring.mu.Unlock()

 hashValue, err := ring.generateHash(node.GetIdentifier())
 if err != nil {
  return fmt.Errorf("%w: %s", ErrInHashingKey, node.GetIdentifier())
 }

 if _, exists := ring.nodes.Load(hashValue); exists {
  return fmt.Errorf("%w: node %s", ErrNodeExists, node.GetIdentifier())
 }

 ring.nodes.Store(hashValue, node)
 ring.sortedKeys = append(ring.sortedKeys, hashValue)

 // Maintain sorted order of hash keys for binary search
 sort.Slice(ring.sortedKeys, func(i, j int) bool { return ring.sortedKeys[i] < ring.sortedKeys[j] })

 if ring.config.EnableLogs {
  log.Printf("[HashRing] ➕ Added node: %s (hash: %d)", node.GetIdentifier(), hashValue)
 }
 return nil
}

func (ring *HashRing) RemoveNode(node ICacheNode) error {
 ring.mu.Lock()
 defer ring.mu.Unlock()

 hashValue, err := ring.generateHash(node.GetIdentifier())
 if err != nil {
  return fmt.Errorf("%w: %s", ErrInHashingKey, node.GetIdentifier())
 }

 if _, found := ring.nodes.LoadAndDelete(hashValue); !found {
  return fmt.Errorf("%w: %s", ErrNodeNotFound, node.GetIdentifier())
 }

 index, err := ring.search(hashValue)
 if err != nil {
  return err
 }

 ring.sortedKeys = append(ring.sortedKeys[:index], ring.sortedKeys[index+1:]...)

 if ring.config.EnableLogs {
  log.Printf("[HashRing] ❌ Removed node: %s (hash: %d)", node.GetIdentifier(), hashValue)
 }
 return nil
}

func (ring *HashRing) Get(key string) (ICacheNode, error) {
 ring.mu.RLock()
 defer ring.mu.RUnlock()

 hashValue, err := ring.generateHash(key)
 if err != nil {
  return nil, fmt.Errorf("%w: %s", ErrInHashingKey, key)
 }

 index, err := ring.search(hashValue)
 if err != nil {
  return nil, err
 }

 nodeHash := ring.sortedKeys[index]
 if node, ok := ring.nodes.Load(nodeHash); ok {
  if ring.config.EnableLogs {
   log.Printf("[HashRing] 🔍 Key '%s' (hash: %d) mapped to node (hash: %d)", key, hashValue, nodeHash)
  }
  return node.(ICacheNode), nil
 }

 return nil, fmt.Errorf("%w: no node found for key %s", ErrNodeNotFound, key)
}

func (ring *HashRing) search(key uint64) (int, error) {
 if len(ring.sortedKeys) == 0 {
  return -1, ErrNoConnectedNodes
 }

 index := sort.Search(len(ring.sortedKeys), func(i int) bool {
  return ring.sortedKeys[i] >= key
 })

 // Wrap around the ring
 if index == len(ring.sortedKeys) {
  index = 0
 }
 return index, nil
}

// generateHash converts a string key to a uint64 hash using the configured hash function
func (ring *HashRing) generateHash(key string) (uint64, error) {
 h := ring.config.HashFunction()
 if _, err := h.Write([]byte(key)); err != nil {
  return 0, err
 }
 return h.Sum64(), nil
}        
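
Here’s a quick usage sketch before we move on. The demoNode type, the demoBasicRing function, and the addresses are made up purely for illustration; any type that satisfies ICacheNode will do.

type demoNode struct{ addr string }

func (n *demoNode) GetIdentifier() string { return n.addr }

func demoBasicRing() {
 ring := InitHashRing(EnableVerboseLogs(true))

 // Place three nodes on the ring.
 for _, addr := range []string{"10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"} {
  if err := ring.AddNode(&demoNode{addr: addr}); err != nil {
   log.Fatal(err)
  }
 }

 // Ask the ring which node owns a key.
 node, err := ring.Get("user-42")
 if err != nil {
  log.Fatal(err)
 }
 fmt.Println("key 'user-42' lives on", node.GetIdentifier())
}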

🔁 Even Hashing Is Not Even Enough

So, here’s the thing — hash functions are smart, but not that smart.

The person who wrote fnv.New64a() didn’t know you’d be relying on it to balance traffic across your distributed cache nodes. Even with a perfectly uniform hash, a handful of points carves the ring into wildly uneven arcs, so whichever node sits after a big gap soaks up a disproportionate share of keys. And even if it accidentally distributes things evenly the first time, what happens after you add 20 nodes and remove 7? Things will skew.

And keys? Oh, don’t even talk about them. You can’t predict how many will start with “user-” or “order-” or “archit@something.com”.

Which means: 🔴 Some nodes will get overloaded. 🔴 Some nodes will sit idle. 🔴 And when one overloaded node goes down… 💥 Pain.

🤔 So How Do We Fix It?

Here’s a fun little riddle:

What if a single node could wear 10 different hats?

That’s right — instead of placing each node on the ring once, why not place it 10 times with different hash values? This technique is called Virtual Nodes — or vnodes — and it’s our next superpower.

Now you’re thinking:

“Okay cool, but why does that help?”

Let’s do some math:

  • 5 real nodes
  • 10 virtual nodes per real node
  • Boom 💥 — we now have 50 points on the ring. More points = better distribution = lower chance of hotspots = happier backend devs. (There’s a quick sketch right after this list if you want to verify that claim yourself.)
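
Don’t just take my word for it. Here’s a rough, standalone sketch (not part of the ring itself, and it needs the math package on top of the imports above) that hashes every virtual-node label and measures how much of the ring each physical node owns. Run it with 1 replica per node and then 10, and compare the spread.

// ringOwnership is a quick, standalone experiment: hash every "<node>_<replica>"
// label, sort the points, and add up how much of the 64-bit ring each physical
// node ends up owning.
func ringOwnership(nodes []string, virtualNodes int) map[string]float64 {
 type point struct {
  hash  uint64
  owner string
 }
 points := make([]point, 0, len(nodes)*virtualNodes)
 for _, n := range nodes {
  for i := 0; i < virtualNodes; i++ {
   h := fnv.New64a()
   h.Write([]byte(fmt.Sprintf("%s_%d", n, i)))
   points = append(points, point{hash: h.Sum64(), owner: n})
  }
 }
 sort.Slice(points, func(i, j int) bool { return points[i].hash < points[j].hash })

 // Each point owns the arc that ends at it; uint64 subtraction wraps correctly
 // for the first point, which owns the arc crossing zero.
 share := make(map[string]float64)
 for i, p := range points {
  prev := points[(i+len(points)-1)%len(points)].hash
  share[p.owner] += float64(p.hash-prev) / float64(math.MaxUint64)
 }
 return share
}

// Compare the spread yourself:
//   fmt.Println(ringOwnership([]string{"A", "B", "C", "D", "E"}, 1))
//   fmt.Println(ringOwnership([]string{"A", "B", "C", "D", "E"}, 10))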

🛠 How Do You Actually Add Virtual Nodes?



It’s simple. Take the original node ID, and for each replica, add a suffix like 0, 1, 2, ... 9. Hash all those. Add them to the ring like individual nodes. When removing, just remove all those keys.

Let’s look at the code 👇

🧱 Add Configuration for Virtual Node Replicas

type hashRingConfig struct {
 VirtualNodes int
 HashFunction func() hash.Hash64
 EnableLogs   bool
}

func SetVirtualNodes(count int) HashRingConfigFn {
 return func(cfg *hashRingConfig) {
  cfg.VirtualNodes = count
 }
}        

🧩 Add and Remove Nodes With Virtual Reps
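
One heads-up before the code: compared to Level 1, the ring now keeps two maps, and the snippets below refer to them as hostMap and vNodeMap without showing the updated struct. Here’s a minimal sketch of what it looks like at this point (the default of 10 virtual nodes is my own pick; the code below also pulls in the slices and time packages):

type HashRing struct {
 mu         sync.RWMutex
 config     hashRingConfig
 hostMap    sync.Map // physical node ID  -> time it joined the ring
 vNodeMap   sync.Map // virtual node hash -> the physical ICacheNode behind it
 sortedKeys []uint64 // sorted virtual node hashes, for binary search
}

func InitHashRing(opts ...HashRingConfigFn) *HashRing {
 config := &hashRingConfig{
  HashFunction: fnv.New64a,
  VirtualNodes: 10, // assumed default; tune it per your cluster
  EnableLogs:   false,
 }
 for _, opt := range opts {
  opt(config)
 }
 return &HashRing{
  config:     *config,
  sortedKeys: make([]uint64, 0),
 }
}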

func (ring *HashRing) AddNode(node ICacheNode) error {
 ring.mu.Lock()
 defer ring.mu.Unlock()

 nodeID := node.GetIdentifier()
 if _, exists := ring.hostMap.Load(nodeID); exists {
  return fmt.Errorf("%w: %s", ErrNodeExists, nodeID)
 }

 virtualKeys := make([]uint64, 0, ring.config.VirtualNodes)
 for i := 0; i < ring.config.VirtualNodes; i++ {
  vNodeID := fmt.Sprintf("%s_%d", nodeID, i)
  h, err := ring.generateHash(vNodeID)
  if err != nil {
   return fmt.Errorf("%w for virtual node %s", ErrInHashingKey, vNodeID)
  }
  ring.vNodeMap.Store(h, node)
  virtualKeys = append(virtualKeys, h)

  if ring.config.EnableLogs {
   log.Printf("[HashRing] ➕ Added virtual node %s → hash %d", vNodeID, h)
  }
 }

 ring.hostMap.Store(nodeID, time.Now().UTC())
 ring.sortedKeys = append(ring.sortedKeys, virtualKeys...)
 slices.Sort(ring.sortedKeys)

 if ring.config.EnableLogs {
  log.Printf("[HashRing] ✅ Node %s added with %d virtual nodes", nodeID, ring.config.VirtualNodes)
 }
 return nil
}

func (ring *HashRing) RemoveNode(node ICacheNode) error {
 ring.mu.Lock()
 defer ring.mu.Unlock()

 nodeID := node.GetIdentifier()
 if _, exists := ring.hostMap.Load(nodeID); !exists {
  return fmt.Errorf("%w: %s", ErrNodeNotFound, nodeID)
 }

 for i := 0; i < ring.config.VirtualNodes; i++ {
  vNodeID := fmt.Sprintf("%s_%d", nodeID, i)
  h, err := ring.generateHash(vNodeID)
  if err != nil {
   return fmt.Errorf("%w for virtual node %s", ErrInHashingKey, vNodeID)
  }
  ring.vNodeMap.Delete(h)

  // Remove from sortedKeys
  index := slices.Index(ring.sortedKeys, h)
  if index >= 0 {
   ring.sortedKeys = append(ring.sortedKeys[:index], ring.sortedKeys[index+1:]...)
  }

  if ring.config.EnableLogs {
   log.Printf("[HashRing] ❌ Removed virtual node %s (hash: %d)", vNodeID, h)
  }
 }
 ring.hostMap.Delete(nodeID)
 return nil
}        
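
And what about Get? It barely changes from Level 1: the lookup now goes through vNodeMap, and whatever physical node sits behind the chosen virtual node is the answer. A minimal sketch:

func (ring *HashRing) Get(key string) (ICacheNode, error) {
 ring.mu.RLock()
 defer ring.mu.RUnlock()

 hashValue, err := ring.generateHash(key)
 if err != nil {
  return nil, fmt.Errorf("%w: %s", ErrInHashingKey, key)
 }

 index, err := ring.search(hashValue)
 if err != nil {
  return nil, err
 }

 // The chosen virtual node's hash resolves to the physical node that owns it.
 if node, ok := ring.vNodeMap.Load(ring.sortedKeys[index]); ok {
  return node.(ICacheNode), nil
 }
 return nil, fmt.Errorf("%w: no node found for key %s", ErrNodeNotFound, key)
}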

🔁 Level 3: Add Replication (Real Fault Tolerance)

☠️ One Node Down = All Data Gone?

Alright, so far we’ve solved a pretty big problem: uneven distribution.

But wait… another potential disaster is lurking. Let’s say one of the nodes we carefully distributed keys to just… dies.

Poof. All data on that node? Gone. No bueno 😵.

That’s where data replication comes in — a fundamental concept in any production-grade distributed system.

Let’s walk through it.

🤔 What’s the Problem?

While our hash ring distributes load evenly (thanks to virtual nodes), it doesn’t yet protect the data.

That means if a node fails, any data it held is lost forever — and no one’s happy in that scenario.

So, what can we do?

Well, the obvious solution is to keep multiple copies of each piece of data — something known as a replication factor.

💸 But Wait — No Extra Nodes Allowed!

Let’s say the client or your infra team says:

“You can’t add more nodes. It’s too expensive.”

Okay, fair.

So we work smarter: reuse existing nodes as backups for one another.

Here’s how it works:

  • When storing a key, we write it not just to the primary node, but also to N−1 additional nodes, clockwise on the hash ring.
  • If the primary fails, the secondary (or even tertiary) has our back.

We’re essentially assigning a replication factor to each key. 3 is a common value in real-world systems like Cassandra or DynamoDB.

🛠️ Let’s Code: Supporting Replication in the Hash Ring

Here’s how we start:

  • We extend the hash ring’s config to include ReplicationFactor.
  • Add a GetNodesForKey() method that returns N distinct nodes clockwise from the hashed key.
  • These are the nodes where we’ll store copies of the same data.

type hashRingConfig struct {
 VirtualNodes      int
 ReplicationFactor int
 HashFunction      func() hash.Hash64
 EnableLogs        bool
}        
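
We can wire the new field into the same options pattern as before. Here’s a quick sketch (naming it SetReplicationFactor to match the SetVirtualNodes convention):

func SetReplicationFactor(count int) HashRingConfigFn {
 return func(cfg *hashRingConfig) {
  cfg.ReplicationFactor = count
 }
}

// Usage:
// ring := InitHashRing(SetVirtualNodes(10), SetReplicationFactor(3))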

Then we add a function to fetch N unique nodes:

func (ring *HashRing) GetNodesForKey(key string) ([]ICacheNode, error) {
 ring.mu.RLock()
 defer ring.mu.RUnlock()

 h, err := ring.generateHash(key)
 if err != nil {
  return nil, fmt.Errorf("%w: %s", ErrInHashingKey, key)
 }

 start, err := ring.search(h)
 if err != nil {
  return nil, err
 }

 seen := map[string]struct{}{}
 nodes := []ICacheNode{}

 // Walk clockwise from the key's position, collecting distinct physical nodes
 // until we have ReplicationFactor of them (or have gone all the way around).
 for i := start; len(nodes) < ring.config.ReplicationFactor && i < start+len(ring.sortedKeys); i++ {
  vHash := ring.sortedKeys[i%len(ring.sortedKeys)]
  node, ok := ring.vNodeMap.Load(vHash)
  if !ok {
   continue
  }
  n := node.(ICacheNode)
  if _, found := seen[n.GetIdentifier()]; !found {
   nodes = append(nodes, n)
   seen[n.GetIdentifier()] = struct{}{}
  }
 }
 return nodes, nil
}

💡 Now, when we Put(key, value) into the cache, we’ll write to all the nodes returned by GetNodesForKey.

If one fails? The others still have the data.

🧪 Integration: Plug It into the Caching Server

Now that our hash ring supports redundancy, it’s time to plug it into our caching server and see some real action.

Here’s what we changed:

  1. The Put method now stores the key in multiple nodes as returned by the hash ring’s GetNodesForKey(). This means we’re writing the same value into all the replicas—because what’s better than one backup? Two backups.
  2. Similarly, the Delete method removes the key from all replicas, not just the primary node. Because ghost data is the scariest kind of bug.
  3. The Get method still tries the primary node first, and if it fails, retries the backup—this is the juicy part where you might end up reading from a different node than where you wrote it. Pretty neat, huh? (There’s a rough sketch of this wiring right after this list.)
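
The full server lives in the repo linked below, but here’s a minimal sketch of the idea. The CachingServer type and the cacheStore interface are illustrative stand-ins, not the repo’s exact API:

// cacheStore is an illustrative stand-in for whatever a real node exposes.
type cacheStore interface {
 ICacheNode
 Set(key, value string) error
 Lookup(key string) (string, error)
}

type CachingServer struct {
 ring *HashRing
}

func (s *CachingServer) Put(key, value string) error {
 replicas, err := s.ring.GetNodesForKey(key)
 if err != nil {
  return err
 }
 // Write the same value to every replica. Here we fail fast on the first
 // error; a real server might tolerate partial failures (quorum writes).
 for _, n := range replicas {
  if err := n.(cacheStore).Set(key, value); err != nil {
   return err
  }
 }
 return nil
}

func (s *CachingServer) Get(key string) (string, error) {
 replicas, err := s.ring.GetNodesForKey(key)
 if err != nil {
  return "", err
 }
 // Primary first, then the backups, until one of them answers.
 for _, n := range replicas {
  if val, err := n.(cacheStore).Lookup(key); err == nil {
   return val, nil
  }
 }
 return "", fmt.Errorf("key %s not found on any replica", key)
}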

👉 You can find the updated caching server code in this GitHub repo. Feel free to fork and play around with it!

🔥 Test Run: Break Things and Watch It Still Work

Let’s put all this theory to the test.

Here’s what we’re going to simulate:

  • Save a key using Put(). Log which node wrote it.
  • Simulate a node failure by manually removing the primary node from the ring.
  • Try to read the key again using Get().
  • Watch the magic as it retrieves from one of the replicas without a hiccup.

Here’s a test snippet:

server := InitCachingServer(fnv.New64a, 5)
server.Put("token123", "archit-token")

// Simulate a failure here by removing the key's primary node from the ring
// (see the repo for the exact removal call), then read the key back.

val, err := server.Get("token123")
if err != nil {
 log.Fatal(err)
}
fmt.Println("🔁 Value after node failure:", val)

And remember, this isn’t magic — it’s redundancy + consistent hashing.

🧠 Final Thoughts: You Just Built a Fault-Tolerant Hash Ring

We’ve come a long way — from naive mod N key placement to building a redundant, fault-tolerant cache system using consistent hashing.

But this system isn’t perfect. Here are some open challenges:

  1. Gossip Protocol: Currently, our nodes don’t talk to each other. If a backup node gets updated but the primary goes down, how do we sync the data? Enter gossip protocols — a powerful way for nodes to share state and self-heal.
  2. Atomicity: What happens if you write to one replica and fail on the second? You now have inconsistent state. We need a write-ahead log or 2PC for true consistency.
  3. Performance Bottlenecks: Right now, we wait for all replicas to complete before returning a response. In a real system, you’d write asynchronously or use quorum logic.

🚀 I want you to pick up from here — implement these next steps, and share your code. Ping me on GitHub or Twitter. Let’s make the distributed systems world less mysterious, one key at a time.

🗃️ Code repo: https://github.com/architagr/The-Weekly-Golang-Journal/tree/main/consistent_hashing_distributed_cache

Stay Connected!
