Implement Your Own Distributed Cache in Go: From a Basic Hash Ring to Hash Ring Redundancy Pro Max
⚙️ Ever wondered how distributed caches like Redis or Memcached stay rock solid — even when nodes scale up or vanish mid-flight?
The secret behind that stability? Consistent Hashing. And no, it’s not just a cooler version of hash(key) % N.
Let’s be honest: working at companies that build the systems we trust blindly in production, like Netflix, AWS, or Meta, feels like a dream, right? But to even sit at that table, you have to understand the problems they’ve already solved.
In my previous article, I walked you through how I finally understood the magic behind consistent hashing — breaking it down with simple analogies and real-world use cases.
Today, we’re taking that understanding and bringing it to life.
In this article, I’m not just going to explain consistent hashing. I’ll build it — from scratch. Starting from the simplest ring, upgrading it with virtual nodes, and finally making it fault-tolerant with data replication.
If you’re serious about backend engineering, buckle up. Let’s build something battle-tested.
🧱 Level 1: Building the Basic Hash Ring
🛠️ Rome wasn’t built in a day — and neither was Redis.
Consistent Hashing is a technique where both keys and nodes are hashed and plotted on a circular number line, often called the Hash Ring. Imagine it like this: Nodes are little shelters, and keys are travelers. Each key moves clockwise around the ring until it finds the next shelter.
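A tiny example to make that concrete: picture a ring numbered 0 to 99 with nodes sitting at positions 10, 50, and 90. A key that hashes to 63 walks clockwise and lands on the node at 90; a key that hashes to 95 walks past the end, wraps around, and lands on the node at 10. Add a new node at position 70, and only the keys hashing between 50 and 70 move (from the node at 90 to the newcomer); every other key stays exactly where it was. That last property is the whole point.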
Now, I know that dev-brain of yours is screaming: “Alright already, let me implement this!”
But not so fast. Let’s solve one problem at a time. Don’t think about keys yet — just focus on getting our nodes onto this circular ring.
🧠 Need some hints?
Got an idea forming? Great. Now compare it with the implementation below. And hey — if you think there’s a better way or spot something I missed, drop it in the comments. Let’s learn together.
⚠️ One thing to keep in mind: our hash function spits out uint64 values — massive numbers. So no, we’re not going to allocate a billion-element array to represent our ring.
Instead, I’ll keep a sorted slice of hash values (representing nodes) and a map that links those to actual node instances. Simple, efficient, and scalable.
💻 Code Time: Let’s Build the Simplest Hash Ring
Here’s the first implementation — a basic, no-frills hash ring. Just nodes, no virtual replicas, no redundancy… yet.
We’ve added log support via a config, but it’s turned off by default.
var (
ErrNoConnectedNodes = errors.New("no connected nodes available")
ErrNodeExists = errors.New("node already exists")
ErrNodeNotFound = errors.New("node not found")
ErrInHashingKey = errors.New("error in hashing the key")
)
type ICacheNode interface {
GetIdentifier() string
}
type hashRingConfig struct {
HashFunction func() hash.Hash64
EnableLogs bool
}
type HashRingConfigFn func(*hashRingConfig)
func SetHashFunction(f func() hash.Hash64) HashRingConfigFn {
return func(config *hashRingConfig) {
config.HashFunction = f
}
}
func EnableVerboseLogs(enabled bool) HashRingConfigFn {
return func(config *hashRingConfig) {
config.EnableLogs = enabled
}
}
type HashRing struct {
mu sync.RWMutex
config hashRingConfig
nodes sync.Map
sortedKeys []uint64
}
func InitHashRing(opts ...HashRingConfigFn) *HashRing {
config := &hashRingConfig{
HashFunction: fnv.New64a,
EnableLogs: false,
}
for _, opt := range opts {
opt(config)
}
return &HashRing{
config: *config,
sortedKeys: make([]uint64, 0),
}
}
func (ring *HashRing) AddNode(node ICacheNode) error {
ring.mu.Lock()
defer ring.mu.Unlock()
hashValue, err := ring.generateHash(node.GetIdentifier())
if err != nil {
return fmt.Errorf("%w: %s", ErrInHashingKey, node.GetIdentifier())
}
if _, exists := ring.nodes.Load(hashValue); exists {
return fmt.Errorf("%w: node %s", ErrNodeExists, node.GetIdentifier())
}
ring.nodes.Store(hashValue, node)
ring.sortedKeys = append(ring.sortedKeys, hashValue)
// Maintain sorted order of hash keys for binary search
sort.Slice(ring.sortedKeys, func(i, j int) bool { return ring.sortedKeys[i] < ring.sortedKeys[j] })
if ring.config.EnableLogs {
log.Printf("[HashRing] ➕ Added node: %s (hash: %d)", node.GetIdentifier(), hashValue)
}
return nil
}
func (ring *HashRing) RemoveNode(node ICacheNode) error {
ring.mu.Lock()
defer ring.mu.Unlock()
hashValue, err := ring.generateHash(node.GetIdentifier())
if err != nil {
return fmt.Errorf("%w: %s", ErrInHashingKey, node.GetIdentifier())
}
if _, found := ring.nodes.LoadAndDelete(hashValue); !found {
return fmt.Errorf("%w: %s", ErrNodeNotFound, node.GetIdentifier())
}
index, err := ring.search(hashValue)
if err != nil {
return err
}
ring.sortedKeys = append(ring.sortedKeys[:index], ring.sortedKeys[index+1:]...)
if ring.config.EnableLogs {
log.Printf("[HashRing] ❌ Removed node: %s (hash: %d)", node.GetIdentifier(), hashValue)
}
return nil
}
func (ring *HashRing) Get(key string) (ICacheNode, error) {
ring.mu.RLock()
defer ring.mu.RUnlock()
hashValue, err := ring.generateHash(key)
if err != nil {
return nil, fmt.Errorf("%w: %s", ErrInHashingKey, key)
}
index, err := ring.search(hashValue)
if err != nil {
return nil, err
}
nodeHash := ring.sortedKeys[index]
if node, ok := ring.nodes.Load(nodeHash); ok {
if ring.config.EnableLogs {
log.Printf("[HashRing] 🔍 Key '%s' (hash: %d) mapped to node (hash: %d)", key, hashValue, nodeHash)
}
return node.(ICacheNode), nil
}
return nil, fmt.Errorf("%w: no node found for key %s", ErrNodeNotFound, key)
}
func (ring *HashRing) search(key uint64) (int, error) {
if len(ring.sortedKeys) == 0 {
return -1, ErrNoConnectedNodes
}
index := sort.Search(len(ring.sortedKeys), func(i int) bool {
return ring.sortedKeys[i] >= key
})
// Wrap around the ring
if index == len(ring.sortedKeys) {
index = 0
}
return index, nil
}
// generateHash converts a string key to a uint64 hash using the configured hash function
func (ring *HashRing) generateHash(key string) (uint64, error) {
h := ring.config.HashFunction()
if _, err := h.Write([]byte(key)); err != nil {
return 0, err
}
return h.Sum64(), nil
}
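To sanity-check the ring before moving on, here’s a quick smoke test. The cacheNode type is a throwaway stand-in I’m using for illustration (any type that satisfies ICacheNode works), and the snippet assumes it lives in the same package as the ring code above.
// cacheNode is a hypothetical stand-in for a real cache node, used only for this demo.
type cacheNode struct{ id string }

func (c cacheNode) GetIdentifier() string { return c.id }

func main() {
	ring := InitHashRing(EnableVerboseLogs(true))

	// Register three nodes on the ring
	for _, id := range []string{"cache-a:6379", "cache-b:6379", "cache-c:6379"} {
		if err := ring.AddNode(cacheNode{id: id}); err != nil {
			log.Fatal(err)
		}
	}

	// Every key deterministically maps to exactly one of the three nodes
	for _, key := range []string{"user-42", "order-1001", "archit@something.com"} {
		node, err := ring.Get(key)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s → %s\n", key, node.GetIdentifier())
	}
}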
🔁 Level 2: Even Hashing Is Not Even Enough
So, here’s the thing — hash functions are smart, but not that smart.
The person who wrote fnv.New64a() didn’t know you’d be relying on it to balance traffic across your distributed cache nodes. And even if it accidentally distributes things evenly the first time, what happens after you add 20 nodes and remove 7? Things will skew.
And keys? Oh, don’t even talk about them. You can’t predict how many will start with “user-” or “order-” or “archit@something.com”.
Which means:
🔴 Some nodes will get overloaded.
🔴 Some nodes will sit idle.
🔴 And when one overloaded node goes down… 💥 Pain.
🤔 So How Do We Fix It?
Here’s a fun little riddle:
What if a single node could wear 10 different hats?
That’s right — instead of placing each node on the ring once, why not place it 10 times with different hash values? This technique is called Virtual Nodes — or vnodes — and it’s our next superpower.
Now you’re thinking:
“Okay cool, but why does that help?”
Let’s do some math:
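A rough back-of-the-envelope sketch (my own numbers, not from any benchmark): with 3 physical nodes and a single point each, nothing stops one unlucky node from owning 60–70% of the circle. Now give each node 100 virtual points, 300 in total. Each node’s share of the ring becomes the sum of 100 small arcs instead of one big one, so it concentrates tightly around 1/3; the spread shrinks roughly like 1/√100, i.e. about a tenth of the single-point case. More vnodes means a smoother distribution, at the cost of a larger sorted slice and slightly slower adds and removals.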
🛠 How Do You Actually Add Virtual Nodes?
It’s simple. Take the original node ID, and for each replica, add a suffix like 0, 1, 2, ... 9. Hash all those. Add them to the ring like individual nodes. When removing, just remove all those keys.
Let’s look at the code 👇
🧱 Add Configuration for Virtual Nodes
type hashRingConfig struct {
VirtualNodes int
HashFunction func() hash.Hash64
EnableLogs bool
}
func SetVirtualNodes(count int) HashRingConfigFn {
return func(cfg *hashRingConfig) {
cfg.VirtualNodes = count
}
}
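One thing the snippets below gloss over: AddNode and RemoveNode now talk to a hostMap (which physical nodes have joined) and a vNodeMap (virtual hash → node) instead of the single nodes map from Level 1. The exact struct lives in the repo; here’s a minimal sketch of what I’m assuming:
type HashRing struct {
	mu         sync.RWMutex
	config     hashRingConfig
	hostMap    sync.Map // node identifier → time the node joined
	vNodeMap   sync.Map // virtual node hash → ICacheNode
	sortedKeys []uint64 // sorted virtual node hashes, for binary search
}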
🧩 Add and Remove Nodes With Virtual Reps
func (ring *HashRing) AddNode(node ICacheNode) error {
ring.mu.Lock()
defer ring.mu.Unlock()
nodeID := node.GetIdentifier()
if _, exists := ring.hostMap.Load(nodeID); exists {
return fmt.Errorf("%w: %s", ErrNodeExists, nodeID)
}
virtualKeys := make([]uint64, 0, ring.config.VirtualNodes)
for i := 0; i < ring.config.VirtualNodes; i++ {
vNodeID := fmt.Sprintf("%s_%d", nodeID, i)
h, err := ring.generateHash(vNodeID)
if err != nil {
return fmt.Errorf("%w for virtual node %s", ErrInHashingKey, vNodeID)
}
ring.vNodeMap.Store(h, node)
virtualKeys = append(virtualKeys, h)
if ring.config.EnableLogs {
log.Printf("[HashRing] ➕ Added virtual node %s → hash %d", vNodeID, h)
}
}
ring.hostMap.Store(nodeID, time.Now().UTC())
ring.sortedKeys = append(ring.sortedKeys, virtualKeys...)
slices.Sort(ring.sortedKeys)
if ring.config.EnableLogs {
log.Printf("[HashRing] ✅ Node %s added with %d virtual nodes", nodeID, ring.config.VirtualNodes)
}
return nil
}
func (ring *HashRing) RemoveNode(node ICacheNode) error {
ring.mu.Lock()
defer ring.mu.Unlock()
nodeID := node.GetIdentifier()
if _, exists := ring.hostMap.Load(nodeID); !exists {
return fmt.Errorf("%w: %s", ErrNodeNotFound, nodeID)
}
for i := 0; i < ring.config.VirtualNodes; i++ {
vNodeID := fmt.Sprintf("%s_%d", nodeID, i)
h, err := ring.generateHash(vNodeID)
if err != nil {
return fmt.Errorf("%w for virtual node %s", ErrInHashingKey, vNodeID)
}
ring.vNodeMap.Delete(h)
// Remove from sortedKeys
index := slices.Index(ring.sortedKeys, h)
if index >= 0 {
ring.sortedKeys = append(ring.sortedKeys[:index], ring.sortedKeys[index+1:]...)
}
if ring.config.EnableLogs {
log.Printf("[HashRing] ❌ Removed virtual node %s (hash: %d)", vNodeID, h)
}
}
ring.hostMap.Delete(nodeID)
return nil
}
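Get barely changes: the hash we land on is now a virtual node’s hash, so we resolve it through vNodeMap instead of the old nodes map. A rough sketch:
func (ring *HashRing) Get(key string) (ICacheNode, error) {
	ring.mu.RLock()
	defer ring.mu.RUnlock()
	hashValue, err := ring.generateHash(key)
	if err != nil {
		return nil, fmt.Errorf("%w: %s", ErrInHashingKey, key)
	}
	index, err := ring.search(hashValue)
	if err != nil {
		return nil, err
	}
	// The virtual node hash resolves to the physical node that owns it
	if node, ok := ring.vNodeMap.Load(ring.sortedKeys[index]); ok {
		return node.(ICacheNode), nil
	}
	return nil, fmt.Errorf("%w: no node found for key %s", ErrNodeNotFound, key)
}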
🔁 Level 3: Add Replication (Real Fault Tolerance)
☠️ One Node Down = All Data Gone?
Alright, so far we’ve solved a pretty big problem: uneven distribution.
But wait… another potential disaster is lurking. Let’s say one of the nodes we carefully distributed keys to just… dies.
Poof. All data on that node? Gone. No bueno 😵.
That’s where data replication comes in — a fundamental concept in any production-grade distributed system.
Let’s walk through it.
🤔 What’s the Problem?
While our hash ring distributes load evenly (thanks to virtual nodes), it doesn’t yet protect the data.
That means if a node fails, any data it held is lost forever — and no one’s happy in that scenario.
So, what can we do?
Well, the obvious solution is to keep multiple copies of each piece of data — something known as a replication factor.
💸 But Wait — No Extra Nodes Allowed!
Let’s say the client or your infra team says:
“You can’t add more nodes. It’s too expensive.”
Okay, fair.
So we work smarter: reuse existing nodes as backups for one another.
Here’s how it works: for each key, instead of stopping at the first node we hit while walking clockwise around the ring, we keep walking and store copies on the next few distinct physical nodes as well.
We’re essentially assigning a replication factor to each key: the number of copies of it that live on the ring. 3 is a common value in real-world systems like Cassandra or DynamoDB.
🛠️ Let’s Code: Supporting Replication in the Hash Ring
Here’s how we start:
type hashRingConfig struct {
VirtualNodes int
ReplicationFactor int
HashFunction func() hash.Hash64
EnableLogs bool
}
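The matching functional option isn’t shown above, but presumably it mirrors SetVirtualNodes:
func SetReplicationFactor(count int) HashRingConfigFn {
	return func(cfg *hashRingConfig) {
		cfg.ReplicationFactor = count
	}
}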
Then we add a function to fetch N unique nodes:
func (ring *HashRing) GetNodesForKey(key string) ([]ICacheNode, error) {
	ring.mu.RLock()
	defer ring.mu.RUnlock()
	h, err := ring.generateHash(key)
	if err != nil {
		return nil, fmt.Errorf("%w: %s", ErrInHashingKey, key)
	}
	// search returns an error when the ring is empty, so handle both return values
	start, err := ring.search(h)
	if err != nil {
		return nil, err
	}
	seen := map[string]struct{}{}
	nodes := []ICacheNode{}
	// Walk clockwise from the key's position, collecting distinct physical nodes
	for i := start; len(nodes) < ring.config.ReplicationFactor && i < start+len(ring.sortedKeys); i++ {
		vHash := ring.sortedKeys[i%len(ring.sortedKeys)]
		node, ok := ring.vNodeMap.Load(vHash)
		if !ok {
			continue
		}
		n := node.(ICacheNode)
		if _, dup := seen[n.GetIdentifier()]; !dup {
			nodes = append(nodes, n)
			seen[n.GetIdentifier()] = struct{}{}
		}
	}
	return nodes, nil
}
💡 Now, when we Put(key, value) into the cache, we’ll write to all the nodes returned by GetNodesForKey.
If one fails? The others still have the data.
🧪 Integration: Plug It into the Caching Server
Now that our hash ring supports redundancy, it’s time to plug it into our caching server and see some real action.
Here’s what we changed:
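The exact diff is in the repo, but conceptually the server now fans every write out to all the replicas returned by GetNodesForKey and, on reads, falls back to the next replica when one node doesn’t answer. Here’s a rough sketch of that idea; the CachingServer shape and the storageNode interface are my assumptions for illustration, not the repo’s exact API:
// storageNode is a hypothetical contract for nodes that can actually store data;
// ICacheNode in this article only exposes GetIdentifier.
type storageNode interface {
	ICacheNode
	Set(key, value string) error
	Get(key string) (string, error)
}

// CachingServer is a hypothetical wrapper; the real caching server lives in the GitHub repo.
type CachingServer struct {
	ring *HashRing
}

func (s *CachingServer) Put(key, value string) error {
	nodes, err := s.ring.GetNodesForKey(key)
	if err != nil {
		return err
	}
	if len(nodes) == 0 {
		return ErrNoConnectedNodes
	}
	var lastErr error
	written := 0
	for _, n := range nodes {
		// Write to every replica; tolerate partial failures as long as one copy lands
		if err := n.(storageNode).Set(key, value); err != nil {
			lastErr = err
			continue
		}
		written++
	}
	if written == 0 {
		return lastErr
	}
	return nil
}

func (s *CachingServer) Get(key string) (string, error) {
	nodes, err := s.ring.GetNodesForKey(key)
	if err != nil {
		return "", err
	}
	if len(nodes) == 0 {
		return "", ErrNoConnectedNodes
	}
	var lastErr error
	for _, n := range nodes {
		// Read from the first healthy replica; fall through to the next on failure
		val, err := n.(storageNode).Get(key)
		if err == nil {
			return val, nil
		}
		lastErr = err
	}
	return "", lastErr
}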
👉 You can find the updated caching server code in this GitHub repo. Feel free to fork and play around with it!
🔥 Test Run: Break Things and Watch It Still Work
Let’s put all this theory to the test.
Here’s what we’re going to simulate: put a key into the cache, take down one of the nodes holding a copy, and then read the key back as if nothing happened.
Here’s a test snippet:
server := InitCachingServer(fnv.New64a, 5)
server.Put("token123", "archit-token")

// ... take down one of the nodes holding "token123" here ...

val, err := server.Get("token123")
if err != nil {
	log.Fatal(err)
}
fmt.Println("🔁 Value after node failure:", val)
And remember, this isn’t magic — it’s redundancy + consistent hashing.
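If you want to poke at the fault tolerance without going through the caching server, you can drive the ring directly. This reuses the hypothetical cacheNode type and the SetReplicationFactor option sketched earlier:
ring := InitHashRing(SetVirtualNodes(10), SetReplicationFactor(3))
for _, id := range []string{"cache-a:6379", "cache-b:6379", "cache-c:6379", "cache-d:6379"} {
	_ = ring.AddNode(cacheNode{id: id})
}

before, _ := ring.GetNodesForKey("token123")
fmt.Println("replicas before failure:", len(before)) // 3 distinct physical nodes

// Simulate a crash: the primary replica disappears from the ring
_ = ring.RemoveNode(before[0])

after, _ := ring.GetNodesForKey("token123")
fmt.Println("replicas after failure:", len(after)) // still 3, served by the survivors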
🧠 Final Thoughts: You Just Built a Fault-Tolerant Hash Ring
We’ve come a long way — from naive mod N key placement to building a redundant, fault-tolerant cache system using consistent hashing.
But this system isn’t perfect. Here are some open challenges:
🚀 I want you to pick up from here — implement these next steps, and share your code. Ping me on GitHub or Twitter. Let’s make the distributed systems world less mysterious, one key at a time.
🗃️ Code repo: https://github.com/architagr/The-Weekly-Golang-Journal/tree/main/consistent_hashing_distributed_cache
Stay Connected!