Winter is coming? Not if ZooKeeper is there!
Presented by: Joydeep Banik Roy
Sr. Software Engineer
Cerner Corporation
1
Winter is coming?
Distributed System
“A distributed system is capable of exploiting the capacity
of multiple processors by running components, perhaps
replicated, in parallel. A system might be distributed
geographically for strategic reasons, such as the presence of
servers in multiple locations participating in a single
application.”
- ZooKeeper: Distributed Process Coordination, O'Reilly
Fallacies of Distributed Computing
o The network is reliable.
o Latency is zero.
o Bandwidth is infinite.
o The network is secure.
o Topology doesn't change.
o There is one administrator.
o Transport cost is zero.
o The network is homogeneous.
4
Coordination
 A coordination task is a task involving multiple processes for
the purposes of cooperation or to regulate contention.
 Examples:
Master Election
Crash detection
Group membership management
Metadata management
5
What is ZooKeeper?
“Distributed, open-source coordination service for distributed applications
that exposes a simple API, like a file system API, that applications can build
upon to implement higher level services for synchronization, configuration
maintenance, and groups and naming.”
6
/master  "richman.com"
/worker
  /worker/worker-1  "poorman.com"
/tasks
  /tasks/task-1  "poor-to-rich.sh"
How it does it: Shared Storage
[Diagram: an ensemble of five servers, with Server 4 as the Leader; application
processes access the ensemble through the ZooKeeper client library]
/master  "richman.com"
/worker
  /worker/worker-1  "poorman.com"
/tasks
  /tasks/task-1  "run-cmd"
The ZooKeeper Data Model
ZooKeeper has a hierarchical namespace.
Each node in the namespace is called a ZNode.
Every ZNode has data (stored as byte[]) and can have children.
parent : "/zookeeper"
|-- child1 : "/master"
|-- child2 : "/workers"
|-- child3 : "/tasks"
`-- task-1 : "run cmd;"
ZNode properties:
Maintains a stat structure with version
numbers for data changes, ACL changes
and timestamps
Version number increases with changes
Data is read and written in its entirety
8
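The data model above can be sketched as a toy in-memory namespace (an illustration only, not the real server): every znode stores raw bytes, keeps a version counter that each write bumps, and may have children.

```python
# Toy in-memory model of the znode namespace (illustrative only):
# data is read and written in its entirety, and every write
# increments the znode's version counter.

class ZNode:
    def __init__(self, data=b""):
        self.data = data
        self.version = 0
        self.children = {}

class FakeZooKeeper:
    def __init__(self):
        self.root = ZNode()

    def _walk(self, path, create_missing=False):
        node = self.root
        for part in path.strip("/").split("/"):
            if part not in node.children:
                if not create_missing:
                    raise KeyError(path)
                node.children[part] = ZNode()
            node = node.children[part]
        return node

    def create(self, path, data=b""):
        self._walk(path, create_missing=True).data = data

    def get_data(self, path):
        n = self._walk(path)
        return n.data, n.version

    def set_data(self, path, data):
        n = self._walk(path)
        n.data, n.version = data, n.version + 1

    def get_children(self, path):
        return sorted(self._walk(path).children)

zk = FakeZooKeeper()
zk.create("/tasks/task-1", b"run cmd;")
print(zk.get_children("/tasks"))     # ['task-1']
print(zk.get_data("/tasks/task-1"))  # (b'run cmd;', 0)
```

The real API adds ACLs, stat structures, and watches on top of exactly this create/get/set/getChildren surface.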
Znode Example: Simple Lock
/resource
Process1  Process2  Process3
/lock  "PROCESS1"
Znode Example: Simple Lock
/resource
Process2  Process3
/lock  "PROCESS2"
Znode Example: Simple Lock
/resource
Process3
/lock  "PROCESS3"
ZNode Types
Persistent – exists until explicitly deleted.
Ephemeral – deleted once the client session ends.
Sequential – appends a monotonically increasing counter to the end of the path.
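The sequential flavor can be illustrated with a tiny simulation of how the parent znode hands out counters (not a real client call; ZooKeeper zero-pads the counter to 10 digits):

```python
# Illustrative simulation of sequential-znode naming (not a real
# ZooKeeper client): the server appends a zero-padded, monotonically
# increasing 10-digit counter to the requested path.

class FakeSequentialParent:
    """Mimics how a parent znode assigns sequence numbers."""
    def __init__(self):
        self.next_seq = 0

    def create_sequential(self, path_prefix):
        name = f"{path_prefix}{self.next_seq:010d}"
        self.next_seq += 1
        return name

parent = FakeSequentialParent()
print(parent.create_sequential("/tasks/task-"))  # /tasks/task-0000000000
print(parent.create_sequential("/tasks/task-"))  # /tasks/task-0000000001
```

These counter-suffixed names are what the lock, leader-election, and queue recipes later sort on.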
Watches and Notifications
 Event – an update executed on a znode
 Watch – a one-time trigger associated with a znode
 Notification – generated for the client when a watch is triggered by an
event
13
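The one-time-trigger semantics can be sketched in a few lines (a toy model only; real watches are armed via exists(), getData(), or getChildren() calls):

```python
# Toy illustration of the one-time watch trigger described above
# (not a real ZooKeeper client).

class WatchedZnode:
    def __init__(self):
        self._watchers = []
        self.data = b""

    def exists_with_watch(self, callback):
        self._watchers.append(callback)  # arm a one-time watch
        return True

    def update(self, data):
        self.data = data
        fired, self._watchers = self._watchers, []  # each watch fires once
        for cb in fired:
            cb("NodeDataChanged")

events = []
z = WatchedZnode()
z.exists_with_watch(events.append)
z.update(b"v1")   # triggers the watch, generating one notification
z.update(b"v2")   # no watch armed any more, so no notification
assert events == ["NodeDataChanged"]
```

Note the second update produces nothing: a client that still cares must re-read the znode and set a fresh watch.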
“ZooKeeper always pays its debts”
“One important guarantee of notifications is that they are
delivered to a client before any
other change is made to the same znode”
ZooKeeper Guarantees
 Sequential Consistency - Updates from a client will be applied in the order that
they were sent.
 Atomicity - Updates either succeed or fail. No partial results.
 Single System Image - A client will see the same view of the service regardless of
the server that it connects to.
 Reliability - Once an update has been applied, it will persist from that time
forward until a client overwrites the update.
 Timeliness - The client's view of the system is guaranteed to be up-to-date within a
certain time bound. Rather than serving stale data, a server will shut down and
force its clients to connect to another one with a more recent image.
15
ZooKeeper is Simple
ZNODE OPERATIONS (API)
16
READ:  exists, getData, getChildren, getACL
WRITE: create, delete, setData, setACL
plus the sync() call
Example: Master-Worker
/master
/worker
  /worker-1
/assign
  /worker-1
    /task-1
/task
  /task-1
    /status  "DONE"
ZooKeeper Recipes
● Configuration management – machines bootstrap config from a
centralized source, facilitates simpler deployment/provisioning
● Naming service - like DNS, mappings of names to addresses
● Distributed synchronization - locks, barriers, queues
● Leader election - a common problem in distributed coordination
● Centralized and highly reliable (simple) data registry
Recipe #1 : Barriers
 Used for configuration management.
 Clients want to read a configuration, but the configuration is not yet ready.
 A barrier blocks a set of nodes from processing until a condition is met; a
/barrier znode represents it.
 A client calls the ZooKeeper API's exists() function on the barrier node,
with watch set to true.
 If exists() returns false, the barrier is gone and the client proceeds.
 Else, if exists() returns true, the client waits for a watch event from ZooKeeper for
the barrier node.
19
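The exists()-and-wait loop above can be sketched with an in-memory stand-in for the /barrier znode (illustrative only; a real client would call ZooKeeper's exists() with a watch and block on the resulting notification):

```python
import threading

# Toy stand-in for the /barrier znode (not a real ZooKeeper client).
class FakeBarrier:
    def __init__(self):
        self._gone = threading.Event()

    def exists(self):                 # analogous to exists("/barrier", watch=True)
        return not self._gone.is_set()

    def wait_for_removal(self, timeout=None):   # the watch notification
        return self._gone.wait(timeout)

    def remove(self):                 # admin deletes /barrier once config is ready
        self._gone.set()

def enter(barrier, timeout=5):
    if not barrier.exists():          # barrier already gone: proceed immediately
        return True
    return barrier.wait_for_removal(timeout)    # else block until notified

b = FakeBarrier()
threading.Timer(0.1, b.remove).start()   # configuration becomes ready shortly
assert enter(b)                          # client unblocks once /barrier is deleted
```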
Recipe #2 : Distributed Exclusive Lock
Assuming there are N clients trying to acquire a lock:
 Each client creates an ephemeral, sequential znode under the path
/Cluster/_locknode_.
 Each client requests a list of children of the lock znode (i.e. _locknode_).
 The client with the lowest ID according to natural ordering holds the
lock.
 Every other client sets a watch on the znode whose ID immediately precedes
its own. This avoids "The Herd Effect".
 On notification, a client checks whether it now holds the lock.
 A client wishing to release the lock deletes its node, which triggers the
next client in line to acquire the lock.
20
ZK
|---Cluster
+---hadoopConfig
+---memberships
+---_locknode_
+---host1-HiveClient
+---host2-Impala
+---host3-YARN
+--- …
+---hostN-Crunch
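The ordering logic of the recipe can be sketched in pure Python (no ZooKeeper client; names like lock-0000000001 follow the sequential-znode convention): sort the children, the lowest sequence holds the lock, and everyone else watches only their immediate predecessor.

```python
def seq(name):
    # extract the 10-digit counter from names like "lock-0000000007"
    return int(name.rsplit("-", 1)[1])

def lock_holder_and_watch(children, my_name):
    """Return ('HOLDER', None) if my_name holds the lock, else
    ('WAIT', predecessor) -- the single znode to watch."""
    ordered = sorted(children, key=seq)
    if ordered[0] == my_name:
        return "HOLDER", None
    me = ordered.index(my_name)
    return "WAIT", ordered[me - 1]   # watch only the immediate predecessor

children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
print(lock_holder_and_watch(children, "lock-0000000001"))  # ('HOLDER', None)
print(lock_holder_and_watch(children, "lock-0000000003"))  # ('WAIT', 'lock-0000000002')
```

Watching only the predecessor is the key trick: when the holder's node disappears, exactly one waiter is notified, instead of N-1 clients stampeding (the Herd Effect).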
Recipe #3 : Leader Election
 A znode, say "/leader/election-path", is created.
 All participants of the election process create an ephemeral-sequential node on the same election
path.
 The node with the smallest sequence number is the leader.
 Each “follower” node listens to the node with the next lower seq. number
 Upon leader removal go to election-path and find a new leader or become the leader if it has the
lowest sequence number.
 Upon session expiration check the election state and go to election if needed.
 Applications may consider creating a separate znode to acknowledge that the leader has executed
the leader procedure.
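The election rule reduces to picking the smallest sequence number among the live ephemeral nodes; a sketch (pure Python, with hypothetical n-XXXXXXXXXX names):

```python
def elect(children):
    """Smallest sequence number wins, per the recipe above."""
    return min(children, key=lambda n: int(n.rsplit("-", 1)[1]))

members = ["n-0000000012", "n-0000000007", "n-0000000031"]
assert elect(members) == "n-0000000007"

members.remove("n-0000000007")           # leader dies: its ephemeral node vanishes
assert elect(members) == "n-0000000012"  # the next-lowest node takes over
```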
Recipe #4 : Distributed Queue
 A znode /queue is created.
 Distributed clients create EPHEMERAL-SEQUENTIAL znodes by passing path name ending in
/queue- to create()
 Pathnames have the form /queue/queue-X, where X is a monotonically increasing number.
 If a single consumer takes items out of the queue, they will be ordered FIFO.
 The client calls getChildren() and processes all queue nodes until exhausted. It is guaranteed
not to miss anything, as the nodes are ordered FIFO.
 Priority Queues come with a small change.
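The FIFO consumption order falls out of sorting the counter suffixes; a sketch (no real client). The same sort serves the priority-queue variant, where the client writes /queue-YY with YY as a priority and lower numbers are consumed first.

```python
def drain_fifo(children):
    """Process the children of /queue in counter order (FIFO for a
    single consumer), as described above."""
    return sorted(children, key=lambda c: int(c.rsplit("-", 1)[1]))

items = ["queue-0000000002", "queue-0000000000", "queue-0000000001"]
print(drain_fifo(items))
# ['queue-0000000000', 'queue-0000000001', 'queue-0000000002']
```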
Apache Curator
 Many more recipes, originally open sourced by Netflix and now an Apache project.
 Visit http://curator.apache.org/ for more recipes and their implementations.
23
Language Bindings
ZooKeeper ships client libraries in:
Java
C
Perl
Python
Community contributed client bindings available for
Scala, C#, Node.js, Ruby, Erlang, Go, Haskell
https://cwiki.apache.org/ZOOKEEPER/zkclientbindings.html
24
Who uses ZooKeeper?
References
 ZooKeeper : Distributed Process Coordination
By Flavio Junqueira and Benjamin Reed
 https://zookeeper.apache.org/ It has some fabulous documentation!
 http://curator.apache.org/ Check out the recipes!
 Some really generous slides on SlideShare, like this one:
 http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper
 And others…
26
Questions
27
DON’T FORGET TO RATE THIS TALK
THANK YOU
28


Editor's Notes

  • #3: What I mean by Winter is all the problems we face when we work with distributed systems: the faults, failures, crashes, concurrency issues, and synchronization issues.
  • #4: Earlier, life was simpler: most applications were a single program running on a single computer with a single CPU. Today, things have changed. In the Big Data and Cloud Computing world, applications are made up of many independent programs running on an ever-changing set of computers. Distributed systems have brought their own set of challenges with them, like concurrency, security, scalability, and resilience to failure. But as an application developer I should not be concerned about these. That brings me to this very interesting slide that talks about the fallacies of distributed systems.
  • #5: These are the assumptions that most the programmers make while developing or coding for distributed applications. And we all know none of them is true…
  • #6: To tackle these problems, we need coordination among the distributed components. Coordination can be for the purposes of cooperation or to regulate contention. For example: ………… master worker example for coordination Now let me tell you distributed coordination is really hard. So who should take care of these coordination needs. Who should save us from the winter? This is where zookeeper comes to our rescue.
  • #7: Definition So it can be used for all coordination needs that we talked about yet in a simple way. Separates Coordination tasks from application. Hence: coordination task , loosely coupled Makes apps simple to maintain Makes development of apps independent to design and develop How it does this?
  • #8: There are two ways of communication in a distributed system: shared storage and message passing. ZooKeeper provides shared storage over an ensemble of servers. Client API: Java and C. Applications use these client APIs to access the ZooKeeper service. ZooKeeper quorum – the minimum number of servers that must replicate the data tree before an acknowledgement is sent to the client. A leader is elected out of the quorum. Clients connect to one of the servers through a TCP connection and maintain a session. All servers have a copy of the data tree, and those that fall behind eventually catch up. Reads can be served by any ZooKeeper server; writes go through the leader, and voting happens between members of the quorum, called followers. Observers do not vote but just commit updates. A client connects to only one ZooKeeper server at a time using a TCP connection and establishes a session. After establishing the session, clients use these APIs to create and manipulate znodes. So what is a znode? It is a simple file-system-like structure which can store coordination data.
  • #9: Znode – special data structure, can store data as well as children. Like a file as well as directory. Clients can create, update and delete these znodes and their children. They can also update the data of a znode. Also clients can read the data of a znode and check whether znode exists. Using these operation, coordination can be achieved between multiple clients. For example how can we achieve a lock on a resource using znodes?
  • #10: A resource in your application is represented by a /resource znode in the ZooKeeper ensemble. Any process looking to access the resource first has to acquire a lock on it, which means creating a /lock znode with the process name as its data. Other processes keep checking the /resource znode to find out if a lock znode already exists. If it exists, they may want to read which process holds the lock. If not, they may create a /lock znode themselves.
  • #11: Once the process having the lock, finishes accessing the resource or its session expires then the /lock znode is deleted by ZooKeeper ensemble. Other processes can now acquire the lock. ZooKeeper does not delete the lock arbitrarily but because the znode /lock is an ephemeral znode, which is destined to be deleted.
  • #13: There are 4 types of Znodes based on their properties: Persisitent – Continues to live on the ensemble even after creating node dies. In previous example, /resource is a candidate for persistent znode. Ephemeral – Deleted once the client session has ended, either explicitly or due to expiry. In previous example, /lock is a candidate for ephemeral znode. Persistent Sequential – Is a persistent znode with a monotonically increasing number appended to its name. Ephemeral Sequential – Is an ephemeral znode with a monotonically increasing number appended to its name like /lock-1,/lock-2 to show , maybe, order in which the different processes acquired the locks.
  • #14: Now, instead of 3 processes, there can be thousands of processes waiting to acquire the lock. All these processes would be polling the ZooKeeper ensemble to check whether the lock exists or not. This can cause a spike in server activity, with most of the server's time spent fulfilling clients' exists() calls. To avoid this, a client can set a one-time trigger called a watch on a znode, fired when any of the following happens: the znode is created, the znode is deleted, the data of the znode changes, or the znode's children change. A watch is triggered at most once and generates a notification to the client who set it when the watch's condition is matched.
  • #15: Like a Lannister, the ZooKeeper always pays its debts! It means that once you set a watch on a znode and an update on the znode happens, you are guaranteed to receive a notification before the next update happens on the znode. It does not mean that the clients will see all changes because there can be a time lag between the client getting the notification and reading the znode. By that time there can be multiple updates to the znode, for which no watch was set and a watch being a one-time trigger, will only service client once after it was set and when the first update happens. In any case, client will see the latest state when reading from zookeeper ensemble and may set a watch optionally.
  • #17: sync call – waits for data to be propagated
  • #18: Working of a Master-Worker system to process some tasks submitted by a client. Remember all znodes are conceptual representations and master, worker, task and assign are at same level under /zookeepermain znode. These are a group of systems working together to accomplish some task and managing their distributed coordination need using the znode naming system. Master, workers and tasks are all created by different clients to manage the processing. These are steps in order of appearance of the animation: A client creates /master ephemeral znode and keeps running to process client task completion requests. Others Clients who double as both worker and backup master may set a watch on /master znode to receive notification when master goes down. The master creates three persistent znodes namely /worker, /task and /assign to represent the worker queues, task queues and assignment of a task to a worker. The master sets a watch each on /worker and /task to listen for new workers and tasks. Once a worker becomes available, it creates a persistent znode under /assign to be assigned task to and an ephemeral znode under /worker to tell zookeeper that it is available to take up a task. Remember, the znode under /assign is created first so that as soon as it makes itself available, zookeeper will search under /assign to allocate a task. If not found, zookeeper will not be able to assign even if a worker is available. Next worker client sets a watch on /assign/worker-1 to listen for any changes under it viz. task assignment. Once /worker/worker-1 is created zookeeper will get notified and will check /task to assign one based on availability. Now a client creates a new /task-1 under /task and sets a watch on /status to watch for task completion. 
Zookeeper gets notification of new task, check for available workers and assigns by creating znode /task-1 under /assign/worker-1 Now worker gets a notification, since it set a watch on /assign/worker-1, and finds the task details under /task After completing task, worker creates a znode /status with data “DONE”/any other failure msg under /task/task-1 Client watching on the status now receives a notification and reads status to find task is completed or error message.
  • #23: Two Small Changes in Priority Queues: Add to the /queue with pathname /queue-YY where YY is the priority of the element with lower numbers representing higher priority. When removing elements from queue, a client uses an up-to-date children list meaning that the client that the client will invalidate previously obtained children lists if a watch notification triggers for the queue node.