== Architecture ==
ZooKeeper can run in two modes: standalone and replicated.
In standalone mode, ZooKeeper runs on a single machine. Because this mode offers no high availability, it is used only for testing and not in practice.
In production environments and in all practical use cases, replicated mode is used. In replicated mode, ZooKeeper runs on a cluster of machines called an ensemble. A ZooKeeper server is installed on every machine in the cluster, and each server is told about all of the machines in the ensemble.
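To illustrate how each server is told about the others, here is a minimal configuration sketch for a three-server ensemble. The hostnames zoo1, zoo2 and zoo3 and the data directory are placeholders chosen for this example, not values from this tutorial.

```properties
# zoo.cfg (the same file is placed on every server in the ensemble)
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
# One line per ensemble member: server.<id>=<host>:<peer port>:<election port>
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
```

Each server also needs a file named myid in its dataDir containing only its own id (1, 2 or 3 here), so that it knows which server line refers to itself.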
As soon as the ZooKeeper servers on all of the machines in the ensemble are started, phase 1, the leader election phase, begins. The election is part of ZooKeeper's own consensus protocol, Zab (ZooKeeper Atomic Broadcast), which resembles Paxos but is not the same algorithm.
The machines in the ensemble vote for one another based on ping response and freshness of data. In this way a distinguished member, called the leader, is elected. The rest of the servers are called followers. Once all of the followers have synchronized their state with the newly elected leader, the election phase finishes.
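The "freshness of data" criterion can be sketched as follows. This is a simplified illustration, not ZooKeeper's actual election code: the `Vote` class and `pick_leader` function are hypothetical names, and the real protocol also compares election epochs and exchanges votes over the network until a majority agrees.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Vote:
    server_id: int  # hypothetical numeric id of a reachable server
    last_zxid: int  # id of the last transaction that server has logged


def pick_leader(votes: List[Vote]) -> int:
    """Return the id of the server with the freshest data.

    Ties on the transaction id are broken in favour of the higher server id.
    """
    if not votes:
        raise ValueError("no reachable servers to elect from")
    best = max(votes, key=lambda v: (v.last_zxid, v.server_id))
    return best.server_id


votes = [Vote(1, 100), Vote(2, 105), Vote(3, 105)]
print(pick_leader(votes))  # 3
```

Preferring the server with the most recent transaction means the new leader already holds every update that the others have seen.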
The election cannot succeed unless a majority of the machines is available to vote. A majority means more than 50% of the machines: in an ensemble of 20 machines, that is 11 or more.
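The majority rule is simple arithmetic, shown here as a short sketch. The function names `quorum_size` and `has_quorum` are illustrative only and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that is more than half of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def has_quorum(available: int, ensemble_size: int) -> bool:
    """True if enough servers are available to elect a leader."""
    return available >= quorum_size(ensemble_size)


print(quorum_size(20))     # 11
print(has_quorum(10, 20))  # False
print(has_quorum(11, 20))  # True
```

By the same arithmetic, an ensemble of 5 tolerates 2 failed servers and an ensemble of 6 also tolerates only 2, which is why ensembles usually have an odd number of servers.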
If the leader fails at any point, the rest of the ensemble holds a new election, which typically completes within about 200 milliseconds.
If a majority of the machines becomes unavailable at any point, the leader automatically steps down.
The second phase is called Atomic Broadcast. Any request from a user to write, modify, or delete data is forwarded to the leader by the followers, so there is always a single machine accepting modifications. Read requests, such as ls or get, are served by any of the machines.
Once the leader has accepted a change from a user, it broadcasts the update to the followers, that is, the other machines. [Check: This broadcast and synchronization might take time, so for a while some of the followers might serve slightly older data. That is why ZooKeeper provides eventual consistency, not strict consistency.]
When a majority of the machines have persisted the change to disk, the leader commits the update and a confirmation is sent to the client.
The consensus protocol is atomic, similar to a two-phase commit. To ensure that changes are durable, the machines write each change to disk before applying it in memory.
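The write path described above (persist first, commit once a majority has acknowledged) can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions: the `Server` class and `broadcast_write` function are hypothetical, a Python list stands in for the on-disk log, and no real networking or recovery is modelled.

```python
from typing import Dict, List, Tuple


class Server:
    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online
        self.log: List[Tuple[str, str]] = []  # stands in for the on-disk log
        self.memory: Dict[str, str] = {}      # in-memory data served to readers

    def persist(self, path: str, value: str) -> bool:
        """Append the proposal to the log; the return value is the acknowledgement."""
        if not self.online:
            return False
        self.log.append((path, value))
        return True

    def commit(self, path: str, value: str) -> None:
        """Apply a change to memory (called only after it has been logged)."""
        if self.online:
            self.memory[path] = value


def broadcast_write(leader: Server, followers: List[Server],
                    path: str, value: str) -> bool:
    """Return True if the write was committed, False if no majority acknowledged."""
    servers = [leader] + followers
    acks = sum(1 for s in servers if s.persist(path, value))
    if acks < len(servers) // 2 + 1:
        return False  # simplification: the uncommitted log entry is left as is
    for s in servers:
        s.commit(path, value)
    return True


leader = Server("s1")
followers = [Server("s2"), Server("s3", online=False)]
print(broadcast_write(leader, followers, "/config", "v1"))  # True
print(followers[1].memory)  # {}
```

In this run two of three servers acknowledge, so the write commits, while the offline follower still holds older (here, empty) data until it catches up.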
14 Comments
I have developed a distributed system with three nodes. Node-1 runs Python code that aggregates the work of Node-2 and Node-3, and Node-2 and Node-3 are the actual worker nodes that do the compute. Say I am running a matching algorithm based on 6 rules: Node-2 and Node-3 process 3 rules each, and Node-1 collects the results from the worker nodes and creates the actual matched record. This is a distributed setup.
Now I want to integrate ZooKeeper (highly available) here. Can you please explain the server structure, including ZooKeeper? Also, how would ZooKeeper be included in my existing Python code (mainly the node-2 code)?
Correction to the typo above: I meant mainly the node-1 code.
Server Structure with ZooKeeper:
1. ZooKeeper Ensemble:
a. Deploy a multi-node ZooKeeper ensemble (odd number, typically 3-5) for high availability.
b. Each ZooKeeper node runs independently, maintaining a consistent view of shared data.
c. Majority (quorum) must be available for operations.
2. Integration with Nodes:
a. All nodes (Node-1, Node-2, Node-3) will connect to the ZooKeeper ensemble as clients.
b. They'll use ZooKeeper's API for coordination and synchronization tasks.
To interact with zookeeper using python, you can refer to https://cloudxlab.com/blog/how-to-interact-with-apache-zookeeper-using-python/
Hi Team,
Both the video and the text say that a request is redirected to the leader by a follower (i.e., the flow is follower ---> leader). But later it is mentioned that the leader accepts the request from the user
(i.e., the flow is user ---> leader ---> follower).
I am not able to figure out the flow of request handling.
In the case of writing data, the follower will redirect the request to the leader.
But in case of reads, the follower may also answer.
Hi Sir,
Thanks for your reply. But my query is about the flow in the case of reads and writes.
Who receives the request first, the leader or a follower?
And what is the flow in the further stages of request handling?
Let me explain the flow of request handling in simple terms:
1. A client sends a read or write request to any node in the ZooKeeper ensemble.
2. The receiving node checks whether it is the leader or a follower node.
3. If the node is the leader, it handles the write request directly: it creates a new transaction, proposes it to the followers, and applies the update to its local copy of the ZooKeeper data tree once a majority of the servers have acknowledged it. It then notifies the followers that the update is committed.
4. If the node is a follower, it forwards the write request to the leader to handle.
5. For read requests, any node can serve the request and return the requested data to the client.
6. Every server in the ensemble keeps a complete copy of the data, so the node that receives a read request serves it from its local copy and does not need to forward it. Because updates reach the followers after a short delay, the returned data may be slightly older than the leader's.
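The routing rules in the steps above can be sketched in a few lines of Python. This is a toy model with hypothetical names (`Node`, `read`, `write`); it assumes every node holds a full copy of the data, so reads are always answered locally, and it replicates instantly without the majority acknowledgement that a real ensemble requires.

```python
from typing import Dict, List, Optional


class Node:
    def __init__(self, name: str, leader: Optional["Node"] = None):
        self.name = name
        # A node created without a leader is treated as the leader itself.
        self.leader: "Node" = leader if leader is not None else self
        self.followers: List["Node"] = []  # only used on the leader
        self.data: Dict[str, str] = {}     # this node's local copy of the data
        if leader is not None:
            leader.followers.append(self)

    def is_leader(self) -> bool:
        return self.leader is self

    def read(self, path: str) -> Optional[str]:
        """Reads are served from the local copy, whichever node receives them."""
        return self.data.get(path)

    def write(self, path: str, value: str) -> str:
        """Apply a write and return the name of the node that handled it."""
        if not self.is_leader():
            return self.leader.write(path, value)  # follower forwards to leader
        self.data[path] = value
        for follower in self.followers:
            follower.data[path] = value  # simplified, instant replication
        return self.name


leader = Node("leader")
follower = Node("follower-1", leader)
print(follower.write("/app/status", "ready"))  # leader
print(follower.read("/app/status"))            # ready
```

So the client may contact any node first: a follower passes writes on to the leader, while reads never leave the node that received them.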
When the leader steps down, will it be allowed to vote?
E.g., there are 10 machines: one is the leader and 9 are followers. If the leader steps down due to network issues, will the 9 followers assemble and elect a leader by voting among themselves?
Also, if the leader steps down and the issue is then rectified, will it be allowed to be part of the cluster again, or will the cluster have only 9 machines?
Hi Karthikeyan,
It's an excellent question. Leader election starts as soon as the leader steps down, so the 9 followers will assemble and elect a leader themselves. If the issue is rectified, that node (the previous leader) will also be able to participate in the election process, but it will not resume leadership by default.
Is the cluster (ensemble) mentioned in the video the actual Hadoop cluster where the data is stored and processed? If not, what's the difference?
When I create znodes, are they created on the machines in the ensemble?
A ZooKeeper server is basically a software program that keeps running and listening on some port. We install ZooKeeper on multiple machines and make each one aware of the IP addresses of the other machines; they then elect a leader, and so on.
The znode mentioned is a concept like a folder or a database table. A znode is not a node of the cluster.
How often does the leader election take place? Is it a one-time thing when the architecture is set up, or does it happen every time the servers are restarted? Also, why is an election required? Can't we manually designate one server as the leader?
An election happens whenever there is no leader, which can occur:
1. In the beginning
2. When the leader steps down
3. When the leader shuts down or is no longer reachable