In the Hadoop ecosystem, Apache Zookeeper plays an important role in coordination amongst distributed resources. Apart from being an important component of Hadoop, it is also a very good concept to learn for a system design interview.
What is Apache Zookeeper?
Apache ZooKeeper is a coordination service that makes it easier to build distributed systems. In very simple words, it is a central data store of key-value pairs that distributed systems can use to coordinate. Since it needs to be able to handle the load, ZooKeeper itself runs on many machines.
ZooKeeper provides a simple set of primitives and is easy to program against.
It is used for:
- maintaining configuration
- failover management
It is designed to help applications avoid race conditions and deadlocks.
Note: This blog assumes that you are already familiar with the basics of Apache ZooKeeper. If you aren't, then please go through the following blog first:
Now, let’s come back to our topic, that is:
How to interact with Apache Zookeeper?
We interact with Apache ZooKeeper through clients. A client is any process that connects to the ZooKeeper ensemble using the ZooKeeper client API. It establishes a session with the ZooKeeper service by creating a handle to the service using a language binding. Apache ZooKeeper ships with API bindings for Java and C by default. However, we can also interact with ZooKeeper from Python, thanks to the Python library Kazoo. Kazoo implements a higher-level API to Apache ZooKeeper for Python clients.
Using Zookeeper in a safe manner can be difficult due to the variety of edge-cases in Zookeeper and other bugs that have been present in the Python C binding. Due to how the C library utilizes a separate C thread for Zookeeper communication some libraries like gevent (or eventlet) also don’t work properly by default.
By utilizing a pure Python implementation, Kazoo handles all of these cases and provides a new asynchronous API which is consistent when using threads or gevent (or eventlet) greenlets.
How to use Kazoo?
Kazoo makes the connection process much simpler and hassle-free. Let’s learn how to use it.
It is recommended to create a virtual environment to avoid conflicts between the packages. It lets you have a stable, reproducible, and portable environment. So, we will start by creating a virtual environment and installing Kazoo in it.
# Create a virtual environment
virtualenv zk
# Activate it
source zk/bin/activate
# Install the Python client
pip3 install kazoo
# Launch the Python prompt
python3
To begin using Kazoo, a KazooClient object must be created and a connection established:
from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()
By default, the client will connect to a local ZooKeeper server on the default port (2181). Make sure ZooKeeper is actually running there first, or the start() call will block until its default timeout expires.
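Because start() only gives up after its timeout, it can help to pass a shorter timeout and handle the failure explicitly. The following is a minimal sketch rather than the definitive way to do it: start_or_none is a hypothetical helper name, and the 5-second timeout is an arbitrary choice. It assumes that Kazoo raises the handler's timeout exception (KazooTimeoutError for the default threading handler) when no server answers in time.

```python
def start_or_none(zk, timeout=5.0):
    """Start a Kazoo client; return it, or None if no server answered in time.

    `zk` is expected to be an unstarted kazoo.client.KazooClient.
    """
    try:
        # start() blocks until connected or until `timeout` seconds have passed.
        zk.start(timeout=timeout)
    except zk.handler.timeout_exception:
        # For the default threading handler this is
        # kazoo.handlers.threading.KazooTimeoutError.
        return None
    return zk


# Usage against a real server (assumes kazoo is installed):
#   from kazoo.client import KazooClient
#   zk = start_or_none(KazooClient(hosts="127.0.0.1:2181"))
#   if zk is None:
#       print("ZooKeeper is not reachable on 127.0.0.1:2181")
```

Catching the exception through zk.handler keeps the helper independent of which handler (threads or gevent) the client was built with.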
Once connected, the client will attempt to stay connected regardless of intermittent connection loss or Zookeeper session expiration.
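Since the connection can be suspended or the session lost while the client keeps retrying, it is often useful to be told about these transitions. Kazoo lets you register a listener with zk.add_listener() for this. The sketch below is only an illustration: make_listener and the log messages are hypothetical, and it relies on the KazooState constants being the plain strings "LOST", "SUSPENDED" and "CONNECTED".

```python
def make_listener(log):
    """Build a connection-state listener that appends a message to `log`."""

    def listener(state):
        # Kazoo passes one of the KazooState constants, which are the
        # strings "LOST", "SUSPENDED" and "CONNECTED".
        if state == "LOST":
            log.append("session lost")
        elif state == "SUSPENDED":
            log.append("connection suspended")
        else:
            log.append("connected")

    return listener


# Usage with a KazooClient named zk:
#   events = []
#   zk.add_listener(make_listener(events))
```

The listener should return quickly and not block, because Kazoo invokes it from its own connection handling code.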
Now, let’s check whether it’s working or not. We will write a function that can be triggered either when the node has changed or when the children of the node change.
def my_func(event):
    # check to see what the children are now
    print("event: ", event)
    # ZooKeeper watches fire only once, so set the watch again
    zk.get_children("/cloudxlab", watch=my_func)

# Call my_func when the children change
children = zk.get_children("/cloudxlab", watch=my_func)
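A watch set through get_children() fires only once, so long-running code has to set it again after every event. As an alternative, Kazoo ships a ChildrenWatch helper that re-registers the watch for you. The following is a minimal sketch, assuming the znode already exists; follow_children and history are hypothetical names used only for illustration.

```python
def follow_children(zk, path, history):
    """Record every child list seen for `path` in `history`.

    `zk` is expected to be a started kazoo.client.KazooClient.
    """

    def on_children(children):
        # Called once right away with the current children, then again
        # after each change; Kazoo re-registers the watch itself.
        history.append(sorted(children))
        # Returning False here would stop the watch; None keeps it active.

    return zk.ChildrenWatch(path, on_children)


# Usage with a KazooClient named zk:
#   seen = []
#   follow_children(zk, "/cloudxlab", seen)
```

Unlike my_func, the callback here receives the list of children rather than the event object.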
Now, in another terminal, let’s try creating znodes inside /cloudxlab using zookeeper-client. When I created a znode inside /cloudxlab, the following output was displayed automatically in the Python terminal:
>> ('event: ', WatchedEvent(type='CHILD', state='CONNECTED', path=u'/cloudxlab'))
Hence, it’s working fine. You are also advised to try it out.
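If you would rather try it without a second terminal, the same experiment can be driven from Python alone. The sketch below sets a one-shot watch on the children of /cloudxlab, creates a child znode with Kazoo, and waits for the event. trigger_child_watch, the "demo-" node prefix and the 5-second wait are hypothetical choices made for illustration.

```python
import threading


def trigger_child_watch(zk, parent="/cloudxlab", timeout=5.0):
    """Set a one-shot child watch on `parent`, create a child, return the event.

    `zk` is expected to be a started kazoo.client.KazooClient.
    Returns None if no event arrived within `timeout` seconds.
    """
    fired = threading.Event()
    seen = []

    def on_event(event):
        # Kazoo calls this from its own callback thread.
        seen.append(event)
        fired.set()

    zk.ensure_path(parent)                   # create the parent if missing
    zk.get_children(parent, watch=on_event)  # register the one-shot watch
    # An ephemeral, sequential child disappears when the session ends,
    # so repeated runs do not pile up nodes.
    zk.create(parent + "/demo-", b"", ephemeral=True, sequence=True)

    fired.wait(timeout)
    return seen[0] if seen else None


# Usage with a started KazooClient named zk:
#   event = trigger_child_watch(zk)
#   print("event: ", event)
```

The threading.Event is there only because the watch callback runs on a different thread from the one that creates the znode.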
So, today we have learned how to use Kazoo to interact with Apache ZooKeeper. That’s all for this blog. Happy Learning!!