How to Interact with Apache Zookeeper using Python?

In the Hadoop ecosystem, Apache ZooKeeper plays an important role in coordinating distributed resources. Apart from being an important component of Hadoop, it is also a valuable concept to learn for system design interviews.

What is Apache Zookeeper?

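Apache ZooKeeper is a coordination service that makes it easier to build distributed systems. In very simple words, it is a central data store, similar to a key-value store, that distributed processes use to coordinate: the keys are paths in a tree (for example, /app/config), and each node in the tree, called a znode, holds a small amount of data. To handle the load and to stay available when individual servers fail, ZooKeeper itself runs on several machines, together called an ensemble.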
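Running on many machines only helps if those machines agree with each other. A ZooKeeper ensemble keeps serving requests as long as a strict majority (a quorum) of its servers is up. The short sketch below simply works out that arithmetic; `quorum_size` and `tolerated_failures` are illustrative names, not part of any ZooKeeper client library.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with `ensemble_size` servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that a 4-server ensemble tolerates no more failures than a 3-server one, which is why ensembles are usually sized with an odd number of servers.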

ZooKeeper exposes a small set of simple primitives, which makes it easy to program against.
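To make those primitives concrete, here is a toy, in-memory model of ZooKeeper's tree of znodes in plain Python. It is only an illustration under simplifying assumptions (no sessions, watches, ACLs, or networking), and `ToyZNodeStore` and `BadVersionError` are hypothetical names; against a real ensemble you would use a client library such as kazoo instead. The part worth noticing is the versioned `set`: every znode carries a version number, and a write can be made conditional on the version the client last read, so a stale update is rejected instead of silently overwriting someone else's change.

```python
from dataclasses import dataclass, field


class BadVersionError(Exception):
    """Raised when a conditional update names a stale version (hypothetical)."""


@dataclass
class ZNode:
    data: bytes
    version: int = 0                              # bumped on every successful set
    children: dict = field(default_factory=dict)  # child name -> ZNode


class ToyZNodeStore:
    """In-memory stand-in for ZooKeeper's tree of znodes (illustration only)."""

    def __init__(self) -> None:
        self.root = ZNode(b"")

    def _walk(self, path: str) -> ZNode:
        node = self.root
        for part in filter(None, path.split("/")):
            node = node.children[part]  # KeyError if the path does not exist
        return node

    def create(self, path: str, data: bytes = b"") -> None:
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self._walk(parent_path)  # the parent must already exist
        if name in parent.children:
            raise KeyError(f"{path} already exists")
        parent.children[name] = ZNode(data)

    def exists(self, path: str) -> bool:
        try:
            self._walk(path)
            return True
        except KeyError:
            return False

    def get(self, path: str) -> tuple[bytes, int]:
        node = self._walk(path)
        return node.data, node.version

    def set(self, path: str, data: bytes, version: int = -1) -> int:
        node = self._walk(path)
        # version=-1 means "unconditional", mirroring ZooKeeper's setData convention
        if version != -1 and version != node.version:
            raise BadVersionError(f"expected version {version}, found {node.version}")
        node.data = data
        node.version += 1
        return node.version

    def get_children(self, path: str) -> list[str]:
        return sorted(self._walk(path).children)

    def delete(self, path: str) -> None:
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self._walk(parent_path)
        if parent.children[name].children:
            raise ValueError(f"{path} has children")  # only leaf znodes can be deleted
        del parent.children[name]


if __name__ == "__main__":
    store = ToyZNodeStore()
    store.create("/app")
    store.create("/app/config", b"replicas=3")
    data, version = store.get("/app/config")
    store.set("/app/config", b"replicas=5", version=version)  # succeeds: version 0 -> 1
    try:
        store.set("/app/config", b"replicas=7", version=version)  # stale version 0
    except BadVersionError as err:
        print("rejected:", err)
    print(store.get("/app/config"), store.get_children("/app"))
```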

It is used for:

  • synchronization
  • locking
  • maintaining configuration
  • failover management

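Because ZooKeeper applies all updates in a single, total order, applications built on it can coordinate without the race conditions and deadlocks that are easy to introduce when coordinating by hand.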
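Locking is a good example of how these guarantees are used. A common ZooKeeper lock recipe has each client create an ephemeral, sequential child under a lock znode; the client whose node has the lowest sequence number holds the lock, and every other client watches only the node just before its own. Because ephemeral nodes vanish when their session ends, a crashed lock holder cannot leave the lock stuck forever. The sketch below simulates that recipe in memory; `ToyLockDir`, `holds_lock`, and `node_to_watch` are hypothetical names, and real watches, retries, and networking are left out.

```python
import itertools
from typing import Optional


class ToyLockDir:
    """In-memory stand-in for the children of one lock znode (illustration only)."""

    def __init__(self) -> None:
        self._seq = itertools.count()      # per-parent sequence counter
        self._owners: dict[str, str] = {}  # child name -> session that created it

    def create_ephemeral_sequential(self, session: str, prefix: str = "lock-") -> str:
        # ZooKeeper appends a zero-padded, 10-digit, monotonically increasing counter.
        name = f"{prefix}{next(self._seq):010d}"
        self._owners[name] = session
        return name

    def children(self) -> list[str]:
        # Zero-padding makes lexical order match sequence order.
        return sorted(self._owners)

    def expire_session(self, session: str) -> None:
        # Ephemeral nodes are removed automatically when their session ends.
        for name in [n for n, s in self._owners.items() if s == session]:
            del self._owners[name]


def holds_lock(lock_dir: ToyLockDir, my_node: str) -> bool:
    """A client holds the lock when its node has the lowest sequence number."""
    kids = lock_dir.children()
    return bool(kids) and kids[0] == my_node


def node_to_watch(lock_dir: ToyLockDir, my_node: str) -> Optional[str]:
    """Return the next-lower node; watching only it avoids waking every waiter."""
    kids = lock_dir.children()
    idx = kids.index(my_node)
    return kids[idx - 1] if idx > 0 else None


if __name__ == "__main__":
    locks = ToyLockDir()
    a = locks.create_ephemeral_sequential("worker-A")
    b = locks.create_ephemeral_sequential("worker-B")
    print(holds_lock(locks, a), holds_lock(locks, b))  # A holds it; B waits
    print("B watches", node_to_watch(locks, b))
    locks.expire_session("worker-A")  # A crashes or releases: its node vanishes
    print(holds_lock(locks, b))       # B now holds the lock
```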


How does YARN interact with Zookeeper to support High Availability?

In the Hadoop ecosystem, YARN (short for Yet Another Resource Negotiator) is responsible for resource allocation and for job scheduling and management. The Resource Manager (RM), one of YARN's components, carries out these tasks: it coordinates with the various nodes in the cluster and interacts with clients.

To learn more about YARN, feel free to visit here.

Architecture of YARN

Because a single Resource Manager coordinates the whole cluster, it is a single point of failure: if it goes down for any reason, resource allocation and job management are interrupted, and no jobs can run on the cluster.

To avoid this, we enable the High Availability (HA) feature in YARN. With HA enabled, a second Resource Manager, known as the Standby Resource Manager, runs in parallel on another node. When the Active Resource Manager goes down, the Standby becomes Active and keeps the cluster running, and the same handover can happen again on any later failure.
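One way to picture ZooKeeper-based failover: both Resource Managers race to create the same ephemeral znode, the winner becomes Active, and the loser stays Standby while watching that node. If the Active RM dies, its session expires, ZooKeeper deletes the node, and the Standby's next attempt succeeds. The Python below is a simplified, in-memory model of that idea under stated assumptions, not Hadoop's actual implementation (which also deals with fencing, retries, and state recovery); `ToyElectionZNode` and `ResourceManager` are hypothetical classes.

```python
from typing import Optional


class ToyElectionZNode:
    """One ephemeral znode whose creator acts as the Active RM (illustration only)."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def try_create(self, session: str) -> bool:
        # Creating a node that already exists fails, so only one contender wins.
        if self.owner is None:
            self.owner = session
            return True
        return False

    def expire(self, session: str) -> None:
        # An ephemeral node is deleted when its creator's session expires.
        if self.owner == session:
            self.owner = None


class ResourceManager:
    """Hypothetical RM that is ACTIVE only while it owns the election znode."""

    def __init__(self, name: str, znode: ToyElectionZNode) -> None:
        self.name = name
        self.znode = znode
        self.state = "STANDBY"

    def run_election(self) -> None:
        # Called at startup, and again whenever a watch reports the node was deleted.
        if self.znode.try_create(self.name):
            self.state = "ACTIVE"


if __name__ == "__main__":
    znode = ToyElectionZNode()
    rm1, rm2 = ResourceManager("rm1", znode), ResourceManager("rm2", znode)
    rm1.run_election()
    rm2.run_election()
    print(rm1.state, rm2.state)  # rm1 won the race; rm2 stays STANDBY
    znode.expire("rm1")          # rm1 crashes and its session times out
    rm1.state = "DOWN"
    rm2.run_election()           # rm2's watch fires and it tries again
    print(rm1.state, rm2.state)
```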


Introduction to Apache Zookeeper


If you prefer hands-on videos, feel free to jump in here.

Alright, so let’s get started.

Goals

In this post, we will understand the following:

  • What is Apache Zookeeper?
  • How does Zookeeper achieve coordination?
  • Zookeeper Architecture
  • Zookeeper Data Model
  • Some Hands-on with Zookeeper
  • Election & Majority in Zookeeper
  • Zookeeper Sessions
  • Applications of Zookeeper
  • What kind of guarantees does ZooKeeper provide?
  • Operations provided by Zookeeper
  • Zookeeper APIs
  • Zookeeper Watches
  • ACLs in Zookeeper
  • Zookeeper Use Cases

Distributed Computing with Locks

Introduction

Now that we know how prevalent big data is in real-world scenarios, it's time to understand how big data systems work. This topic is central to the principles behind system design and the coordination of machines in big data systems. So let's dive in.

Scenario:

Consider a data source and a worker machine that has to accomplish some task using it; for example, the worker has to process the data by accessing that source. Remember that the data source holds a huge amount of data, so the data to be processed for the task is very large.
