HDFS

6 / 12
HDFS - Namenode Backup & Failover

The metadata is maintained in the memory as well as on the disk. On the disk, it is kept in two parts: namespace image and edit logs.

The namespace image is created on demand while edit logs are created whenever there is a change in the metadata. So, at any point, to get the current state of the metadata, edit logs need to be applied on the image.

Since the metadata is huge, writing it to the disk on every change may be time consuming. There fore, saving just the change makes it extremely fast.

Without the namenode, the HDFS cannot be used at all. This is because we do not know which files are stored in which datanodes. Therefore it is very important to make the namenode resilient to failures. Hadoop provides various approaches to safeguard the namenode.

The first approach is to maintain a copy of the metadata on NFS - Network File System. Hadoop can be configured to do this. These modifications to the metadata happen either both on NFS and Locally or nowhere.

In the second approach to making the namenode resilient, we run a secondary namenode on a different machine.

The main role of the secondary namenode is to periodically merge the namespace image with edit logs to prevent the edit logs from becoming too large.

When a namenode fails, we have to first prepare the latest namespace image and then bring up the secondary namenode.

This approach is not good for production applications as there will be a downtime until the secondary namenode is brought online. With this method, the namenode is not highly available.

To make the namenode resilient, Hadoop 2 dot 0 added support for high availability.

This is done using multiple namenodes and zookeeper. Of these namenodes, one is active and the rest are standby namenodes. The standby namenodes are exact replicas of the active namenode.

The datanodes send block reports to both the active and the standby namenodes to keep all namenodes updated at any point-in-time.

If the active namenode fails, a standby can take over very quickly because it has the latest state of metadata.

zookeeper helps in switching between the active and the standby namenodes.

The namenode maintains the reference to every file and block in the memory. If we have too many files or folders in HDFS, the metadata could be huge. Therefore the memory of namenode may become insufficient.

To solve this problem, HDFS federation was introduced in Hadoop 2 dot 0.

In HDFS Federation, we have multiple namenodes containing parts of metadata.

The metadata - which is the information about files folders - gets distributed manually in different namenodes. This distribution is done by maintaining a mapping between folders and namenodes.

This mapping is known as mount tables.

In this diagram, the mount table is defining /mydata1 folder is in namenode1 and /mydata2 and /mydata3 are in namenode2

Mount table is not a service. It is a file kept along with and referred from the HDFS configuration file.

The client reads mount table to find out which folders belong to which namenode.

It routes the request to read or write a file to the namenode corresponding to the file's parent folder.

The same pool of datanodes is used for storing data from all namenodes.

Lets us discuss what is metadata. Following attributes get stored in metadata

  • List of files
  • List of Blocks for each file
  • List of DataNode for each block
  • File attributes, e.g. access time, replication factor, file size, file name, directory name

Transaction logs or edit logs store file creation and file deletion timestamps.