YARN - Evolution from MR 1.0
Before YARN, MapReduce 1.0 was responsible for distributing the work. Here is how MapReduce 1.0 works. It is made up of a Job Tracker and many Task Trackers. The Job Tracker is like the manager of a shop floor who interacts with customers and gets the work done through the Task Trackers. The Job Tracker breaks down the work and distributes the parts to various Task Trackers. The Task Trackers keep the Job Tracker updated with their latest status. If a Task Tracker fails to report its status, the Job Tracker assumes that the Task Tracker is dead and assigns its work to some other Task Tracker.
To keep the load balanced across all Task Trackers, the Job Tracker keeps track of the resources and tasks.
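The status reporting and load tracking described above can be sketched as a toy model. This is only an illustration and not Hadoop's actual code: the class name, the method names, the 10-second timeout, and the "pick the least-loaded live tracker" rule are all assumptions made for the example.

```python
class JobTracker:
    """Toy model of the MapReduce 1.0 Job Tracker (hypothetical, not the Hadoop API)."""

    def __init__(self, trackers, timeout=10.0):
        self.timeout = timeout                        # assumed: seconds of silence tolerated
        self.last_seen = {t: 0.0 for t in trackers}   # tracker -> time of last status report
        self.assignments = {t: [] for t in trackers}  # tracker -> tasks it currently holds

    def assign(self, task, tracker):
        self.assignments[tracker].append(task)

    def heartbeat(self, tracker, now):
        # A Task Tracker reports its status; we only record when it did so.
        self.last_seen[tracker] = now

    def reassign_from_dead(self, now):
        # Trackers that have been silent for longer than the timeout are assumed dead.
        dead = [t for t, seen in self.last_seen.items() if now - seen > self.timeout]
        live = [t for t in self.last_seen if t not in dead]
        moved = {}
        if not live:
            return moved  # nobody left to take over the work
        for t in dead:
            for task in self.assignments[t]:
                # Assumed balancing rule: give the task to the least-loaded live tracker.
                target = min(live, key=lambda x: len(self.assignments[x]))
                self.assignments[target].append(task)
                moved[task] = target
            self.assignments[t] = []
        return moved


jt = JobTracker(["tt1", "tt2", "tt3"], timeout=10.0)
jt.assign("map-0", "tt1")
jt.assign("map-1", "tt2")
jt.assign("map-2", "tt2")
jt.heartbeat("tt1", now=12.0)
jt.heartbeat("tt3", now=12.0)          # tt2 never reports back
moved = jt.reassign_from_dead(now=15.0)
print(moved)  # {'map-1': 'tt3', 'map-2': 'tt1'}
```

Here tt2 stays silent, so its two tasks are handed to the trackers that are still reporting, one each, which keeps the load even.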
MapReduce 1.0 also performed the sorting or ordering required as part of the MapReduce framework.
But the MapReduce framework was very restrictive: the only way to get your work done was to express it as a MapReduce job. Not all problems suit the MapReduce model, and some can be solved better using other frameworks. That's why YARN came into play.
Basically, MapReduce 1.0 was split into two big components: YARN and MapReduce 2.0. YARN is responsible only for managing and negotiating resources on the cluster. MapReduce 2.0 contains only the computation framework, also called the workload, which runs the logic in two parts: map and reduce. MapReduce 2.0 also does the sorting of the data.
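To make the map, sort, and reduce parts concrete, here is a minimal word-count sketch in plain Python. It runs in a single process and does not use Hadoop at all; the function names and the two input lines are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def sort_phase(pairs):
    # Sort: order the pairs by key so that equal keys sit next to each other.
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    # Reduce: add up the counts for each run of identical keys.
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))


lines = ["YARN manages resources", "MapReduce runs on YARN"]
counts = list(reduce_phase(sort_phase(map_phase(lines))))
print(counts)
# [('manages', 1), ('mapreduce', 1), ('on', 1), ('resources', 1), ('runs', 1), ('yarn', 2)]
```

The sort step is what lets the reduce step see all values for one key together, which is why the framework does the sorting for you.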
This refactoring, or splitting, made way for many other frameworks that solve different kinds of problems, such as Tez, HBase, Storm, Giraph, Spark, and OpenMPI.
The advantages of YARN are:
1. It supports many workloads, including MapReduce.
2. It is easier to scale up with YARN.
3. MapReduce 2.0 is compatible with MapReduce 1.0. A program written for MapReduce 1.0 need not be modified; recompiling it is enough.
4. Cluster utilization improved, because different kinds of workloads can run on the same cluster.
5. It improved agility.
6. Since MapReduce was batch oriented, it could not run tasks that need to run forever, such as stream processing jobs. With YARN, such long-running workloads can run on the cluster.
The role of the Job Tracker in MapReduce 1 is now split into multiple components in YARN: Resource Manager, Application Master, and Timeline Server. The Task Tracker is now the Node Manager. The role of a slot in MapReduce 1 is now played by a container in YARN.
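The difference between a slot and a container can be sketched roughly as follows. In MapReduce 1 a node offered a fixed number of map slots and reduce slots, while a YARN container is a share of the node's resources that any kind of task can request. The classes below are a simplified illustration, not the real Hadoop or YARN data structures; the slot counts and memory sizes are made-up numbers, and real containers are sized by more than memory alone.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """MapReduce 1 style node: a fixed number of map and reduce slots (toy model)."""
    map_slots: int
    reduce_slots: int

    def try_start(self, kind):
        # A task can only use a free slot of its own kind.
        if kind == "map" and self.map_slots > 0:
            self.map_slots -= 1
            return True
        if kind == "reduce" and self.reduce_slots > 0:
            self.reduce_slots -= 1
            return True
        return False


@dataclass
class ContainerNode:
    """YARN style node: a pool of memory handed out as containers (toy model)."""
    free_mb: int

    def try_start(self, request_mb):
        # Any task may start as long as enough memory is left on the node.
        if request_mb <= self.free_mb:
            self.free_mb -= request_mb
            return True
        return False


# Four map tasks arrive at a node with 2 map slots and 2 reduce slots.
slot_node = SlotNode(map_slots=2, reduce_slots=2)
slot_results = [slot_node.try_start("map") for _ in range(4)]
print(slot_results)       # [True, True, False, False]

# The same four tasks, each asking for 1024 MB, arrive at a node with 4096 MB.
container_node = ContainerNode(free_mb=4096)
container_results = [container_node.try_start(1024) for _ in range(4)]
print(container_results)  # [True, True, True, True]
```

In the slot model two reduce slots sit idle while two map tasks wait. In the container model the same capacity is used by whichever tasks need it, which is one way to picture the better cluster utilization.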