YARN - More on Architecture

YARN - Advanced

What does a resource request made by a client or an application master contain? The request can specify how many containers are needed, and how much memory and how many CPU cores each container requires. It can also carry a constraint, such as a preference for nodes close to certain data. This is called a data locality constraint. It is similar to a mining company asking for an office near the area where its mines are. An application can either request all of its resources up front or request them as and when they are needed.
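To make the parts of a request concrete, here is a minimal sketch in Python. It is a toy model, not the YARN API: the names `ResourceRequest`, `Node`, `allocate`, and the `relax_locality` flag are assumptions chosen for illustration, and the placement logic is far simpler than what a real YARN scheduler does. It only shows the idea that a request carries a container count, per-container memory and CPU, and an optional data locality preference.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ResourceRequest:
    """Illustrative model of a resource request (hypothetical, not a YARN class)."""
    num_containers: int                  # how many containers are wanted
    memory_mb: int                       # memory per container
    vcores: int                          # CPU cores per container
    preferred_nodes: Tuple[str, ...] = ()  # data locality constraint
    relax_locality: bool = True          # assumption: may fall back to other nodes


@dataclass
class Node:
    """Illustrative model of a cluster node with free capacity."""
    name: str
    free_memory_mb: int
    free_vcores: int

    def fits(self, request: ResourceRequest) -> bool:
        return (self.free_memory_mb >= request.memory_mb
                and self.free_vcores >= request.vcores)


def allocate(request: ResourceRequest, nodes: List[Node]) -> List[str]:
    """Place containers, trying the preferred (data-local) nodes first.

    Returns the node name chosen for each container. The list may be
    shorter than request.num_containers if capacity runs out.
    """
    preferred = [n for n in nodes if n.name in request.preferred_nodes]
    others = [n for n in nodes if n.name not in request.preferred_nodes]
    candidates = list(preferred)
    # Use non-preferred nodes only when there is no locality preference
    # or the request allows the preference to be relaxed.
    if request.relax_locality or not request.preferred_nodes:
        candidates += others

    placements: List[str] = []
    for _ in range(request.num_containers):
        target = next((n for n in candidates if n.fits(request)), None)
        if target is None:
            break  # nothing left that satisfies the request
        target.free_memory_mb -= request.memory_mb
        target.free_vcores -= request.vcores
        placements.append(target.name)
    return placements


if __name__ == "__main__":
    cluster = [Node("node1", 4096, 4), Node("node2", 2048, 2), Node("node3", 8192, 8)]
    request = ResourceRequest(num_containers=3, memory_mb=2048, vcores=1,
                              preferred_nodes=("node2",))
    print(allocate(request, cluster))
```

In this sketch the request asks for everything up front. An application that requests resources as and when required would simply call `allocate` again later with a new request.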

Another question that comes to mind is: how long does an application run? The application lifespan can vary dramatically. It falls into three categories:

1. One application per job. This is the simplest case, and MapReduce is an example. As soon as the job is over, the application ends.
2. One application per workflow or user session. Spark operates in this mode when you launch the interactive shell, so the application remains active for the duration of the user's session, until the user terminates it.
3. A long-running application, such as a server that runs indefinitely. Examples are Apache Slider and Impala.

If you plan to build your own application, my suggestion would be to first try an existing framework such as MapReduce or Spark.

Second, try to use existing tools for building jobs, such as Apache Slider and Apache Twill. With Apache Slider you can run existing distributed applications such as HBase, and Twill can execute any Java Runnable.

If you still need to build your own YARN application, please keep in mind that building one from scratch is complex. To start with, you can use the distributed shell application example that is bundled with the YARN project.