Running Oozie Workflow From Command Line





Oozie is a workflow manager. Using Oozie, we create a job that can run multiple Hadoop tools such as Sqoop, Flume, Hive etc.

As part of this exercise, we are going to learn what a typical job looks like and how to run it.

We will use the examples provided with the Oozie documentation and run one of them, the Map-Reduce example.

1. First, let us log in to the web console

If the Lab is available in the right side frame, please switch to the "Console" tab.

2. Copy Oozie examples to your home directory.

Please run the following Linux command to copy examples to the home directory in the web console: `cp /usr/hdp/current/oozie-client/doc/oozie-examples.tar.gz ./` 

Once it is copied to your home directory, you will see `oozie-examples.tar.gz` in the output of the `ls` command.

3. Extract files from the tar - understand what's where

Please run the following command to extract the files: `tar -zxvf oozie-examples.tar.gz`

This will create a folder named examples in your home directory. Let us change to that directory using `cd examples`.

Now, let us walk through what is where. Type `ls` to see the contents of the examples folder. It should contain the following folders:

  • apps This contains the example workflows, such as Sqoop, Spark, Hive, Pig and MapReduce, which we will be executing. A workflow typically consists of compiled Java code, an Oozie XML workflow definition and configuration files.

  • input-data This contains the data that the various examples use. Though there is no significant data, it serves as a good example. To take a quick look at the last 10 lines of the file, check the output of: `tail input-data/text/data.txt`

  • src This contains the source code of the apps. It gets compiled and bundled into a single jar file, oozie-examples-XXXXX.jar, which the apps use.

Now, let us take a look at the map-reduce job. Please change the directory to the map-reduce folder using cd apps/map-reduce/ and run ls to see the various files. You will see the following list of files.

  • job-with-config-class.properties and job.properties

Each of these two files is the entry point for a separate job. We will be using job.properties for our hands-on: we will pass the location of this file to the Oozie client while launching the job.

This config file specifies the location of the workflow XML that contains the actual Oozie workflow. It holds one key-value setting per line. You can take a look at the contents of the file using: `cat job.properties`

  • lib/

This folder contains the jar file that contains the classes having map-reduce logic. Take a look at the list of files in lib using ls lib. It should list a jar file oozie-examples-XXXXX.jar

  • workflow.xml and workflow-with-config-class.xml

These are the main workflow files. We will be using workflow.xml. Take a look at the contents of this file using `cat workflow.xml`. It should look something like this:

```

<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
        </prepare>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
            <property>
                <name>mapred.mapper.class</name>
                <value>org.apache.oozie.example.SampleMapper</value>
            </property>
            ....
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>
<kill name="fail">
    <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

```

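If you go through it, you will see that it defines the workflow: which actions to execute, in what order, and what to do if one fails. Here, the workflow starts at mr-node, moves to end if the action succeeds, and otherwise moves to the fail node, which kills the job with an error message.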
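To make the control flow concrete, here is a minimal Python sketch that reads the transitions out of a workflow definition. It is only an illustration and not part of Oozie: the embedded XML is a trimmed copy of the workflow above with the map-reduce body left empty, and the function name `transitions` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the workflow shown above. The <map-reduce> body is left
# empty here because only the transitions matter for this illustration.
WORKFLOW_XML = """
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce/>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""

# Every element lives in the namespace declared on <workflow-app>.
NS = {"wf": "uri:oozie:workflow:0.2"}


def transitions(xml_text):
    """Return (start_node, {action_name: (ok_target, error_target)})."""
    root = ET.fromstring(xml_text.strip())
    start = root.find("wf:start", NS).get("to")
    actions = {}
    for action in root.findall("wf:action", NS):
        ok = action.find("wf:ok", NS).get("to")
        error = action.find("wf:error", NS).get("to")
        actions[action.get("name")] = (ok, error)
    return start, actions


start, actions = transitions(WORKFLOW_XML)
print("start ->", start)
for name, (ok, error) in actions.items():
    print(f"{name}: ok -> {ok}, error -> {error}")
```

Running it prints the single path through this workflow: the job starts at mr-node, goes to end on success and to fail on error.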

4. Edit Config File

Edit examples/apps/map-reduce/job.properties using nano and set its values to the following:

nameNode=hdfs://10.142.1.1:8020
jobTracker=10.142.1.2:8050
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce

Here nameNode refers to the HDFS namenode, which will have a different address in your real-life project. jobTracker is the address of the resource manager. You can leave queueName and examplesRoot as they are, although in a real-life project you might have to use a different queueName as configured by your sysadmin.
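The value of oozie.wf.application.path is built from the other settings through `${...}` references. The following Python sketch mimics that substitution so you can see the final path. It is a simplified stand-in, not Oozie's own resolver: the helper names `parse_properties` and `resolve` are made up for this example, and `user.name` is assumed to be filled in from the logged-in user (here the example user from this tutorial).

```python
import re

JOB_PROPERTIES = """\
nameNode=hdfs://10.142.1.1:8020
jobTracker=10.142.1.2:8050
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(value, props):
    """Replace ${name} with props[name] until nothing more can be substituted."""
    pattern = re.compile(r"\$\{([^}]+)\}")
    for _ in range(10):  # small bound guards against circular references
        new_value = pattern.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if new_value == value:
            break
        value = new_value
    return value


props = parse_properties(JOB_PROPERTIES)
# Assumption: user.name comes from the logged-in user; substitute your own.
props["user.name"] = "sandeepgiri9034"
print(resolve(props["oozie.wf.application.path"], props))
```

With these values the path resolves to hdfs://10.142.1.1:8020/user/sandeepgiri9034/examples/apps/map-reduce/workflow.xml, which is why the examples directory has to be copied to your HDFS home directory before the job is run.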

5. Copy the examples directory to HDFS

hadoop fs -copyFromLocal ~/examples

6. Go to your home directory and run the job

cd ~

oozie job -oozie http://10.142.1.2:11000/oozie -config examples/apps/map-reduce/job.properties -run

7. Check the job status for the job_id printed in the previous step

oozie job -oozie http://10.142.1.2:11000/oozie -info job_id

Replace job_id with the ID printed in the previous step, then wait for the job to succeed. It might take about 2 minutes.
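Instead of re-running the status command by hand, you could poll it from a small script. The sketch below is one possible way to do that in Python. It assumes the `-info` output contains a line of the form `Status : SUCCEEDED`, which you should verify against your own output; the function names, the polling interval and the retry limit are all choices made for this example.

```python
import re
import subprocess
import time

OOZIE_URL = "http://10.142.1.2:11000/oozie"  # server URL used in this tutorial


def parse_status(info_text):
    """Extract the value of the 'Status' line from `oozie job -info` output.

    Assumes a line such as 'Status        : SUCCEEDED'; returns None if absent.
    """
    match = re.search(r"^Status\s*:\s*(\S+)", info_text, re.MULTILINE)
    return match.group(1) if match else None


def fetch_info(job_id):
    """Run the same -info command as above and return its standard output."""
    result = subprocess.run(
        ["oozie", "job", "-oozie", OOZIE_URL, "-info", job_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_for_job(job_id, fetch=fetch_info, interval=10, max_polls=30, sleep=time.sleep):
    """Poll until the job is no longer PREP or RUNNING, or max_polls is reached.

    Returns the last status seen (None if no Status line could be parsed).
    """
    status = None
    for _ in range(max_polls):
        status = parse_status(fetch(job_id))
        if status not in ("PREP", "RUNNING"):
            return status
        sleep(interval)
    return status
```

Calling `wait_for_job("job_id")` with your real job ID on a machine where the oozie client is installed would block until the job finishes or the retry limit is hit. The `fetch` and `sleep` parameters exist only so the loop can be tried out without a cluster.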

8. Understand the Map-Reduce Job and the End Result

If you go through workflow.xml, you will understand the following:

  • The mapper class is org.apache.oozie.example.SampleMapper. Check the code using: `cat ~/examples/src/org/apache/oozie/example/SampleMapper.java` It is an identity mapper: it emits every key-value pair it receives unchanged. By default, the mapper gets the byte offset of each line as the key and the content of the line as the value.

  • The reducer class is org.apache.oozie.example.SampleReducer. Check the code using: `cat ~/examples/src/org/apache/oozie/example/SampleReducer.java` Like the mapper, the reducer passes its input through unchanged, so the job simply writes out each line of the input preceded by its byte offset. The main objective of this example is that you should be able to schedule your own MapReduce job using Oozie.

  • The input folder is /user/${wf:user()}/${examplesRoot}/input-data/text

  • The output folder is /user/${wf:user()}/${examplesRoot}/output-data/${outputDir}. Looking at the job.properties file, we infer that ${outputDir} is map-reduce. Therefore, the job should write its output to this location in HDFS, relative to your HDFS home directory: examples/output-data/map-reduce

Let us take a look at HDFS to check if the folder and files were created: hadoop fs -ls examples/output-data/map-reduce. It should show something like this:

[sandeepgiri9034@cxln4 ~]$ hadoop fs -ls examples/output-data/map-reduce
Found 2 items
-rw-r--r-- 3 sandeepgiri9034 sandeepgiri9034 0 2020-11-06 11:27 examples/output-data/map-reduce/_SUCCESS
-rw-r--r-- 3 sandeepgiri9034 sandeepgiri9034 1547 2020-11-06 11:27 examples/output-data/map-reduce/part-00000

The part-00000 contains the result. Let's check it out:

`hadoop fs -cat examples/output-data/map-reduce/part-00000`

The output would look something like this:

0   To be or not to be, that is the question;
42  Whether 'tis nobler in the mind to suffer
84  The slings and arrows of outrageous fortune,
129 Or to take arms against a sea of troubles,
172 And by opposing, end them. To die, to sleep;
217 No more; and by a sleep to say we end
255 The heart-ache and the thousand natural shocks
302 That flesh is heir to ? 'tis a consummation
346 Devoutly to be wish'd. To die, to sleep;
387 To sleep, perchance to dream. Ay, there's the rub,
438 For in that sleep of death what dreams may come,
487 When we have shuffled off this mortal coil,
531 Must give us pause. There's the respect
571 That makes calamity of so long life,
608 For who would bear the whips and scorns of time,

It printed the byte offset at which each line starts, followed by the line itself.
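To see where these numbers come from, the following Python sketch imitates the job locally. It is not the Oozie example code: `line_offsets` and `identity_job` are names invented here, and the sample holds only the first three lines of the output above. It pairs each line with its starting byte offset and passes the pairs through unchanged, as an identity mapper and reducer would.

```python
def line_offsets(data: bytes):
    """Yield (byte_offset, line_text) pairs, like the key and value that the
    mapper receives for each line of a text input."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)  # the newline counts towards the next offset


def identity_job(data: bytes):
    """Identity map and reduce: records pass through unchanged, sorted by key."""
    mapped = list(line_offsets(data))            # map: emit (key, value) as-is
    return sorted(mapped, key=lambda kv: kv[0])  # sort by key, then emit as-is


sample = (
    b"To be or not to be, that is the question;\n"
    b"Whether 'tis nobler in the mind to suffer\n"
    b"The slings and arrows of outrageous fortune,\n"
)
for offset, line in identity_job(sample):
    print(f"{offset}\t{line}")
```

The first line is 41 characters long plus a newline, so the second line starts at byte 42 and the third at byte 84, matching the output of the job.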

Debugging jobs in Oozie

To list all the jobs from all users, you can use the command:

oozie jobs -oozie http://10.142.1.2:11000/oozie

The result would look something like this:

Job ID                                   App Name     Status    User      Group     Started                 Ended
------------------------------------------------------------------------------------------------------------------------------------
0000003-201211110857173-oozie-oozi-W     map-reduce-wfSUCCEEDED praveen1058-         2020-12-12 06:33 GMT    2020-12-12 06:34 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000002-201211110857173-oozie-oozi-W     map-reduce-wfSUCCEEDED sandeepgiri9034-         2020-12-12 06:30 GMT    2020-12-12 06:30 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000001-201211110857173-oozie-oozi-W     map-reduce-wfSUCCEEDED sandeepgiri9034-         2020-12-12 06:28 GMT    2020-12-12 06:29 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000000-201211110857173-oozie-oozi-W     map-reduce-wfSUCCEEDED manujjoshi528582-         2020-12-12 02:48 GMT    2020-12-12 02:49 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000125-201011182341534-oozie-oozi-W     map-reduce-wfKILLED    manujjoshi528582-         2020-12-11 11:04 GMT    2020-12-11 11:20 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000124-201011182341534-oozie-oozi-W     map-reduce-wfKILLED    sandeepgiri9034-         2020-12-11 11:02 GMT    2020-12-11 11:20 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000123-201011182341534-oozie-oozi-W     map-reduce-wfSUCCEEDED sandeep   -         2020-12-11 10:56 GMT    2020-12-11 11:10 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000122-201011182341534-oozie-oozi-W     map-reduce-wfKILLED    sandeepgiri9034-         2020-12-11 10:51 GMT    2020-12-11 11:20 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000121-201011182341534-oozie-oozi-W     map-reduce-wfKILLED    manujjoshi528582-         2020-12-11 10:25 GMT    2020-12-11 11:20 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000120-201011182341534-oozie-oozi-W     map-reduce-wfKILLED    manujjoshi528582-         2020-12-11 10:18 GMT    2020-12-11 11:20 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000119-201011182341534-oozie-oozi-W     map-reduce-wfSUCCEEDED manujjoshi528582-         2020-12-11 10:14 GMT    2020-12-11 11:11 GMT
------------------------------------------------------------------------------------------------------------------------------------
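When the listing is long, it can help to pull out just the job IDs with a given status, for example the KILLED ones, so that you can inspect each of them with the `-info` command. Here is a rough Python sketch of that, using two rows copied from the listing above. The regular expression is an assumption based on this sample output only: it matches the status by name because the listing prints it directly after the app name with no space in between.

```python
import re

# Two rows copied from the listing above.
LISTING = """\
0000003-201211110857173-oozie-oozi-W     map-reduce-wfSUCCEEDED praveen1058-         2020-12-12 06:33 GMT    2020-12-12 06:34 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000124-201011182341534-oozie-oozi-W     map-reduce-wfKILLED    sandeepgiri9034-         2020-12-11 11:02 GMT    2020-12-11 11:20 GMT
"""

# Workflow job states to look for in each row.
STATUSES = "PREP|RUNNING|SUSPENDED|SUCCEEDED|KILLED|FAILED"
ROW = re.compile(rf"^(?P<job_id>\d+-\d+-oozie-oozi-W)\s+.*?(?P<status>{STATUSES})\b")


def parse_rows(listing):
    """Return a list of (job_id, status) tuples, one per job row."""
    rows = []
    for line in listing.splitlines():
        match = ROW.match(line)
        if match:  # the header and the dashed separator lines do not match
            rows.append((match.group("job_id"), match.group("status")))
    return rows


rows = parse_rows(LISTING)
killed = [job_id for job_id, status in rows if status == "KILLED"]
print(killed)
```

In practice you would feed it the full text printed by the `oozie jobs` command instead of the two sample rows.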

To list all the jobs of a particular user, say 'sandeepgiri9034', you can use the command:

oozie jobs -oozie http://10.142.1.2:11000/oozie -filter user=sandeepgiri9034

If you want to kill all the jobs listed by the 'jobs' command, you can add '-kill' at the end.

oozie jobs -oozie http://10.142.1.2:11000/oozie -filter user=MYUSERNAME -kill

Note that this will kill all the jobs of the user 'MYUSERNAME'.
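Because a bulk kill is easy to get wrong, you may prefer to build the command in a script that refuses to add '-kill' unless a user filter is present. The Python sketch below shows one way to do this. The function `jobs_command` is hypothetical; it only assembles the same command-line arguments shown above and does not run anything.

```python
OOZIE_URL = "http://10.142.1.2:11000/oozie"  # server URL used in this tutorial


def jobs_command(user=None, status=None, kill=False):
    """Build the argument list for `oozie jobs`, optionally filtered.

    Raises ValueError if kill is requested without a user filter, so that
    a bulk kill can never target every user's jobs by accident.
    """
    if kill and not user:
        raise ValueError("refusing to build a -kill command without a user filter")
    filters = []
    if user:
        filters.append(f"user={user}")
    if status:
        filters.append(f"status={status}")
    cmd = ["oozie", "jobs", "-oozie", OOZIE_URL]
    if filters:
        # Multiple filter terms are joined with a semicolon.
        cmd += ["-filter", ";".join(filters)]
    if kill:
        cmd.append("-kill")
    return cmd


print(" ".join(jobs_command(user="sandeepgiri9034")))
```

A sensible habit is to run the command without '-kill' first, check that the listing contains only the jobs you expect, and then add '-kill'. If you type a filter with several terms directly in the shell, quote it so the semicolon is not treated as a command separator.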

You can see the documentation of oozie here: Oozie Command line Documentation

Script

Let’s run an Oozie job for a MapReduce action. Log in to the CloudxLab Linux console. Copy the Oozie examples to your home directory in the console. Extract the files from the tar. Edit examples/apps/map-reduce/job.properties and set the values of "nameNode" and "jobTracker". We can find the namenode host in Ambari under the “HDFS” section.

We will be running examples/apps/map-reduce/workflow.xml in our job. Copy the examples directory to HDFS and run the job using the command displayed on the screen. cxln2.c.thelab-240901.internal:11000 is the host and port where the Oozie server is running.

Press enter. We will get the job ID in the command prompt. To check the status of the job, type the command displayed on the screen. The job status is "Running".

Note: In step 4 you need to edit the file. This step is often missed.

