Oozie is a workflow manager: with Oozie we create a job that can run multiple Hadoop tools such as Sqoop, Flume, Hive, etc.
In this exercise, we are going to learn what a typical job looks like and how to run one.
We will use the examples provided in the Oozie documentation and run one of the MapReduce examples.
If the Lab is available in the right side frame, please switch to the "Console" tab.
Please run the following Linux command in the web console to copy the examples to your home directory: `cp /usr/hdp/current/oozie-client/doc/oozie-examples.tar.gz ./`
Once this is copied to your home directory, you will see `oozie-examples.tar.gz` in the output of the `ls` command.
Please run the following command to extract the files: `tar -zxvf oozie-examples.tar.gz`
This will create a folder `examples` in your home directory. Change into it using `cd examples`.
Now, let us walk through what is where. Type `ls` to see the contents of the `examples` folder. It contains the following folders:
apps
This contains the various workflow examples, such as Sqoop, Spark, Hive, Pig and MapReduce; these are what we will be executing. A workflow typically consists of compiled Java code, an Oozie XML script, and configuration files.
input-data
This contains the example data used by the various workflows. Though the data is not significant in size, it serves as a good example.
src
This contains the source code of the apps. It gets compiled and bundled into a single jar file, oozie-examples-XXXXX.jar, which is used by the apps.
To take a quick look at the last 10 lines of the data file, check the output of: `tail input-data/text/data.txt`
Now, let us take a look at the map-reduce job. Change to the map-reduce folder using `cd apps/map-reduce/` and run `ls` to see the various files. You will see the following list of files.
job-with-config-class.properties and job.properties
These two files are entry points: the first is for one job and the second is for another. We will be using job.properties for our hands-on; we will pass the location of this file to the Oozie client while launching the job.
The location of the workflow XML that contains the actual Oozie workflow is specified in this config file, along with the other key-value settings for the job. You can take a look at the contents of the file using: `cat job.properties`
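The job.properties file follows the plain Java properties format: one key=value pair per line, with `#` starting a comment. As a rough illustration (not Oozie code), such a file can be parsed like this, using the values we will set later in this exercise:

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        props[key.strip()] = value.strip()
    return props

sample = """\
nameNode=hdfs://10.142.1.1:8020
jobTracker=10.142.1.2:8050
queueName=default
"""
props = parse_properties(sample)
print(props["queueName"])  # default
```

Splitting on the first `=` only matters because values such as URLs may themselves contain special characters.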
lib/
This folder contains the jar file with the classes implementing the map-reduce logic. Take a look at the list of files in lib using `ls lib`. It should list a jar file, oozie-examples-XXXXX.jar.
workflow.xml and workflow-with-config-class.xml
These are the main workflow files. We will be using workflow.xml. Take a look at the contents of this file using `cat workflow.xml`. It should look something like this:
```
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.oozie.example.SampleMapper</value>
                </property>
                ....
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```
If you go through it, you will see that here we are defining the workflow: which commands to execute, in what order, and what to do if a step fails.
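The workflow above is essentially a small state machine: `start` points to the `mr-node` action, and the action's `ok`/`error` transitions decide the next node. A minimal Python sketch of walking such a graph (an illustration only, not how Oozie is implemented; node names mirror the XML above):

```python
# Transition graph mirroring workflow.xml: start -> mr-node -> end (ok) or fail (error).
workflow = {
    "start": {"to": "mr-node"},
    "mr-node": {"ok": "end", "error": "fail"},
}

def run_workflow(workflow, action_succeeded):
    """Follow the transitions and return the path of nodes visited."""
    node = workflow["start"]["to"]  # <start to="mr-node"/>
    path = [node]
    transitions = workflow[node]
    node = transitions["ok"] if action_succeeded else transitions["error"]
    path.append(node)
    return path

print(run_workflow(workflow, True))   # ['mr-node', 'end']
print(run_workflow(workflow, False))  # ['mr-node', 'fail']
```

A real workflow can chain many actions this way, each declaring where to go on success and on failure.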
Edit examples/apps/map-reduce/job.properties using `nano` and set the values in the file to the following:
```
nameNode=hdfs://10.142.1.1:8020
jobTracker=10.142.1.2:8050
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce
```
Here nameNode refers to the HDFS namenode, which might be a different IP address in your real-life project. jobTracker is basically the resource manager's address. You can leave queueName and examplesRoot as they are, though in a real-life project you might have to use a different queueName as configured by your sysadmin.
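Oozie substitutes the `${...}` placeholders in these properties (and in workflow.xml) from the configured values; `${user.name}` comes from the submitting user. A rough Python illustration of that substitution, not Oozie's actual EL engine, with `alice` as a made-up user name:

```python
import re

props = {
    "nameNode": "hdfs://10.142.1.1:8020",
    "user.name": "alice",       # supplied by Oozie from the submitting user (illustrative)
    "examplesRoot": "examples",
}

def resolve(template, props):
    """Replace ${key} placeholders with values from props; leave unknown keys as-is."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: props.get(m.group(1), m.group(0)),
                  template)

path = resolve("${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/workflow.xml", props)
print(path)  # hdfs://10.142.1.1:8020/user/alice/examples/apps/map-reduce/workflow.xml
```

This is why oozie.wf.application.path can reference `${nameNode}` defined earlier in the same file.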
Now copy the examples directory to HDFS and launch the job:
`hadoop fs -copyFromLocal ~/examples`
`cd ~`
`oozie job -oozie http://10.142.1.2:11000/oozie -config examples/apps/map-reduce/job.properties -run`
This prints a job ID. To check the status of the job, replace job_id below with that ID:
`oozie job -oozie http://10.142.1.2:11000/oozie -info job_id`
Wait for the job to succeed. It might take about 2 minutes.
If you go through the workflow.xml, you will understand the following:
The mapper class is org.apache.oozie.example.SampleMapper. Check its code using:
`cat ~/examples/src/org/apache/oozie/example/SampleMapper.java`
It basically does nothing: it passes through whatever it gets as key-value pairs. By default, the mapper receives the byte location of each line as the key and the content of the line as the value.
The reducer class is org.apache.oozie.example.SampleReducer. Check its code using: `cat ~/examples/src/org/apache/oozie/example/SampleReducer.java`
By looking at the Mapper and Reducer, it is obvious that this job does essentially nothing, so it will just print the byte location of each line in the file. The main objective of this example is that you should be able to schedule your own MapReduce job using Oozie.
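The identity behavior can be simulated in a few lines of Python: the map input key is the byte offset at which each line starts, and an identity mapper emits each (offset, line) pair unchanged. Using the first lines of input-data/text/data.txt:

```python
def identity_map(data):
    """Yield (byte_offset, line) pairs, mimicking text input splitting plus an identity mapper."""
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line)  # offset advances by line length including the newline

data = ("To be or not to be, that is the question;\n"
        "Whether 'tis nobler in the mind to suffer\n"
        "The slings and arrows of outrageous fortune,\n")

for k, v in identity_map(data):
    print(k, v)
# 0 To be or not to be, that is the question;
# 42 Whether 'tis nobler in the mind to suffer
# 84 The slings and arrows of outrageous fortune,
```

The offsets 0, 42, 84 match what you will see in the job output below.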
The input folder is /user/${wf:user()}/${examplesRoot}/input-data/text and the output folder is examples/output-data/map-reduce.
Let us take a look at HDFS to check if the folder and files were created: `hadoop fs -ls examples/output-data/map-reduce`. It should show something like this:
```
[sandeepgiri9034@cxln4 ~]$ hadoop fs -ls examples/output-data/map-reduce
Found 2 items
-rw-r--r--   3 sandeepgiri9034 sandeepgiri9034    0 2020-11-06 11:27 examples/output-data/map-reduce/_SUCCESS
-rw-r--r--   3 sandeepgiri9034 sandeepgiri9034 1547 2020-11-06 11:27 examples/output-data/map-reduce/part-00000
```
The part-00000 file contains the result. Let's check it out:
`hadoop fs -cat examples/output-data/map-reduce/part-00000`
The output would look something like this:
```
0 To be or not to be, that is the question;
42 Whether 'tis nobler in the mind to suffer
84 The slings and arrows of outrageous fortune,
129 Or to take arms against a sea of troubles,
172 And by opposing, end them. To die, to sleep;
217 No more; and by a sleep to say we end
255 The heart-ache and the thousand natural shocks
302 That flesh is heir to ? 'tis a consummation
346 Devoutly to be wish'd. To die, to sleep;
387 To sleep, perchance to dream. Ay, there's the rub,
438 For in that sleep of death what dreams may come,
487 When we have shuffled off this mortal coil,
531 Must give us pause. There's the respect
571 That makes calamity of so long life,
608 For who would bear the whips and scorns of time,
```
It basically printed the byte offset at which each line starts.
To list all the jobs from all users, you can use the command:
`oozie jobs -oozie http://10.142.1.2:11000/oozie`
The result would look something like this:
```
Job ID                                 App Name       Status     User              Group  Started               Ended
------------------------------------------------------------------------------------------------------------------------------------
0000003-201211110857173-oozie-oozi-W   map-reduce-wf  SUCCEEDED  praveen1058       -      2020-12-12 06:33 GMT  2020-12-12 06:34 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000002-201211110857173-oozie-oozi-W   map-reduce-wf  SUCCEEDED  sandeepgiri9034   -      2020-12-12 06:30 GMT  2020-12-12 06:30 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000001-201211110857173-oozie-oozi-W   map-reduce-wf  SUCCEEDED  sandeepgiri9034   -      2020-12-12 06:28 GMT  2020-12-12 06:29 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000000-201211110857173-oozie-oozi-W   map-reduce-wf  SUCCEEDED  manujjoshi528582  -      2020-12-12 02:48 GMT  2020-12-12 02:49 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000125-201011182341534-oozie-oozi-W   map-reduce-wf  KILLED     manujjoshi528582  -      2020-12-11 11:04 GMT  2020-12-11 11:20 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000124-201011182341534-oozie-oozi-W   map-reduce-wf  KILLED     sandeepgiri9034   -      2020-12-11 11:02 GMT  2020-12-11 11:20 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000123-201011182341534-oozie-oozi-W   map-reduce-wf  SUCCEEDED  sandeep           -      2020-12-11 10:56 GMT  2020-12-11 11:10 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000122-201011182341534-oozie-oozi-W   map-reduce-wf  KILLED     sandeepgiri9034   -      2020-12-11 10:51 GMT  2020-12-11 11:20 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000121-201011182341534-oozie-oozi-W   map-reduce-wf  KILLED     manujjoshi528582  -      2020-12-11 10:25 GMT  2020-12-11 11:20 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000120-201011182341534-oozie-oozi-W   map-reduce-wf  KILLED     manujjoshi528582  -      2020-12-11 10:18 GMT  2020-12-11 11:20 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000119-201011182341534-oozie-oozi-W   map-reduce-wf  SUCCEEDED  manujjoshi528582  -      2020-12-11 10:14 GMT  2020-12-11 11:11 GMT
------------------------------------------------------------------------------------------------------------------------------------
```
To list all the jobs from a particular user, say 'sandeepgiri9034', you can use the command:
`oozie jobs -oozie http://10.142.1.2:11000/oozie -filter user=sandeepgiri9034`
If you want to kill all the jobs listed by the 'jobs' command, you can add '-kill' at the end:
`oozie jobs -oozie http://10.142.1.2:11000/oozie -filter user=MYUSERNAME -kill`
Note that this will kill all the jobs of the user 'MYUSERNAME'.
You can see the Oozie documentation here: Oozie Command Line Documentation
Let’s run an Oozie job with a MapReduce action. Log in to the CloudxLab Linux console. Copy the Oozie examples to your home directory in the console and extract the files from the tar. Edit examples/apps/map-reduce/job.properties and set the values of "nameNode" and "jobTracker". You can find the namenode host in Ambari under the “HDFS” section.
We will be running examples/apps/map-reduce/workflow.xml in our job. Copy the examples directory to HDFS and run the job using the command displayed on the screen. cxln2.c.thelab-240901.internal:11000 is the host and port where the Oozie server is running.
Press enter. We will get the job ID in the command prompt. To check the status of the job, type the command displayed on the screen. The job status is "Running".
Note: In step 4 you need to edit the file. This step is often missed.