MapReduce Programming


Writing MapReduce code using Eclipse





The URL of Github repository is http://github.com/cloudxlab/bigdata

INSTRUCTIONS

Eclipse is an integrated development environment (IDE) that is very often used for Java development. At its core it is a text editor, extended with a rich set of features such as a compiler, a debugger, and syntax autocompletion.

Now, let's try to build our MapReduce code with Eclipse.

The first step is to download and install Eclipse. Open eclipse.org, click Download at the top right, and download the version suggested for you.

Wait for the download to complete. Once it has downloaded, double-click it to extract it and then open the Eclipse Installer. On macOS the program's name has the suffix .app, and on Windows its extension is .exe.

Using the installer, install Eclipse IDE for Java Developers. Wait for the installation to complete and then launch Eclipse.

Eclipse prompts you to select a workspace; your work is saved in this workspace. Close the welcome window.

Now, let's download the code from the GitHub repository. Open the repository URL given above.

Click on "Clone or download" and "Download Zip".

Unzip the downloaded file. It contains the folders with the code.

In Eclipse, create a Java project and give it a name. Uncheck "Use default location" and browse to the java folder under hdpexamples.

Click Finish.

This creates the project. Right-click it and select "Properties".

Select "Java Build Path" and open the Libraries tab. Eclipse has automatically added the libraries from the lib folder.

Select "Java Compiler" and change the compiler compliance level from 1.8 to 1.7.

Click OK.

Eclipse has now discovered all the classes. Let's take a look at the driver, the entry point of our example.
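In Hadoop, a wordcount driver typically configures a job whose mapper emits a (word, 1) pair for every word and whose reducer sums those pairs per word. The sketch below is not the repository's StubDriver code: it is a hypothetical, Hadoop-free simulation in plain Java (WordCountSimulation is an illustrative name) of the map, shuffle and reduce phases, meant only to show what the job computes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the phases of a wordcount job.
class WordCountSimulation {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all values by key; TreeMap keeps keys sorted,
    // similar to how Hadoop sorts keys before they reach the reducer.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the counts collected for one word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce per key.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(allPairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(Arrays.asList("to be or", "not to be"));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running main prints one tab-separated "word count" line per distinct word, which is the same shape as the job output viewed later with hadoop fs -cat.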

Now, right-click the project folder and click Export. Then select "JAR file".

Next, select the destination. We are going to keep it in the Downloads folder.

For now, let's ignore the warnings.

The jar file has been created.

Now we will upload the jar file to our home directory, which is accessible from the web console. Go to MyLab and open Jupyter. Click the Upload button, select the hdpexample.jar file from the Downloads folder, and click Upload to start uploading it. Once the upload has finished, open the web console and type ls to list the files and folders in your home directory. The jar file should be listed there.

Now, let's run the wordcount MapReduce job:

    hadoop jar hdpexample_eclipse.jar com.cloudxlab.wordcount.StubDriver

Copy the application ID from the output and open a second web console. There, check the status of the job with:

    yarn application -status <application_id>

This shows the progress and status of the running application.

To view the output of the job, type hadoop fs -ls /user/<your_user_name>/javamrout. This lists the files in the output folder. You can then view the contents of a part-r file (for example, part-r-00000) with the hadoop fs -cat command. Each line shows a word followed by its count.
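Each line of a part-r file is a word, a tab, and its count, which is the default key/value layout written by Hadoop's TextOutputFormat. As a hedged sketch (PartFileTopWords is a hypothetical class, not part of the repository), the following plain-Java program reads such lines from standard input, skips malformed ones, and prints the ten most frequent words.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that ranks wordcount output lines of the form "word<TAB>count".
class PartFileTopWords {

    static final class WordCount {
        final String word;
        final long count;

        WordCount(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }

    // Parses one output line; returns null for lines without a tab or a numeric count.
    static WordCount parseLine(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab <= 0) {
            return null;
        }
        try {
            return new WordCount(line.substring(0, tab), Long.parseLong(line.substring(tab + 1).trim()));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Returns up to n entries, highest count first, ties broken alphabetically.
    static List<WordCount> topN(List<String> lines, int n) {
        List<WordCount> parsed = new ArrayList<>();
        for (String line : lines) {
            WordCount wc = parseLine(line);
            if (wc != null) {
                parsed.add(wc);
            }
        }
        parsed.sort((a, b) -> a.count != b.count
                ? Long.compare(b.count, a.count)
                : a.word.compareTo(b.word));
        return parsed.subList(0, Math.min(n, parsed.size()));
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        for (WordCount wc : topN(lines, 10)) {
            System.out.println(wc.word + "\t" + wc.count);
        }
    }
}
```

After compiling it with javac, you could pipe the job output into it, for example hadoop fs -cat /user/<your_user_name>/javamrout/part-r-* | java PartFileTopWords.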

Browse through the code. The first example, chaining, shows how to chain multiple jobs together. You can run its driver from the existing jar.
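A common way to chain jobs is for the driver to wait for the first job to finish and then use its output directory as the input path of the second job. The plain-Java sketch below simulates only that data flow, in memory, for a hypothetical two-stage pipeline (a wordcount followed by a job that groups words by their count); it does not reproduce the repository's chaining code, and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of two chained jobs.
class JobChainSimulation {

    // Job 1: wordcount. Its output is a list of "word<TAB>count" lines,
    // standing in for the part files that job 2 would read from HDFS.
    static List<String> job1WordCount(List<String> input) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : input) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            output.add(e.getKey() + "\t" + e.getValue());
        }
        return output;
    }

    // Job 2: reads job 1's output lines and groups words by count,
    // i.e. its mapper swaps key and value.
    static Map<Integer, List<String>> job2GroupByCount(List<String> job1Output) {
        Map<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : job1Output) {
            String[] parts = line.split("\t");
            byCount.computeIfAbsent(Integer.parseInt(parts[1]), k -> new ArrayList<>()).add(parts[0]);
        }
        return byCount;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("to be or", "not to be");
        // The second stage starts only after the first has produced its output.
        List<String> intermediate = job1WordCount(input);
        System.out.println(job2GroupByCount(intermediate));
    }
}
```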

The second package, charcount, computes character frequencies in a large dataset.
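Character counting follows the same pattern as wordcount, with characters as keys. In this hypothetical, Hadoop-free sketch, each input split is counted locally first (the role a combiner could play in Hadoop), and the reduce step merges the partial counts from all splits. Skipping whitespace is an assumption of the sketch, not necessarily what the charcount package does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of character counting over several input splits.
class CharCountSimulation {

    // Map side: count non-whitespace characters within one split,
    // aggregating locally the way a combiner might.
    static Map<Character, Long> mapSplit(List<String> split) {
        Map<Character, Long> partial = new TreeMap<>();
        for (String line : split) {
            for (char ch : line.toCharArray()) {
                if (!Character.isWhitespace(ch)) {
                    partial.merge(ch, 1L, Long::sum);
                }
            }
        }
        return partial;
    }

    // Reduce side: sum the partial counts produced by every split.
    static Map<Character, Long> reduce(List<Map<Character, Long>> partials) {
        Map<Character, Long> total = new TreeMap<>();
        for (Map<Character, Long> partial : partials) {
            for (Map.Entry<Character, Long> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("hello"),
                Arrays.asList("world", "hi"));
        List<Map<Character, Long>> partials = new ArrayList<>();
        for (List<String> split : splits) {
            partials.add(mapSplit(split));
        }
        System.out.println(reduce(partials));
    }
}
```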

The third package, customreader, shows how to create your own custom input format.
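In Hadoop, a custom input format supplies a RecordReader whose nextKeyValue, getCurrentKey and getCurrentValue methods turn raw input into the key/value records handed to the mapper. The plain-Java class below only mimics that shape without depending on Hadoop; the record format (records separated by blank lines, key = record number, value = the record's lines joined by spaces) and the class name are hypothetical and are not taken from the customreader package.

```java
// Hypothetical stand-in for a custom RecordReader: one record per paragraph.
class ParagraphRecordReader {
    private final String[] lines;
    private int pos = 0;
    private long currentKey = -1;
    private String currentValue = null;

    ParagraphRecordReader(String input) {
        // Limit -1 keeps trailing empty strings so blank lines are preserved.
        this.lines = input.split("\n", -1);
    }

    // Advances to the next record; returns false when the input is exhausted.
    boolean nextKeyValue() {
        // Skip blank separator lines.
        while (pos < lines.length && lines[pos].trim().isEmpty()) {
            pos++;
        }
        if (pos >= lines.length) {
            return false;
        }
        // Join consecutive non-blank lines into one record.
        StringBuilder record = new StringBuilder();
        while (pos < lines.length && !lines[pos].trim().isEmpty()) {
            if (record.length() > 0) {
                record.append(' ');
            }
            record.append(lines[pos].trim());
            pos++;
        }
        currentKey++;
        currentValue = record.toString();
        return true;
    }

    long getCurrentKey() {
        return currentKey;
    }

    String getCurrentValue() {
        return currentValue;
    }

    public static void main(String[] args) {
        ParagraphRecordReader reader =
                new ParagraphRecordReader("first line\nstill first\n\nsecond record\n");
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + " -> " + reader.getCurrentValue());
        }
    }
}
```

A mapper fed by such a reader would see one call per paragraph instead of one per line, which is the main reason to write a custom input format.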

The fourth package is an example of Hive user-defined functions (UDFs).

The next package is "nextword". It provides a solution for generating next-word recommendations from large amounts of data.
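One way to express next-word recommendation as MapReduce: the mapper emits (word, following word) for each adjacent pair, the shuffle groups the followers of each word, and the reducer picks the most frequent follower. The sketch below simulates this in plain Java under stated assumptions (input is lowercased, ties are broken alphabetically, NextWordSimulation is a hypothetical name); the nextword package may well use a different approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of next-word recommendation as map, shuffle and reduce.
class NextWordSimulation {

    // Map: for each adjacent pair of words in a line, emit (word, followingWord).
    static List<String[]> map(String line) {
        String[] words = line.toLowerCase().trim().split("\\s+");
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            pairs.add(new String[] {words[i], words[i + 1]});
        }
        return pairs;
    }

    // Reduce: return the most frequent follower; TreeMap iteration order plus
    // the strict comparison means ties go to the alphabetically first word.
    static String reduce(List<String> followers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String f : followers) {
            counts.merge(f, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    static Map<String, String> recommend(List<String> lines) {
        // Shuffle: group followers by the preceding word.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String[] pair : map(line)) {
                grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
            }
        }
        Map<String, String> next = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            next.put(e.getKey(), reduce(e.getValue()));
        }
        return next;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("big data is big", "big data tools", "big ideas");
        System.out.println(recommend(corpus));
    }
}
```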

Simplewordcount is just another wordcount example and can be ignored.

Now you can modify the code, export the jar again, upload it to Hue, copy it to the local file system, and then run the MapReduce job using hadoop jar followed by the jar file name and the class name.

