Writing Spark Applications


Tutorial - Build In The Console (aka Shell, Terminal)

To quickly check whether it works, let's try out the code on our Linux terminal - the CloudxLab Web Console.

1. Log in to Web Console

If you see the lab on the right, you can switch to the console tab here in Jupyter.

2. Checkout your repository

Clone the repository if you have not already done so:

git clone https://github.com/cloudxlab/bigdata

If you have already cloned it, update it instead:

git pull

(Screenshot: the repository clone has finished.)

3. sbt package

Then change to the directory where our build.sbt is located:

cd bigdata/spark/projects/apache-log-parsing_sbt/

Now, run the sbt package command.

sbt package

It might take a while because sbt downloads a lot of dependencies. Once done, the output should look like this:

(Screenshot: sbt package completes with a "success" message; the highlighted line shows the path of the generated jar.)

The highlighted line shows the path at which the jar has been created. Also, notice the "success" message. Make a note of the complete path of the jar file. A jar file is essentially a bundle of compiled classes and resources - the artifact needed to run the application in production.
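For reference, the build.sbt of such a project typically looks like the following minimal sketch. This is an illustration, not the actual file in the repository; the name and version numbers are assumptions inferred from the jar name above (apache-log-parsing_2.10-0.0.1.jar).

```scala
// Hypothetical build.sbt for a Spark log-parsing project (illustrative only).
// The Scala version and artifact name match the jar produced above.
name := "apache-log-parsing"

version := "0.0.1"

scalaVersion := "2.10.6"

// Marked "provided" because the cluster supplies the Spark jars at runtime;
// spark-submit puts them on the classpath, so they need not be bundled.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
```

With this in place, sbt package compiles the sources and writes the jar under target/scala-2.10/.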

4. spark-submit

Please run the following command:

spark-submit --class com.cloudxlab.logparsing.EntryPoint target/scala-2.10/apache-log-parsing_2.10-0.0.1.jar 10 /data/spark/project/access/access.log.2.gz

Here, --class names the main class inside the jar, 10 is the number of top IP addresses to report, and the final argument is the input log file in HDFS.

After execution, you should see output like the following:

===== TOP 10 IP Addresses =====
(119.82.122.103,1495)
(23.118.53.218,337)
(115.111.218.74,324)
(122.166.10.179,321)
(76.187.25.94,281)
(216.113.160.77,236)
(66.249.64.193,231)
(66.249.64.189,220)
(1.23.173.22,211)
(69.250.77.30,191)

You can see that it has successfully found the top 10 IP addresses that accessed the website, according to the Apache logs.
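Under the hood, a job like this is a classic word-count pattern: extract the IP from each log line, count occurrences per IP, and take the top N. The following is a hypothetical sketch of what such an entry point might look like - it is not the actual source of com.cloudxlab.logparsing.EntryPoint in the repository.

```scala
// Hypothetical sketch of a Spark job that finds the top-N IP addresses
// in an Apache access log. Illustrative only, not the repository's code.
import org.apache.spark.{SparkConf, SparkContext}

object EntryPoint {
  // An Apache access log line begins with the client IP address.
  val ipPattern = """^(\d+\.\d+\.\d+\.\d+)""".r

  def main(args: Array[String]): Unit = {
    val Array(topN, logPath) = args // e.g. "10" and the log file path
    val sc = new SparkContext(new SparkConf().setAppName("log-parsing"))

    val topIPs = sc.textFile(logPath)               // Spark reads .gz files transparently
      .flatMap(line => ipPattern.findFirstIn(line)) // keep only lines starting with an IP
      .map(ip => (ip, 1))                           // word-count pattern:
      .reduceByKey(_ + _)                           //   count occurrences per IP
      .sortBy({ case (_, count) => count }, ascending = false)
      .take(topN.toInt)                             // top N pairs by count

    println(s"===== TOP $topN IP Addresses =====")
    topIPs.foreach(println)
    sc.stop()
  }
}
```

Each line of the output above is one (ip, count) pair produced by this kind of pipeline, sorted in descending order of count.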

