Writing Spark Applications

Tutorial - Build In The Console (aka Shell, Terminal)

To quickly check whether it works, let's check out the code on our Linux terminal, the CloudxLab Web Console.

1. Check out your repository

Log in to the Web Console.

(Screenshot: copying the repository URL from GitHub)

Then clone it:

git clone <url of repository>

(Screenshot: git clone completed successfully)

2. sbt package

Then change to the directory that contains our build.sbt:

cd bigdata/spark/projects/apache-log-parsing_sbt/

Now, run the sbt package command. The first run may take a while because sbt downloads a number of dependencies. Once done, the output should look like this:

(Screenshot: sbt package completing with a "success" message)

The highlighted line shows the path at which the jar has been created. Also, notice the "success" message. Note down the complete path of the jar file. A jar file is essentially a bundle of files containing the compiled classes and other artifacts needed to run the application in production.
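The jar name is derived from the settings in build.sbt. As a point of reference, here is a minimal sketch of a build.sbt that would produce a jar named apache-log-parsing_2.10-0.0.1.jar; the Spark version and the "provided" scope below are assumptions and may differ in the actual project:

// Minimal build.sbt sketch; the Spark version here is an assumption
name := "apache-log-parsing"   // becomes the jar's base name
version := "0.0.1"             // becomes the jar's version suffix
scalaVersion := "2.10.6"       // produces the _2.10 part of the jar name
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"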

3. spark-submit
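spark-submit is the script that launches a Spark application on the cluster. For a jar-based application, its general form is:

spark-submit --class <fully-qualified main class> <path to jar> [application arguments]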

Please run the following command:

spark-submit --class com.cloudxlab.logparsing.EntryPoint target/scala-2.10/apache-log-parsing_2.10-0.0.1.jar 10 /data/spark/project/access/access.log.45.gz

Here, com.cloudxlab.logparsing.EntryPoint is the fully-qualified main class, the jar is the one built in the previous step, and 10 and the log file path are the arguments passed to the application.

You will see the following printed on the screen after execution:

===== TOP 10 IP Addresses =====
(119.82.122.103,1495)
(23.118.53.218,337)
(115.111.218.74,324)
(122.166.10.179,321)
(76.187.25.94,281)
(216.113.160.77,236)
(66.249.64.193,231)
(66.249.64.189,220)
(1.23.173.22,211)
(69.250.77.30,191)

As you can see, it has successfully found the top 10 IP addresses that accessed the website, according to the Apache logs.
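For reference, here is a hypothetical sketch of what an entry point like this could look like; the actual project code may differ in details, but the overall approach is the same: extract the IP field from each log line, count occurrences, and take the top N.

package com.cloudxlab.logparsing

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch of the log-parsing entry point;
// the real project's code may differ.
object EntryPoint {
  def main(args: Array[String]): Unit = {
    val topN = args(0).toInt   // e.g. 10
    val logPath = args(1)      // e.g. /data/spark/project/access/access.log.45.gz

    val sc = new SparkContext(new SparkConf().setAppName("apache-log-parsing"))

    val topIPs = sc.textFile(logPath)
      .map(_.split(" ")(0))                            // the IP is the first field of an Apache log line
      .filter(_.matches("""^\d+\.\d+\.\d+\.\d+$"""))   // keep only well-formed IPv4 addresses
      .map(ip => (ip, 1))
      .reduceByKey(_ + _)                              // count hits per IP
      .sortBy(_._2, ascending = false)                 // most frequent first
      .take(topN)

    println("===== TOP " + topN + " IP Addresses =====")
    topIPs.foreach(println)                            // prints (ip,count) pairs as shown above
    sc.stop()
  }
}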