To quickly check whether it works, let's check out the code on our Linux terminal, the CloudxLab Web Console.
Log in to the Web Console and clone the repository:
git clone <url of repository>
Then change to the folder where build.sbt is located.
Now run the sbt package command. It might take a while because sbt has to download a lot of dependencies. Once done, the output should look like this:
The highlighted line shows the path at which the jar has been created. Also, notice the "success" message. Please note the complete path of the jar file. A jar file is basically a bundle of files that includes the various artifacts needed to run the application in production.
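If the highlighted line has scrolled out of view, the jar can also be located from the shell. The following is a minimal sketch, assuming sbt wrote its output to the default target directory; the function name find_jars is only an illustration and is not part of the project.

```shell
# List every jar file under the build output directory.
# "target" is sbt's default output directory; pass a different
# directory as the first argument if your build writes elsewhere.
find_jars() {
  find "${1:-target}" -type f -name '*.jar'
}
```

Run find_jars from the folder that contains build.sbt. To see what has been bundled, the JDK's jar tool can list the contents of any path it prints, for example jar tf followed by that path.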
Please run the following command:
spark-submit target/scala-2.10/apache-log-parsing_2.10-0.0.1.jar com.cloudxlab.logparsing.EntryPoint 10 /data/spark/project/access/access.log.45.gz
You will see the following printed on the screen after execution:
===== TOP 10 IP Addresses =====
(188.8.131.52,1495)
(184.108.40.206,337)
(220.127.116.11,324)
(18.104.22.168,321)
(22.214.171.124,281)
(126.96.36.199,236)
(188.8.131.52,231)
(184.108.40.206,220)
(220.127.116.11,211)
(18.104.22.168,191)
You can see that it has successfully found the top 10 IP addresses that accessed the website, according to the Apache logs.
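As a rough cross-check of this result, the same kind of count can be produced with standard shell tools. This is only a sketch, not the project's method: it assumes that the client IP is the first whitespace-separated field of each log line, and the function name top_ips is hypothetical. The Spark job may clean or filter lines differently, so the counts need not match exactly.

```shell
# Read log lines on stdin and print the N most frequent client IPs
# as (ip,count) pairs, N defaulting to 10.
# Assumption: the IP is the first whitespace-separated field of each line.
top_ips() {
  awk 'NF { print $1 }' |          # keep the first field of non-empty lines
    sort |                         # group identical IPs together
    uniq -c |                      # prefix each IP with its count
    sort -k1,1nr -k2,2 |           # highest count first, ties ordered by IP
    head -n "${1:-10}" |           # keep the top N
    awk '{ print "(" $2 "," $1 ")" }'
}
```

Assuming the log file sits in HDFS at the path used above, it could be fed in with hadoop fs -cat /data/spark/project/access/access.log.45.gz | gunzip -c | top_ips 10.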