To quickly check whether it works, let's check out the code on our Linux terminal, the CloudxLab Web Console.
If you see the lab on the right, you can switch to the console tab here in Jupyter.
Clone the repository, if you have not already done so, using:
git clone https://github.com/cloudxlab/bigdata
If you already have it cloned, you may want to update it using:
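The update command itself is not shown in this text. As a minimal sketch, assuming a plain `git pull` is the intended update step and that the checkout lives in the `bigdata` directory that `git clone` creates by default, the two cases can be combined into one helper. The function name `clone_or_update` is hypothetical and not part of the course material.

```shell
# Hypothetical helper: clone a repository on first use, otherwise pull the
# latest changes into the existing checkout.
clone_or_update() {
  repo_url=$1    # e.g. https://github.com/cloudxlab/bigdata
  target_dir=$2  # e.g. bigdata, the directory name git derives from the URL
  if [ -d "$target_dir/.git" ]; then
    # Assumption: a plain `git pull` is the intended update step.
    git -C "$target_dir" pull
  else
    git clone "$repo_url" "$target_dir"
  fi
}

# Usage:
#   clone_or_update https://github.com/cloudxlab/bigdata bigdata
```

Running it a second time takes the `git pull` branch, so it is safe to repeat before every build.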
Then change to the directory where our build.sbt is located.
Now, run the sbt package command.
It might take a while because it is going to download a lot of dependencies. Once done, it should look like this:
The highlighted line shows the path at which the jar has been created. Also, notice the "success" message. Please note the complete path of the jar file. A jar file is basically a bundle of files that includes the various artifacts needed to run the application in production.
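If the build output has scrolled away, the jar path can be recovered from the file system. The following is a small sketch, assuming sbt wrote its output under the `target` directory next to build.sbt. The helper name `list_built_jars` is hypothetical.

```shell
# Hypothetical helper: print the path of every jar under a build directory.
list_built_jars() {
  build_dir=${1:-target}  # defaults to ./target, where sbt puts build output
  find "$build_dir" -type f -name '*.jar' | sort
}

# Usage, from the directory that contains build.sbt:
#   list_built_jars            # paths of the jars that sbt produced
#   jar tf path/to/some.jar    # list the files bundled in one jar (needs a JDK)
```

Listing the contents with `jar tf` is one way to see the bundle of compiled classes and metadata that the jar carries.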
Please run the following command:
spark-submit target/scala-2.10/apache-log-parsing_2.10-0.0.1.jar com.cloudxlab.logparsing.EntryPoint 10 /data/spark/project/access/access.log.45.gz
You will see the following printed on the screen after execution:
===== TOP 10 IP Addresses =====
(22.214.171.124,1495)
(126.96.36.199,337)
(188.8.131.52,324)
(184.108.40.206,321)
(220.127.116.11,281)
(18.104.22.168,236)
(22.214.171.124,231)
(126.96.36.199,220)
(188.8.131.52,211)
(184.108.40.206,191)
You can see that it has successfully found the top 10 IP addresses that accessed the website, according to the Apache logs.
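As a rough cross-check of this kind of result, a similar count can be sketched with standard Unix tools. This is not how the Spark job is implemented. It assumes the client IP address is the first whitespace-separated field of each log line, as in the common Apache access log formats, and that the log lines are piped in from a local copy of the file (the path passed to spark-submit may refer to HDFS rather than the local disk). The function name `top_ips` is hypothetical.

```shell
# Hypothetical helper: read access-log lines on stdin and print the N most
# frequent first fields (assumed to be client IPs) as "(ip,count)" lines.
top_ips() {
  limit=$1                        # how many addresses to print
  awk '{ print $1 }' |            # keep only the first field of each line
    sort |                        # group identical addresses together
    uniq -c |                     # count each group: "<count> <ip>"
    sort -k1,1nr -k2,2 |          # highest count first, ties ordered by address
    head -n "$limit" |
    awk '{ print "(" $2 "," $1 ")" }'
}

# Usage, assuming a local gzip-compressed copy named access.log.gz:
#   gzip -dc access.log.gz | top_ips 10
```

Addresses with equal counts may appear in a different order than in the Spark output, since the tie-breaking rule here is simply alphabetical.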