To quickly check whether it is working, let's try out the code on our Linux terminal, the CloudxLab Web Console.
If you see the lab on the right, you can switch to the console tab here in Jupyter.
Clone the repository if you have not already done so, using: git clone https://github.com/cloudxlab/bigdata
If you already have it cloned, you may want to update it using: git pull
Then change to the directory where our build.sbt is located:
cd bigdata/spark/projects/apache-log-parsing_sbt/
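For context, a minimal build.sbt for a project like this would look roughly as follows. This is only a sketch; the exact contents in the repository may differ, and the Spark version shown is illustrative.

// Illustrative build.sbt sketch; the file in the repository may differ
name := "apache-log-parsing"

version := "0.0.1"

scalaVersion := "2.10.6"

// Spark itself is marked "provided" because the cluster supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"

The name, version, and Scala version together determine where sbt places the packaged jar, which is why the file later appears under target/scala-2.10/ as apache-log-parsing_2.10-0.0.1.jar.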
Now, run the sbt package command.
sbt package
It might take a while because it is going to download a lot of dependencies. Once it finishes, look for the "success" message in the output, along with the line that shows the path at which the jar has been created. Note down the complete path of the jar file. Jar files are bundles of files that include the various artifacts needed to run the application in production.
Please run the following command:
spark-submit target/scala-2.10/apache-log-parsing_2.10-0.0.1.jar com.cloudxlab.logparsing.EntryPoint 10 /data/spark/project/access/access.log.2.gz
You will see the following printed on the screen after execution:
===== TOP 10 IP Addresses =====
(119.82.122.103,1495)
(23.118.53.218,337)
(115.111.218.74,324)
(122.166.10.179,321)
(76.187.25.94,281)
(216.113.160.77,236)
(66.249.64.193,231)
(66.249.64.189,220)
(1.23.173.22,211)
(69.250.77.30,191)
You can see that it has successfully found the top 10 IP addresses that have accessed the website, according to the Apache logs.
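For reference, the core of such a log-parsing job does something along the following lines. This is only a sketch using the standard Spark RDD API; the helper name extractIP and the argument handling are illustrative, and the actual EntryPoint class in the repository may differ.

import org.apache.spark.{SparkConf, SparkContext}

object EntryPoint {
  // Keep only the first field of a combined-format Apache log line, i.e. the client IP
  def extractIP(line: String): String = line.split(" ")(0)

  def main(args: Array[String]): Unit = {
    // Values hard-coded for illustration; the real code reads them from the command line
    val topN = 10
    val logFile = "/data/spark/project/access/access.log.2.gz"

    val sc = new SparkContext(new SparkConf().setAppName("Apache log parsing"))

    val topIPs = sc.textFile(logFile)        // gzipped files are decompressed transparently
      .map(extractIP)                        // ip
      .map(ip => (ip, 1))                    // (ip, 1)
      .reduceByKey(_ + _)                    // (ip, total hits)
      .sortBy(_._2, ascending = false)       // most frequent first
      .take(topN)                            // top N as a local array on the driver

    println("===== TOP " + topN + " IP Addresses =====")
    topIPs.foreach(println)                  // tuples print as (ip,count), matching the output above

    sc.stop()
  }
}

Since take(topN) brings only the top N pairs back to the driver, printing them is cheap even when the input log file is large.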