In this project, we will parse Apache access logs to extract meaningful insights from them.
We have already completed part of this work in the Writing Spark Applications topic.
Extend the same project: write the code and unit test cases for the next set of problems, then send the code to firstname.lastname@example.org
Data set -
The dataset is located in the /data/spark/project/NASA_access_log_Aug95.gz directory in HDFS.
It is the access log of the NASA Kennedy Space Center WWW server in Florida.
The log is an ASCII file with one line per request, with the following columns:
Note that from 01/Aug/1995:14:52:01 until 03/Aug/1995:04:36:13 no accesses are recorded, because the Web server was shut down due to Hurricane Erin.
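Since each request occupies one line, a small parsing function is a natural first unit to write and test. The following is a minimal sketch in plain Python, not the course's reference solution. It assumes the lines follow the Apache Common Log Format (host, two placeholder fields, bracketed timestamp, quoted request, status code, byte count). The function name parse_log_line, the field names, and the sample line are hypothetical and chosen only for illustration.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
#   host - - [dd/Mon/yyyy:HH:MM:SS zone] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(.*)" (\d{3}) (\d+|-)$'
)


def parse_log_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    host, raw_ts, request, status, size = match.groups()
    # A request is normally "METHOD URL PROTOCOL"; fall back to the raw
    # string when it has fewer than two parts.
    parts = request.split()
    url = parts[1] if len(parts) >= 2 else request
    return {
        "host": host,
        "timestamp": datetime.strptime(raw_ts, "%d/%b/%Y:%H:%M:%S %z"),
        "url": url,
        "status": int(status),
        # "-" means no byte count was logged; treat it as 0.
        "bytes": 0 if size == "-" else int(size),
    }


if __name__ == "__main__":
    # Made-up sample line in the assumed format, not taken from the dataset.
    sample = ('host.example.com - - [01/Aug/1995:00:00:01 -0400] '
              '"GET /index.html HTTP/1.0" 200 1839')
    print(parse_log_line(sample))
```

Returning None for malformed lines keeps the function easy to unit test and lets the caller decide whether to count or drop bad records. In a Spark job, a function like this could be applied with map over the RDD of lines, followed by a filter that discards the None results.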
Based on the above data, please answer the following questions: