Welcome to this project on Churning the Emails Inbox with Python. In this project, you will use Python to read data from files and process it to complete specific tasks. You will explore the MBox email dataset and use Python to count lines, headers, and subject lines, and to tally emails by sender address and domain. Along the way, you will learn your way around working with data in Python.
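As an illustration of the kind of counting described above, here is a minimal line-based sketch, assuming an mbox-style text where each message begins with a `From ` line and carries `From:` and `Subject:` headers. The sample data, the function name `summarize_mbox`, and the simple header regex are all hypothetical and not taken from the project itself; the regex is only a heuristic and could also match header-like lines inside a message body.

```python
import re
from collections import Counter

# Hypothetical sample in mbox format: each message starts with a "From " line.
SAMPLE_MBOX = """From alice@example.com Sat Jan  5 09:14:16 2008
From: alice@example.com
Subject: Meeting notes

Body text here.

From bob@uni.edu Fri Jan  4 18:10:48 2008
From: bob@uni.edu
Subject: Re: Meeting notes

Another body.

From carol@example.com Fri Jan  4 16:10:39 2008
From: carol@example.com
Subject: Build failed

Log excerpt.
"""

# Heuristic: a header line looks like "Name: value" with no spaces in the name.
HEADER_RE = re.compile(r"^[A-Za-z-]+: ")


def summarize_mbox(text):
    """Count lines, messages, header lines, subjects, and senders by address and domain."""
    lines = text.splitlines()
    senders = Counter()
    domains = Counter()
    subjects = []
    messages = 0
    headers = 0
    for line in lines:
        if line.startswith("From "):
            # Envelope line that separates messages in an mbox file.
            messages += 1
        if HEADER_RE.match(line):
            headers += 1
        if line.startswith("From: "):
            address = line[len("From: "):].strip()
            senders[address] += 1
            # Everything after the last "@" is treated as the domain.
            domains[address.split("@")[-1]] += 1
        elif line.startswith("Subject: "):
            subjects.append(line[len("Subject: "):].strip())
    return {
        "lines": len(lines),
        "messages": messages,
        "headers": headers,
        "subjects": subjects,
        "senders": senders,
        "domains": domains,
    }


summary = summarize_mbox(SAMPLE_MBOX)
```

With the sample above, `summary["domains"].most_common()` ranks `example.com` first with two messages. Python's standard `mailbox` module is an alternative when you have a real mbox file on disk and want proper message parsing instead of line matching.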
Whenever you request a page from a web server, the server records the request in a file called a log.
Web server logs are gold mines of insight into user behaviour, which is why data scientists usually look at the logs first to understand how users behave. But because logs are huge, processing them takes a distributed framework such as Hadoop or Spark.
As part of this project, you will learn to parse the text data stored in a web server's logs using Apache Spark.
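Parsing usually starts with a per-line function that turns one raw log entry into structured fields. The sketch below assumes the logs use the Apache Common Log Format (`host ident user [time] "request" status size`); the regex, the name `parse_log_line`, and the sample lines are hypothetical, and real logs (for example, Combined Log Format with referrer and user agent) may need a different pattern.

```python
import re
from collections import Counter

# Assumed Apache Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" size means no response body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Hypothetical sample lines; the last one is malformed on purpose.
SAMPLE_LOGS = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.5 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.1" 404 -',
    '127.0.0.1 - - [10/Oct/2000:13:57:12 -0700] "POST /login HTTP/1.1" 302 512',
    'not a log line',
]

# Keep only lines that parsed, then count requests per client host.
parsed = [p for p in map(parse_log_line, SAMPLE_LOGS) if p is not None]
hits_per_host = Counter(p["host"] for p in parsed)
```

Keeping the parser a plain function is a deliberate choice: in PySpark the same function could be applied across a whole log file with something like `sc.textFile(path).map(parse_log_line).filter(lambda r: r is not None)`, where `sc` is assumed to be an existing `SparkContext` and `path` a hypothetical log location.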