So far, we have tried to establish that handling humongous data requires a new set of tools that can operate in a distributed fashion.
But who generates such data, and who needs to process it? The quick answer is everyone.
Now, let us take a few examples.
In the e-commerce industry, recommendation is a great example of Big Data processing. Recommendation, which is commonly implemented using collaborative filtering, is the process of suggesting a product to someone based on their preferences or behavior.
The e-commerce website gathers a lot of data about its customers' behavior. In a very simplistic algorithm, we would find similar users and then cross-suggest products among them. So, the more users there are, the better the results would be.
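To make the "find similar users and cross-suggest" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the purchase data is made up, Jaccard overlap is just one possible similarity measure, and the `recommend` helper and its `min_similarity` threshold are hypothetical names, not part of any library.

```python
# Hypothetical purchase history: user -> set of product ids (made-up data).
purchases = {
    "alice": {"laptop", "mouse", "keyboard"},
    "bob": {"laptop", "mouse", "monitor"},
    "carol": {"novel", "bookmark"},
}


def jaccard(a, b):
    """Overlap between two product sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases, min_similarity=0.2):
    """Suggest products bought by similar users that `user` has not bought yet."""
    own = purchases[user]
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        similarity = jaccard(own, items)
        if similarity < min_similarity:
            continue  # skip users whose behavior is too different
        for item in items - own:
            # Weight each candidate product by how similar its buyer is.
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(recommend("alice", purchases))  # prints ['monitor']
```

Note that every user is compared with every other user, so the work grows roughly with the square of the number of users. That is why, at real e-commerce scale, this kind of computation is handed to distributed tools.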
As per Amazon, a major chunk of their sales happens via recommendations on the website and over email. The other big example of Big Data processing was Netflix's one-million-dollar competition to generate movie recommendations.
As of today, generating recommendations has become pretty simple. Engines such as MLlib and Mahout have made it very simple to generate recommendations on humongous data. All you have to do is format the data in a three-column format: user id, movie id, and rating.
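As a rough illustration of that three-column format, the sketch below parses a few comma-separated lines into (user id, movie id, rating) tuples. The sample values, the comma delimiter, and the `load_ratings` helper are all assumptions made for this example; how a particular engine actually loads such data depends on its own API.

```python
import csv
import io

# Made-up sample in the three-column layout: user id, movie id, rating.
SAMPLE = """\
1,101,5.0
1,102,3.0
2,101,4.0
3,103,2.5
"""


def load_ratings(text):
    """Parse 'user_id,movie_id,rating' lines into typed tuples."""
    rows = []
    for user_id, movie_id, rating in csv.reader(io.StringIO(text)):
        rows.append((int(user_id), int(movie_id), float(rating)))
    return rows


ratings = load_ratings(SAMPLE)
print(ratings[0])  # prints (1, 101, 5.0)
```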