So far we have tried to establish that handling humongous data requires a new set of tools that can operate in a distributed fashion.
But who generates such data, and who needs to process it? The quick answer is: everyone.
Now, let us take a few examples.
In the e-commerce industry, recommendation is a great example of Big Data processing. Recommendation, often implemented with a technique called collaborative filtering, is the process of suggesting a product to someone based on their preferences or behavior.
The e-commerce website gathers a lot of data about its customers' behavior. In a very simplistic algorithm, we would find similar users and then cross-suggest products between them: what one user bought is suggested to similar users who have not bought it yet. So the more users there are, the better the results tend to be.
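To make the "find similar users and cross-suggest" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the purchase data, the user names, and the function names `jaccard` and `recommend` are made up for this example, and Jaccard overlap is just one simple way to measure how similar two users are. Real systems work on far more data and use more refined models.

```python
from collections import defaultdict

# Hypothetical purchase history (made up for illustration): user -> products bought.
purchases = {
    "alice": {"laptop", "mouse", "keyboard"},
    "bob": {"laptop", "mouse", "monitor"},
    "carol": {"novel", "bookmark"},
}


def jaccard(a, b):
    """Overlap between two sets: 0.0 means disjoint, 1.0 means identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases, top_n=3):
    """Suggest products that similar users bought but `user` has not."""
    own = purchases[user]
    scores = defaultdict(float)
    for other, items in purchases.items():
        if other == user:
            continue
        similarity = jaccard(own, items)
        if similarity == 0.0:
            continue  # nothing in common, so this user tells us nothing
        for item in items - own:
            # A product scores higher when more similar users bought it.
            scores[item] += similarity
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("alice", purchases))
```

Note that the loop compares the user against every other user, so the work grows quickly with the number of users. That is exactly why this kind of processing ends up on distributed tools once the data becomes large.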
According to Amazon, a major chunk of its sales happens via recommendations on the website and by email. The other big example of Big Data processing was Netflix's one-million-dollar competition (the Netflix Prize) to generate better movie recommendations.
Today, generating recommendations has become pretty simple. Engines such as MLlib or Mahout make it very simple to generate recommendations on humongous data. All you have to do is put the data in a three-column format: user id, movie id, and rating.
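As a rough illustration of that three-column format, the sketch below turns some raw rating events into `user id, movie id, rating` rows using only the Python standard library. The event records, their field names (`user`, `movie`, `stars`), and the helpers `to_rating_rows` and `to_csv` are assumptions made up for this example. Mapping the string ids to integers is also an assumption about what the downstream engine expects; check the documentation of the engine you use.

```python
import csv
import io

# Hypothetical raw events (made up for illustration), as an app might log them.
raw_events = [
    {"user": "u1", "movie": "m10", "stars": 4},
    {"user": "u2", "movie": "m10", "stars": 5},
    {"user": "u1", "movie": "m20", "stars": 3},
]


def to_rating_rows(events):
    """Map string ids to integers and return (user_id, movie_id, rating) rows."""
    user_ids, movie_ids = {}, {}
    rows = []
    for event in events:
        # The first time an id is seen it gets the next free integer.
        uid = user_ids.setdefault(event["user"], len(user_ids))
        mid = movie_ids.setdefault(event["movie"], len(movie_ids))
        rows.append((uid, mid, float(event["stars"])))
    return rows


def to_csv(rows):
    """Serialize the rows as comma-separated text, one rating per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


print(to_csv(to_rating_rows(raw_events)))
```

Loading the resulting file into MLlib or Mahout is not shown here, since that step depends on the engine and its version.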
4 Comments
Is it the case that MLlib or Mahout just requires a file containing user ID, product ID, and ratings for recommendations?
What would happen if the unstructured data does not have these features? How would one get recommendations in the absence of such features? Do we need to generate these features, and if so, how?
Hi,
Yes, MLlib and Mahout require a file containing user ID, product ID, and ratings for recommendations, as they use Alternating Least Squares (ALS) matrix factorization for collaborative filtering. You can check the parameters at https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.mllib.recommendation.ALS.html
What features does unstructured data have if not these?
How are recommendations in Big Data processing different from ML recommendations? I didn't get that part. I thought it is machine learning algorithms that recommend products to customers.
Yeah, you are right. They are exactly ML recommendations. But scaling those algorithms so that they can be used with a large amount of data, or big data, is what we call Big Data processing.