Hive - Project

Hive - Project - Sentiment Analysis

Objective

The objective of this exercise is to perform sentiment analysis on tweet data downloaded from Twitter.

We'll analyze the sentiment of tweets about the movie "Iron Man 3" using Hive and visualize the results using Tableau.

The dataset containing tweets about the movie "Iron Man 3" is located at the following path in HDFS:

/data/SentimentFiles/SentimentFiles/upload/data

We'll calculate sentiment using a rudimentary technique. The polarity of common words is stored in the following dictionary file in HDFS:

/data/SentimentFiles/SentimentFiles/upload/data/dictionary/dictionary.tsv

Based on the polarity of the words it contains, we will calculate the sentiment of each tweet. You can follow exactly the same steps or use a different strategy altogether to calculate the sentiment.
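To make the idea concrete, here is a minimal sketch of polarity-based scoring, written in plain Python rather than Hive purely to illustrate the logic. The word-to-polarity map, the function name tweet_sentiment and the positive/negative/neutral labels are all assumptions made for this illustration; the actual layout of dictionary.tsv is not described here, and the real exercise may score tweets differently.

```python
# Hypothetical word -> polarity map. The real dictionary.tsv layout is not
# described in this document, so these entries are illustrative only.
POLARITY = {"great": 1, "awesome": 1, "boring": -1, "terrible": -1}


def tweet_sentiment(text, polarity=POLARITY):
    """Sum the polarity of each known word and map the total to a label."""
    words = text.lower().split()
    # Strip common punctuation so that "awesome!" matches "awesome".
    # Words missing from the dictionary contribute 0.
    score = sum(polarity.get(w.strip(".,!?#\"'"), 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(tweet_sentiment("Iron Man 3 was awesome!"))
```

In Hive, the same logic would typically be expressed by splitting each tweet into words, joining the words against the dictionary, and summing the polarity per tweet.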

Several variations are possible, for example:

  1. Use Pig or Spark instead of Hive
  2. Use a completely different algorithm to compute the sentiment based on NLP
  3. Use your own Flume pipeline to download the data (~/sentiment/flume/) and start afresh with a different movie
  4. Create your own program to download data from Twitter
  5. Use some other mechanism of displaying the data such as D3.js or BIRT