
Hive - Project - Sentiment Analysis


The objective of this exercise is to perform sentiment analysis on tweet data downloaded from Twitter.

We'll analyze the sentiment of tweets about the movie "Iron Man 3" using Hive and visualize the resulting sentiment data using Tableau.

The dataset containing tweets about the movie "Iron Man 3" is located at the following location in HDFS:



  1. Create Hive tables for calculating and storing the sentiment of each tweet. The corresponding hive.sql file is located at the following location in HDFS:

  2. Connect Tableau to Hive and visualize the sentiment across various countries.

We'll calculate sentiment using a rudimentary technique. The polarity of common words is provided in the following dictionary file in HDFS:


Based on the polarity of the words it contains, we will calculate the sentiment of each tweet. You can follow exactly the same steps or use a different strategy altogether to calculate the sentiment.
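
To make the dictionary-based idea concrete, here is a minimal sketch in Python rather than Hive. It is not the contents of hive.sql: the word list, the +1/-1 scores, the sample tweets, and the function names are all made up for illustration, and the real dictionary file may use a different format and scale. The sketch sums the polarity of the known words in a tweet, labels the tweet by the sign of the sum, and averages the scores per country.

```python
import re
from collections import defaultdict

# Hypothetical polarity dictionary (word -> score). The real dictionary file
# in HDFS may use different words, a different scale, or a different format.
POLARITY = {
    "love": 1, "awesome": 1, "great": 1,
    "hate": -1, "boring": -1, "bad": -1,
}


def tweet_score(text, polarity=POLARITY):
    """Sum the polarity of every known word in the tweet; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(polarity.get(word, 0) for word in words)


def tweet_label(text, polarity=POLARITY):
    """Map the summed score to a coarse label based on its sign."""
    score = tweet_score(text, polarity)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def mean_score_by_country(tweets, polarity=POLARITY):
    """Average the tweet scores per country.

    `tweets` is an iterable of (country, text) pairs; this shape is an
    assumption, not the layout of the actual dataset.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for country, text in tweets:
        totals[country] += tweet_score(text, polarity)
        counts[country] += 1
    return {country: totals[country] / counts[country] for country in totals}


if __name__ == "__main__":
    sample = [
        ("US", "I love it"),
        ("US", "bad and boring"),
        ("IN", "awesome movie"),
    ]
    for country, text in sample:
        print(country, tweet_label(text), text)
    print(mean_score_by_country(sample))
```

Summing word polarities ignores negation, sarcasm, and context, which is why the technique is described as rudimentary. In Hive the same logic would typically be expressed as splitting each tweet into words, joining the words against the dictionary table, and grouping by tweet, and then by country for the visualization.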

Various deviations are possible, for example:

  1. Use Pig or Spark instead of Hive
  2. Use a completely different, NLP-based algorithm to compute the sentiment
  3. Use your own Flume pipeline to download the data (~/sentiment/flume/) and start afresh with a different movie
  4. Create your own program to download data from Twitter
  5. Use some other mechanism for displaying the data, such as D3.js or BIRT