Hive - Project

Hive - Project - Sentiment Analysis

Objective

The objective of this exercise is to perform sentiment analysis on tweet data downloaded from Twitter.

We'll analyze the sentiment of tweets about the movie "Iron Man 3" using Hive and visualize the resulting sentiment data using Tableau.

The dataset containing tweets about the movie "Iron Man 3" is located at the following path in HDFS:

/data/SentimentFiles/SentimentFiles/upload/data

Steps

  1. Create Hive tables for calculating and storing the sentiment of each tweet. The corresponding Hive SQL file is located at the following path in HDFS:

    /data/sentiment_analysis_project/hiveddl.sql
    
  2. Connect Tableau to Hive and visualize the sentiment across various countries.

We'll calculate sentiment using a rudimentary technique. The polarity of common words is provided in the following dictionary file in HDFS:

/data/SentimentFiles/SentimentFiles/upload/data/dictionary/dictionary.tsv

Based on the polarity of the words it contains, we will calculate the sentiment of each tweet. You can follow exactly the same steps or use a different strategy altogether to calculate the sentiment.
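
To make the polarity-based idea concrete, here is a minimal Python sketch. It is not the Hive implementation from hiveddl.sql. It assumes a small hypothetical word-to-polarity map in place of dictionary.tsv (whose actual columns and labels are not described here), sums the polarity of each known word in a tweet, and labels the tweet by the sign of the total. The names POLARITY and tweet_sentiment are illustrative only.

```python
import re
from typing import Dict

# Hypothetical stand-in for dictionary.tsv: word -> polarity (+1 or -1).
# The real dictionary file's format is an assumption not covered here.
POLARITY: Dict[str, int] = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "boring": -1, "hate": -1,
}


def tweet_sentiment(text: str, polarity: Dict[str, int] = POLARITY) -> str:
    """Label a tweet by summing the polarity of the words it contains."""
    # Lowercase the text and keep only runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Words missing from the dictionary contribute 0 to the score.
    score = sum(polarity.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["I love Iron Man 3, great movie",
                  "Boring and bad",
                  "Watching Iron Man 3 tonight"]:
        print(tweet, "->", tweet_sentiment(tweet))
```

A tweet with no dictionary words, or with positive and negative words that cancel out, is labeled neutral in this sketch; a different strategy could weight words or handle negation.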

Several variations are possible, for example:

  1. Use Pig or Spark instead of Hive
  2. Use a completely different, NLP-based algorithm to compute the sentiment
  3. Use your own Flume pipeline to download the data (~/sentiment/flume/) and start afresh with a different movie
  4. Create your own program to download data from Twitter
  5. Use some other mechanism for displaying the data, such as D3.js or BIRT
