Hive - Project - Sentiment Analysis - Step - 2 | Automated hands-on| CloudxLab

The objective of this step is to create an external table which contains tweets of Iron Man 3 movie and answer the following question:

Question-

How many records are in the tweets_raw table?

89843
91786
87384

INSTRUCTIONS

Make sure you have copied the Iron Man 3 movie tweets in your HDFS home directory in previous slide, if not please copy the data using.
```
hadoop fs -cp /data/SentimentFiles /user/$USER
```
Launch hive console by typing the hive command in the web console.
```
hive
```

Run below commands in the hive:

ADD JAR hdfs:///data/hive/json-serde-1.1.9.9-Hive13-jar-with-dependencies.jar;
SET hive.support.sql11.reserved.keywords=false;

Select your database. Run below commands. ${env:USER} gets replaced by your username automatically:
```
CREATE DATABASE IF NOT EXISTS ${env:USER};
USE ${env:USER};
```

Create a tweets_raw external table. It contains details of each tweet. Run below commands in the hive. ${env:USER} gets replaced by your username automatically:

CREATE EXTERNAL TABLE tweets_raw (
    id BIGINT,
    created_at STRING,
    source STRING,
    favorited BOOLEAN,
    retweet_count INT,
    retweeted_status STRUCT<
    text:STRING,
    users:STRUCT<screen_name:STRING,name:STRING>>,
    entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
    text STRING,
    user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:STRING, -- was INT but nulls are strings
    time_zone:STRING>,
    in_reply_to_screen_name STRING,
    year int,
    month int,
    day int,
    hour int
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
LOCATION '/user/${env:USER}/SentimentFiles/SentimentFiles/upload/data/tweets_raw';

Run below query on hive console to answer the question:
```
SELECT count(id) FROM tweets_raw;
```

Get Hint See Answer

Note - Having trouble with the assessment engine? Follow the steps listed here

XP

Congratulations on completing topic -

Loading comments...