Hive - Project

3 / 14
   

Objective of this step is to create an external table which contains tweets of Iron Man 3 movie

Steps-

  1. Open Hive query editor in Hue and run below commands

    ADD JAR hdfs:///data/hive/json-serde-1.1.9.9-Hive13-jar-with-dependencies.jar;
    SET hive.support.sql11.reserved.keywords=false;
    
  2. Select your database. Replace YOUR_USER_NAME with your CloudxLab username. Run below commands

    CREATE DATABASE IF NOT EXISTS YOUR_USER_NAME;
    USE YOUR_USER_NAME;
    
  3. Create tweets_raw external table. It contains details of each tweet. Replace YOUR_USER_NAME with your CloudxLab username

    CREATE EXTERNAL TABLE tweets_raw (
        id BIGINT,
        created_at STRING,
        source STRING,
        favorited BOOLEAN,
        retweet_count INT,
        retweeted_status STRUCT<
        text:STRING,
        users:STRUCT<screen_name:STRING,name:STRING>>,
        entities STRUCT<
        urls:ARRAY<STRUCT<expanded_url:STRING>>,
        user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
        hashtags:ARRAY<STRUCT<text:STRING>>>,
        text STRING,
        user STRUCT<
        screen_name:STRING,
        name:STRING,
        friends_count:INT,
        followers_count:INT,
        statuses_count:INT,
        verified:BOOLEAN,
        utc_offset:STRING, -- was INT but nulls are strings
        time_zone:STRING>,
        in_reply_to_screen_name STRING,
        year int,
        month int,
        day int,
        hour int
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
    LOCATION '/user/YOUR_USER_NAME/SentimentFiles/SentimentFiles/upload/data/tweets_raw';
    

Question-

How many records are in tweets_raw table?

Hint-

Run below query in Hive query editor

SELECT count(id) FROM tweets_raw;