Enrollments closing soon for Post Graduate Certificate Program in Applied Data Science & AI By IIT Roorkee | 3 Seats Left
Apply NowLogin using Social Account
     Continue with GoogleLogin using your credentials
In this step, we will create tweets_simple
and tweets_clean
views.
Question-
From which country tweet with id 330044004693598208
was tweeted?
Launch hive console by typing the hive
command in the web console.
hive
Run below commands in the hive. This gives the ability to handle JSON format. Please note that you will have to do 'add jar' in every session.
ADD JAR hdfs:///data/hive/json-serde-1.1.9.9-Hive13-jar-with-dependencies.jar;
SET hive.support.sql11.reserved.keywords=false;
Select your database after creating your database. Run below commands. ${env:USER}
gets replaced by your username automatically:
CREATE DATABASE IF NOT EXISTS ${env:USER};
USE ${env:USER};
Create view tweets_simple
. The tweets_simple
view contains tweet id, the timestamp of the
tweet, tweet text, and user's time zone. Run below command in the Hive on your web console. See the manual of from_unixtime, unix_timestamp, concat, substring.
CREATE VIEW tweets_simple AS
SELECT
id,
cast ( from_unixtime( unix_timestamp(concat( '2013 ', substring(created_at,5,15)), 'yyyy MMM dd hh:mm:ss')) as timestamp) ts,
text,
user.time_zone
FROM tweets_raw;
Here:
from_unixtime(bigint unixtime[, string format])
- Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00".
unix_timestamp()
- Gets current Unix timestamp in seconds. This function is not deterministic and its value is not fixed for the scope of a query execution, therefore prevents proper optimization of queries - this has been deprecated since 2.0 in favour of CURRENT_TIMESTAMP constant.
concat(string|binary A, string|binary B...)
- Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
substring(string|binary A, int start)
- Returns the substring or slice of the byte array of A starting from start position till the end of string A. For example, substr('foobar', 4) results in 'bar' (see [http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr]).
Sample rows of tweets_simple
views are:
Create view tweets_clean
. The tweets_clean
view maps the user's timezone to the country. Each
row of the tweets_clean
view contains tweet_id, the timestamp of the tweet, tweet text, and user's
country (which is derived from time zone). Run below command in the Hive on your web console.
CREATE VIEW tweets_clean AS
SELECT
id,
ts,
text,
m.country
FROM tweets_simple t LEFT OUTER JOIN time_zone_map m ON t.time_zone = m.time_zone;
Sample rows of tweets_simple
views are<br><br>
<img src="https://s3.amazonaws.com/cloudxlab/static/images/aha/hive-project/tweets_clean_view.png" class="img-responsive" />
Taking you to the next exercise in seconds...
Want to create exercises like this yourself? Click here.
Note - Having trouble with the assessment engine? Follow the steps listed here
Loading comments...