MovieLens data sets were collected by the GroupLens Research Project
at the University of Minnesota.
This data set consists of 100,000 ratings (1-5) from 943 users on 1,682 movies.
The MovieLens dataset is located at /data/ml-100k in HDFS. Read the README.md file to understand the dataset.
We will load the u.data file into a Hive managed table. Each row of u.data holds four fields: userid, movieid, rating and timestamp. Fields are separated by a tab character ("\t").
1. Create a managed table u_data in your database in Hive. Run the commands below, replacing your-username with your CloudxLab username.
-- Create database with your CloudxLab username
CREATE DATABASE IF NOT EXISTS your-username;
-- Select your database
USE your-username;
-- Create table
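-- unixtime is declared BIGINT because u.data stores the timestamp as integer Unix epoch seconds;
-- a TIMESTAMP column in a text table expects 'yyyy-mm-dd hh:mm:ss' strings instead
CREATE TABLE IF NOT EXISTS u_data (userid INT, movieid INT, rating INT, unixtime BIGINT)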
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
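Before loading any data, it can help to confirm that the table was created as expected. The statements below are a minimal sketch, assuming the database is named after your CloudxLab username as in step 1. DESCRIBE FORMATTED prints the table type (MANAGED_TABLE for a managed table) and the warehouse location where Hive will keep the data.

```sql
-- Select your database (replace your-username as in step 1)
USE your-username;
-- List the tables in the database; u_data should appear
SHOW TABLES;
-- Show the column names and types
DESCRIBE u_data;
-- Show detailed metadata, including Table Type and Location
DESCRIBE FORMATTED u_data;
```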
2. Now load the data into the u_data table. Run the commands below, replacing your-username with your CloudxLab username.
# Copy the data from the /data directory in HDFS to your home directory in HDFS. Run the command below in the Linux console
hadoop fs -cp /data/ml-100k/u.data /user/your-username/
-- Log in to Hue, launch Hive and load the data from your home directory in HDFS. Run the command below in the Hive query editor in Hue
LOAD DATA INPATH 'hdfs:///user/your-username/u.data' OVERWRITE INTO TABLE u_data;
3. Check that the data is loaded. In the Hue file browser, go to the warehouse directory at /apps/hive/warehouse and open the directory for your database. You will see a u_data directory there. Open it and confirm that the data file exists.
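The load can also be checked from the Hive query editor itself. The queries below are a rough sketch rather than part of the original steps; the column aliases total_rows and num_ratings are chosen here for illustration. A non-zero row count and readable sample rows suggest that the tab delimiter was applied correctly.

```sql
-- Select your database (replace your-username as in step 1)
USE your-username;
-- Look at a few rows to confirm the fields were split on the tab character
SELECT * FROM u_data LIMIT 5;
-- Count the rows that were loaded
SELECT COUNT(*) AS total_rows FROM u_data;
-- Show how many ratings fall into each rating value
SELECT rating, COUNT(*) AS num_ratings
FROM u_data
GROUP BY rating
ORDER BY rating;
```

If every column in the sample rows shows NULL, the field delimiter in the table definition probably does not match the file.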