Hive - Managed Tables - Hands-on





Let's create a managed table. First, copy the NYSE data from HDFS to your Linux home directory: type hadoop fs -copyToLocal /data/NYSE_daily, then type ls NYSE_daily/* to check that the data was copied. Now type hive and wait for the Hive command-line interface to appear.

Switch to your database: type use followed by your database name. If your database name is the same as your username, you can type use ${env:USER}; and press Enter; the string ${env:USER} gets replaced by your username automatically.

Next, create a table in the Hive shell in the web console. In this command we define the schema of the nyse table and inform Hive that the fields are terminated by a tab ('\t'), so while loading the data Hive will know where each field ends. After typing the command, press Enter.

Let us look at the table's metadata: type describe nyse; and press Enter to see the column names and data types. To see more low-level details, type describe formatted nyse; and press Enter; this shows details such as the database name, owner, creation time, warehouse location, and table type.

Now let us load data into the nyse table. We can load data either from the local file system (the Linux console) or from HDFS. Let's start with the local file system; we already copied the NYSE_daily file from HDFS to your home directory on the CloudxLab Linux console. Run load data local inpath 'NYSE_daily' overwrite into table nyse;. This command loads the NYSE_daily data into the nyse table in Hive; the overwrite keyword deletes the existing contents of the table, if any, and replaces them with the files specified in the path. To check the loaded data, type select * from nyse;. Please note that when we load data from the local file system, it is copied to the warehouse. Let's verify this: run hadoop fs -ls /apps/hive/warehouse/$USER.db/nyse/* in the web console.

Finally, let's load data from HDFS into a managed table called nyse_hdfs. Run hadoop fs -cp /data/NYSE_daily /user/$USER/ to copy the data from the /data/NYSE_daily directory in HDFS into your home directory in HDFS. Create the nyse_hdfs table, then run load data inpath followed by the path to the HDFS directory, then overwrite into table nyse_hdfs and a semicolon. This command loads NYSE_daily from your home directory in HDFS into the nyse_hdfs table. Please note that when we load data from HDFS, the file gets moved from its original HDFS location to the warehouse directory in HDFS. Let's verify this: run hadoop fs -ls followed by the warehouse directory path in the web console; you can see that the file has been moved.
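
As a quick reference, here are the two load commands from the walkthrough side by side; the only difference is the LOCAL keyword, which decides whether Hive copies from the Linux file system or moves within HDFS:

    -- with LOCAL: copies the file from your Linux home directory into the warehouse
    load data local inpath 'NYSE_daily' overwrite into table nyse;

    -- without LOCAL: moves the file from your HDFS home directory into the warehouse
    load data inpath 'hdfs:///user/${env:USER}/NYSE_daily' overwrite into table nyse_hdfs;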

INSTRUCTIONS

Steps:

1. Create a table called nyse in Hive and load data from your home folder in the web console

  • Log in to the web console and run the commands below on it.
  • Copy the NYSE data from HDFS to your local home directory:

    hadoop fs -copyToLocal /data/NYSE_daily
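
  • Verify that the data was copied (the walkthrough above uses the same check):

    ls NYSE_daily/*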
    
  • Launch Hive by typing hive on the console, then run the commands below in the Hive shell.

  • Switch to your database using the command below. ${env:USER} gets replaced by your username automatically:

    use ${env:USER};
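
  For example, if your username were john1234 (a made-up example), this expands to use john1234;.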
    
  • Create the nyse table using the command below (the 1 suffix on exchange1 and symbol1 keeps the column names clear of HiveQL reserved words such as exchange):

    CREATE TABLE nyse(
        exchange1 STRING,
        symbol1 STRING,
        ymd STRING,
        price_open FLOAT,
        price_high FLOAT,
        price_low FLOAT,
        price_close FLOAT,
        volume INT,
        price_adj_close FLOAT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
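
  • Optionally (this check is not part of the original steps), confirm that the table was created:

    SHOW TABLES;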
    
  • See the table's metadata using the command below:

    DESCRIBE nyse;
    
  • To see more low-level details, type the command below:

    DESCRIBE FORMATTED nyse;
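
  The exact values differ per account, but the output includes fields roughly like this illustrative excerpt (placeholders, not real output):

    Database:      <your database>
    Owner:         <your username>
    CreateTime:    <creation timestamp>
    Location:      hdfs://.../apps/hive/warehouse/<your database>.db/nyse
    Table Type:    MANAGED_TABLE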
    
  • Load data into your nyse table from the local file system:

    use ${env:USER};
    load data local inpath 'NYSE_daily' overwrite into table nyse;
    
  • Select rows from the table (in the Hive console):

    select * from nyse;
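
  • Since nyse now holds the whole NYSE_daily file, you may prefer to preview just a few rows instead (optional):

    select * from nyse limit 10;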
    
  • Check the warehouse directory in HDFS by typing the command below in the Unix console (press Ctrl+C to exit the Hive console first). Note that the loaded file was copied here from your local directory:

    hadoop fs -ls /apps/hive/warehouse/$USER.db/nyse/*
    

2. Create a table called nyse_hdfs in Hive and load data from your home folder in HDFS

  • Copy the NYSE_daily dataset from the /data directory in HDFS to your home directory in HDFS. $USER gets replaced by your username automatically. Run the command below in the web console:

    hadoop fs -cp /data/NYSE_daily  /user/$USER/
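
  • Optionally (not in the original steps), confirm the file is now in your HDFS home directory before loading it:

    hadoop fs -ls /user/$USER/NYSE_daily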
    
  • Now load data from HDFS. First, create the nyse_hdfs table by running the command below in Hive:

    CREATE TABLE nyse_hdfs(
        exchange1 STRING,
        symbol1 STRING,
        ymd STRING,
        price_open FLOAT,
        price_high FLOAT,
        price_low FLOAT,
        price_close FLOAT,
        volume INT,
        price_adj_close FLOAT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    
  • Then load the data by running the following command in Hive:

    load data inpath 'hdfs:///user/${env:USER}/NYSE_daily' overwrite into table nyse_hdfs;
    
  • The above command moves (rather than copies) the data from the specified location to the warehouse.
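
  • Because the file was moved, listing its old location should now fail with "No such file or directory", which confirms the move. Run this in the web console (optional check):

    hadoop fs -ls /user/$USER/NYSE_daily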

  • You can check the data with the following command. Run it in the web console:

    hadoop fs -ls /apps/hive/warehouse/$USER.db/nyse_hdfs/*
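
  • As a final, optional sanity check (not part of the original steps), you can count the loaded rows back in the Hive console:

    select count(*) from nyse_hdfs;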
    
