Hive - Managed Tables - Hands-on


Important Note - We recommend executing the given commands on the Hive console instead of Hue. The video is for representational purposes only.

Let's create a managed table and load data into it. We are going to load NYSE (New York Stock Exchange) data into a managed table in Hive. First, let's look at the data. Log in to Hue, go to "File Browser", and navigate to /data/NYSE_daily/NYSE_daily.

The fields are separated by tabs. The first column is the name of the exchange, the second is the stock symbol, the third is the date on which the prices were recorded, the fourth is the opening price of the stock, the fifth is the highest price, the sixth is the lowest price, the seventh is the closing price, the eighth is the volume, and the ninth is the adjusted closing price.
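As a sketch of the record layout, a single tab-separated line can be split into these nine fields. Note that the values below are made up for illustration; the real records live in the HDFS file:

```python
# Parse one (illustrative) tab-separated NYSE record into named fields.
# The sample values are hypothetical, not taken from the actual dataset.
record = "NYSE\tABC\t2009-12-31\t30.44\t30.48\t30.08\t30.10\t88102700\t30.10"

columns = ["exchange", "symbol", "date", "price_open", "price_high",
           "price_low", "price_close", "volume", "price_adj_close"]

fields = record.split("\t")
row = dict(zip(columns, fields))
print(row["symbol"], row["price_close"])  # prints: ABC 30.10
```

This mirrors what Hive does internally once we declare `FIELDS TERMINATED BY '\t'` in the table definition.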

Let's create the managed table. Copy the CREATE TABLE command displayed on the screen and paste it into the Hive query editor in Hue. Make sure that you have selected your database. In this command, we define the schema of the nyse table and tell Hive that fields are terminated by a tab (\t), so Hive knows how to split each line while loading the data.

Click on "Execute". Refresh the table list and we can see that the nyse table has been created in our database. Let's look at the metadata of the table: type describe nyse and click on "Execute". We can see the column names and data types. To see more low-level details, type describe formatted nyse and click on "Execute". We can see details like the database name, owner, creation time, warehouse location, and table type.
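The output of describe formatted looks roughly like the following. This is a sketch, not a verbatim capture; the exact values and paths will differ on your cluster:

```
# col_name            data_type    comment
exchange1             string
symbol1               string
...
# Detailed Table Information
Database:             your_database
Owner:                your_user
CreateTime:           ...
Location:             hdfs:///apps/hive/warehouse/your_database.db/nyse
Table Type:           MANAGED_TABLE
```

The Table Type of MANAGED_TABLE is what distinguishes this table from an external one: Hive owns the data in the Location directory and will delete it if the table is dropped.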

Let's see the table details in the metastore. Click on "Data Browsers" and select "Metastore Tables". Select your database and then click on the "nyse" table. We can see the metadata in the "Columns" and "Properties" tabs. Now let's look at the warehouse directory for the "nyse" table. Go to "File Browser" and navigate to /apps/hive/warehouse/, then select the directory with your database name. We can see the "nyse" directory inside your database directory. Let's go inside it. Since we have not loaded any data into the table yet, the directory is empty.

Let's load the data into the "nyse" table. We can load data either from the local file system (via the Linux console) or from HDFS. In most production scenarios we load data from HDFS, since the local file system cannot store big data. Please note that we will use the Hive command line to load data from the local file system, as Hue is not aware of the local file system.

Let's load data from the local file system. Copy the NYSE_daily file from HDFS to your home directory on the CloudxLab Linux console by typing hadoop fs -copyToLocal /data/NYSE_daily. Type ls NYSE_daily/* to check that the data was copied. Launch the Hive shell by typing hive. Select your database and run load data local inpath 'NYSE_daily' overwrite into table nyse;. This command loads the NYSE_daily data into the nyse table in Hive. The OVERWRITE keyword deletes the existing contents of the table (or partition), if any, and replaces them with the files at the specified path.

Press Enter. Please note that when we load data from the local file system, it gets copied to the warehouse. Let's verify this. Go to "File Browser" in Hue and navigate to /apps/hive/warehouse. Select your database directory and click on "nyse". We can see the NYSE_daily file in the warehouse. To check the loaded data, type select * from nyse; in the Hive shell.
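The copy behaviour of LOAD DATA LOCAL can be mimicked with plain shell commands. The /tmp paths below are made up purely for the demonstration; the real warehouse lives at /apps/hive/warehouse in HDFS:

```shell
# Simulate LOAD DATA LOCAL ... OVERWRITE: Hive copies the local file into
# the table's warehouse directory after clearing any existing contents.
demo_warehouse=/tmp/hive_demo_warehouse/nyse
mkdir -p "$demo_warehouse"
printf 'NYSE\tABC\t2009-12-31\n' > /tmp/NYSE_daily_demo  # stand-in local file

rm -f "$demo_warehouse"/*                    # OVERWRITE clears the table dir
cp /tmp/NYSE_daily_demo "$demo_warehouse"/   # local load = copy

ls /tmp/NYSE_daily_demo                      # the local source file remains
ls "$demo_warehouse"                         # the table dir now holds a copy
```

The key point is the last two lines: after a LOCAL load the source file still exists, because Hive copied it rather than moving it.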

Let's load data from HDFS into a managed table. Copy the command displayed on the screen, paste it into the Hive query editor, and click on "Execute". It will create a managed table called nyse_hdfs. Now let's load data from HDFS into the nyse_hdfs table. Copy the data from the /data/NYSE_daily directory in HDFS to your home directory in HDFS. Then run the command load data inpath 'hdfs:///user/your-user-name/NYSE_daily' overwrite into table nyse_hdfs;. This command loads the NYSE_daily file from your home directory in HDFS into the nyse_hdfs table. Please note that when we load data from HDFS, it gets moved from its HDFS location to the warehouse. Let's verify this. Go to "File Browser" and navigate to your home directory: the NYSE_daily file is not there anymore. Let's see if it is in the warehouse. Navigate to /apps/hive/warehouse, select your database directory, go inside the nyse_hdfs directory, and click on NYSE_daily. We can see the file in the warehouse.
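The move behaviour of a non-LOCAL load can be sketched the same way. Again, the /tmp paths are invented for the demonstration and merely stand in for HDFS directories:

```shell
# Simulate LOAD DATA INPATH (no LOCAL): Hive *moves* the file from its
# HDFS location into the warehouse, so the original disappears.
demo_home=/tmp/hive_demo_home
demo_warehouse=/tmp/hive_demo_wh/nyse_hdfs
mkdir -p "$demo_home" "$demo_warehouse"
printf 'NYSE\tABC\t2009-12-31\n' > "$demo_home/NYSE_daily"

mv "$demo_home/NYSE_daily" "$demo_warehouse"/  # HDFS load = move

ls "$demo_home"        # empty: the file is gone from the source location
ls "$demo_warehouse"   # the file now lives under the table directory
```

This is why the design makes sense: both the source and the warehouse are on HDFS, so Hive can rename the file instead of duplicating big data.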

INSTRUCTIONS

Steps:

1. Create a table called nyse in Hive and load data from your home folder in the web console

  • Log in to the web console. Run the below commands on the web console
  • Copy NYSE data from HDFS to your local

    hadoop fs -copyToLocal /data/NYSE_daily
    
  • Launch Hive by typing hive on the console. Run the below commands in Hive.

  • Use your database by using the below command. ${env:USER} gets replaced by your username automatically:

    use ${env:USER};
    
  • Create the table nyse using the below command

    CREATE TABLE nyse(
        exchange1 STRING,
        symbol1 STRING,
        ymd STRING,
        price_open FLOAT,
        price_high FLOAT,
        price_low FLOAT,
        price_close FLOAT,
        volume INT,
        price_adj_close FLOAT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    
  • See the metadata of the table using the below command

    DESCRIBE nyse;
    
  • To see more low-level details, type the below command

    DESCRIBE FORMATTED nyse;
    
  • Load data into your nyse table

    use ${env:USER};
    load data local inpath 'NYSE_daily' overwrite into table nyse;
    
  • Check the warehouse directory in Hue (in Hue File Store at /apps/hive/warehouse)

  • Select rows from the table (in the Hive console)

    select * from nyse;
    

2. Create a table called nyse_hdfs in Hive and load data from your home folder in HDFS

  • Copy the NYSE_daily dataset from the data directory in HDFS to your home directory in HDFS. $USER gets replaced by your username automatically. Run the below command in the web console.

    hadoop fs -cp /data/NYSE_daily  /user/$USER/
    
  • Loading data from HDFS. Run the below command in Hive.

    CREATE TABLE nyse_hdfs(
        exchange1 STRING,
        symbol1 STRING,
        ymd STRING,
        price_open FLOAT,
        price_high FLOAT,
        price_low FLOAT,
        price_close FLOAT,
        volume INT,
        price_adj_close FLOAT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    
  • Run the following command in Hive.

    load data inpath 'hdfs:///user/$USER/NYSE_daily' overwrite into table nyse_hdfs;
    
  • The above command moves the data from the specified location to the warehouse.

  • You can check the data using the following command. Run it in the web console.

    hadoop fs -ls /apps/hive/warehouse/$USER.db/nyse_hdfs/*
    
