Hive - External Tables - Hands on




Not able to play video? Try with youtube

Important Note - We recommend you to execute the given commands on Hive Console instead of Hue. The video is only for representational purposes.

Let us create an external table in hive. First copy the data from /data/NYSE_daily to your home directory in HDFS. The syntax for creating external table is same as managed tables, just that we add an "external" keyword the and LOCATION keyword to define the path of underlying data. Copy the command as displayed on the screen and paste it in Hive query editor. Please note that for external tables, the warehouse will be located at the location provided in LOCATION keyword.

Click on execute and wait for the table to get created. To see the low-level metadata, type describe formatted nyse_external and click on execute. As you can see, the warehouse is located at the NYSE_daily directory in your home directory in HDFS. Also, the table type is EXTERNAL.

When we drop an external table, underlying data does not get deleted. Let's verify it. Type drop table nyse_external to drop the table and click on execute. Now go to NYSE_daily directory in your home directory in HDFS and you can see that the file is still there.

We can also create AWS S3 based external tables in the hive. Please note that we need to provide AWS Access Key ID and Secret Access Key to create S3 based external table

External tables describe the metadata on the external files. External table files can be accessed and managed via processes outside the Hive.

Use Case- External tables are used when the files are present in remote locations. The files remain intact even if we drop the table.

INSTRUCTIONS

Steps:

  • Login to the web console

  • Copy NYSE data from HDFS to your local

    hadoop fs -copyToLocal /data/NYSE_daily
    
  • Launch Hive with typing in hive on console

  • Use your own database by using the below command. ${env:USER} gets replaced by your username automatically:

    use ${env:USER};
    
  • Create an external table nyse_external using below command

    CREATE EXTERNAL TABLE nyse_external(
        exchange1 STRING,
        symbol1 STRING,
        ymd STRING,
        price_open FLOAT,
        price_high FLOAT,
        price_low FLOAT,
        price_close FLOAT,
        volume INT,
        price_adj_close FLOAT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/user/${env:USER}/NYSE_daily';
    
  • To see more low-level details, type below command

    DESCRIBE FORMATTED nyse_external;
    
  • To drop the external table type

    DROP TABLE nyse_external;
    

Loading comments...