Hive - ORC File Format






Important Note - We recommend running the given commands in the Hive console instead of Hue. The video is for illustration only.

ORC (Optimized Row Columnar) is a file format that provides a highly efficient way to store Hive data. Using ORC files improves performance when Hive reads, writes, and processes data. ORC files carry built-in indexes with min/max values and other aggregates, which readers can use to skip data that cannot match a query. The format is proven in large-scale deployments: Facebook uses the ORC file format for a 300+ PB deployment.

To use the ORC file format, specify the STORED AS ORC clause when creating the table. Once the table exists, insert some data and query it back. The exact commands are listed in the steps below.
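ORC behaviour can also be tuned per table through TBLPROPERTIES. The following is a minimal sketch, not part of the exercise: the table name orc_table_snappy is hypothetical, and the property values are illustrative choices rather than recommendations. It assumes the standard ORC table properties orc.compress and orc.create.index.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE orc_table_snappy (
    first_name STRING,
    last_name  STRING
)
STORED AS ORC
TBLPROPERTIES (
    -- Compression codec for the ORC files (Hive's default is ZLIB).
    "orc.compress" = "SNAPPY",
    -- Keep the built-in row index enabled (this is already the default).
    "orc.create.index" = "true"
);
```

If no properties are given, as in the steps below, Hive uses its defaults for the ORC table.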

INSTRUCTIONS

Steps:

  • Create ORC table
  • Log in to the web console
  • Launch Hive by typing hive in the web console. Run the commands below in Hive.
  • Switch to your database with the command below. ${env:USER} is replaced by your username automatically:

    use ${env:USER};
    
  • To create a table stored in the ORC file format:

    CREATE TABLE orc_table (
        first_name STRING,
        last_name STRING
    )
    STORED AS ORC;
    
  • To insert values in the table:

    INSERT INTO orc_table VALUES ('John','Gill');
    
  • To retrieve all the values in the table:

    SELECT * FROM orc_table;
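
To confirm that the table created in the steps above is really stored as ORC, its metadata can be inspected. This is an optional sketch that goes beyond the original steps. It assumes orc_table exists in the current database.

```sql
-- Show detailed table metadata, including the storage format.
DESCRIBE FORMATTED orc_table;

-- Alternatively, print the full DDL Hive recorded for the table.
SHOW CREATE TABLE orc_table;
```

For an ORC table, the DESCRIBE FORMATTED output should list org.apache.hadoop.hive.ql.io.orc.OrcSerde as the SerDe library and org.apache.hadoop.hive.ql.io.orc.OrcInputFormat as the input format. The exact layout of the output may vary between Hive versions.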
    
