Enrollments closing soon for Post Graduate Certificate Program in Applied Data Science & AI By IIT Roorkee | 3 Seats Left

  Apply Now

Hive

12 / 18

Hive - Load JSON Data




Not able to play video? Try with youtube

To load json data in hive we use json-serde, serde represents serializer deserializer. When we load json data in hive json-serde converts the json data into the tabular format. json-serde jar is located at /data/serde directory in hdfs. Login into cloudxlab linux console copy /data/sample_json from hdfs to your home folder in hdfs run command hadoop fs -cp /data/sample_json followed by /user/$USER, \$USER is a variable which will be substituted with your username. Now to check the contents of the sample_json directory. Run hadoop fs -ls /user/$USER/sample_json it contains the file user_country.json. Check the last few lines of the file please note that each row in the file contains json object run command hadoop fs -tail /user/$USER/sample_json /user_country.json. Now launch hive by typing hive and wait for hive prompt to appear. When we load json data into hive, json-serde converts the json data into the tabular format. json-serde jar is located at /data/serde directory in hdfs, before creating the table add the serde jar in hive with add jar followed by jar path command now you can create the table using the syntax displayed on the screen please note the row format unlike previous examples fields are not terminated by tab or comma. To select the rows from user_countries table you can now simply run select * from user_countries followed by semicolon now this command will query the json object and print the result using a simple sql query the translation from json into this tabular form is being done by serde the jar that we have added.

INSTRUCTIONS

Steps:

  • Load JSON file from your home folder in HDFS to Hive
  • Login to the web console. Run the below commands in the web console.
  • Copy /data/sample_json from HDFS to your home folder in HDFS. $USER gets replaced by your username automatically:

    hadoop fs -cp /data/sample_json /user/$USER
    
  • Check the content of the sample_json directory.

    hadoop fs -ls /user/$USER/sample_json
    
  • It contains the file user_country.json. Check the last few lines of the file. Please note that each row in the file contains JSON.

    hadoop fs -tail /user/$USER/sample_json/user_country.json
    
  • Launch Hive by typing hive in the web console. Run the below commands in the hive.

  • Add JSON-SERDE JAR using below command:

    ADD JAR hdfs:///data/serde/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;
    
  • Use your database. ${env:USER} gets replaced by your username automatically:

    use ${env:USER};
    
  • To create the table use the following command.

    CREATE EXTERNAL TABLE user_countries(
        name string,
        country string
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION 'hdfs:///user/${env:USER}/sample_json';
    
  • Select rows from user_countries table

    select * from user_countries;
    

Loading comments...