Hive - Select and Aggregation Queries

Type hive and wait for the Hive prompt to appear. This is the command-line interface, or CLI. Please use your own database: run use followed by your database name. In this command the string ${env:USER} is replaced by your username automatically, so if your username and your database name are the same, you can use this variable instead. To see the values of all the columns of a table we use select * followed by the table name, so type select * from nyse; to see all the columns of the nyse table. To see only the exchange1 and symbol1 columns, type select exchange1, symbol1 from nyse;. We can write aggregation queries in Hive the same way we write them in SQL. To find the average opening price of each stock in the nyse table, type select symbol1, AVG(price_open) as avg_price from nyse group by symbol1; and we can see the average opening price of each stock.
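
If you want to confirm what ${env:USER} will expand to before relying on it, Hive's SET command can print a variable from the env namespace. A minimal sketch, assuming the USER environment variable is set in your shell session; the output line is only an illustration, not output from the lab:

    -- Print the USER environment variable from inside the Hive CLI.
    SET env:USER;
    -- prints something like: env:USER=<your username>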

INSTRUCTIONS

Steps:

  • Login to the web console.
  • Launch Hive by running the command hive. Run the commands below at the Hive prompt.
  • Use your database. Run the following command. ${env:USER} gets replaced by your username automatically:

    use ${env:USER};
    
  • To select all columns from the table nyse:

    SELECT * FROM nyse;
    
  • To select only the required columns from the table nyse:

    SELECT exchange1, symbol1 FROM nyse;
    
  • Find the average opening price for each stock (an extended aggregation sketch follows these steps):

    SELECT symbol1, AVG(price_open) AS avg_price FROM nyse
    GROUP BY symbol1;
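
The same SELECT / GROUP BY pattern extends to other aggregate functions. The sketch below is not one of the lab steps; it only uses the nyse columns already shown above (exchange1, symbol1, price_open), and the aliases and the HAVING threshold are arbitrary values chosen for illustration:

    -- Preview a few rows without scanning the whole table.
    SELECT exchange1, symbol1, price_open FROM nyse LIMIT 10;

    -- Count rows and compute the min, max, and average opening price per stock.
    -- HAVING filters the groups after aggregation; ORDER BY sorts the final result.
    SELECT symbol1,
           COUNT(*)        AS trade_days,
           MIN(price_open) AS min_open,
           MAX(price_open) AS max_open,
           AVG(price_open) AS avg_open
    FROM nyse
    GROUP BY symbol1
    HAVING COUNT(*) > 100
    ORDER BY avg_open DESC
    LIMIT 20;

ORDER BY in Hive runs through a single final reducer to produce a total order, so keeping a LIMIT on such queries is a good habit on a shared cluster.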
    


4 Comments

I am still unable to run aggregations. :(


Hi Renan,

It seems that the issue is due to your disk space quota being exceeded. You can refer to https://discuss.cloudxlab.com/t/my-user-disk-space-quota-in-the-lab-has-exceeded-how-can-i-clean-the-unnecessary-files/5370. This guide will help you clear up unnecessary files and free up space.


I am getting the error below.

hive> select symbol1, AVG(price_open)as avg_price FROM nyse GROUP BY symbol1;
Query ID = pranavsri3896_20230815115714_c4ab593d-1a79-4cbe-b184-2ad8d160f868
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1648130833540_32859, Tracking URL = http://cxln2.c.thelab-240901.internal:8088/proxy/application_1648130833540_32859/
Kill Command = /usr/hdp/2.6.2.0-205/hadoop/bin/hadoop job -kill job_1648130833540_32859
Interrupting... Be patient, this might take some time.
Press Ctrl+C again to kill JVM
killing job with: job_1648130833540_32859
Hadoop job information for Stage-1: number of mappers: 0;number of reducers: 0
2023-08-15 11:57:18,879 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1648130833540_32859 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec


Hi,

It is working fine from my end. Can you try it again?
