Type hive and wait for the Hive prompt to appear. This is Hive's command-line interface (CLI). Use your own database: run use followed by your database name. In that command, the string ${env:USER} is replaced by your username automatically, so if your username and your database name are the same, you can use this variable instead of typing the name. To see the values of all the columns of a table we use select * from <table name>, so type select * from nyse; to see all columns of the nyse table. To see only the exchange1 and symbol1 columns, type select exchange1, symbol1 from nyse followed by a semicolon.
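If you are unsure what columns nyse has before selecting from them, Hive's DESCRIBE statement lists each column with its type. This is a small aside beyond the original steps:
-- list the column names and types of the nyse table
DESCRIBE nyse;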
We can write aggregation queries in Hive the same way we write them in SQL. To find the average opening price of each stock in the nyse table, type select symbol1, AVG(price_open) as avg_price from nyse group by symbol1; and the average opening price of each stock is displayed. The commands are summarized below.
Run the below commands in Hive. Start the Hive CLI:
hive
Use your database. ${env:USER} gets replaced by your username automatically:
use ${env:USER};
To select all columns from the table nyse:
SELECT * FROM nyse;
To select only the required columns from the table nyse:
SELECT exchange1, symbol1 FROM nyse;
Find the average opening price for each stock:
SELECT symbol1, AVG(price_open) AS avg_price FROM nyse
GROUP BY symbol1;
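As an optional sketch of the same GROUP BY pattern (not part of the exercise, and assuming only the symbol1 and price_open columns shown above), Hive can compute several aggregates in one pass and order the result:
-- row count plus min, max, and average opening price per stock,
-- listing the 10 stocks with the highest average opening price
SELECT symbol1,
       COUNT(*)        AS num_rows,
       MIN(price_open) AS min_open,
       MAX(price_open) AS max_open,
       AVG(price_open) AS avg_open
FROM nyse
GROUP BY symbol1
ORDER BY avg_open DESC
LIMIT 10;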
4 Comments
I am still unable to run aggregations. :(
Hi Renan,
It seems that the issue is due to your disk space quota being exceeded. You can refer to https://discuss.cloudxlab.com/t/my-user-disk-space-quota-in-the-lab-has-exceeded-how-can-i-clean-the-unnecessary-files/5370. This guide will help you clear up unnecessary files and free up space.
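If you want a rough look at what is using the space, the Hive CLI can pass commands straight to HDFS with dfs. A minimal sketch, assuming your lab home directory is /user/<your username> (replace yourusername below with your actual username):
-- run at the hive> prompt: show the size of each item under your HDFS home directory
dfs -du -h /user/yourusername;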
Getting the below error:
hive> select symbol1, AVG(price_open) as avg_price FROM nyse GROUP BY symbol1;
Query ID = pranavsri3896_20230815115714_c4ab593d-1a79-4cbe-b184-2ad8d160f868
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1648130833540_32859, Tracking URL = http://cxln2.c.thelab-240901.internal:8088/proxy/application_1648130833540_32859/
Kill Command = /usr/hdp/2.6.2.0-205/hadoop/bin/hadoop job -kill job_1648130833540_32859
Interrupting... Be patient, this might take some time.
Press Ctrl+C again to kill JVM
killing job with: job_1648130833540_32859
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2023-08-15 11:57:18,879 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1648130833540_32859 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Hi,
It is working fine from my end. Can you try it again?