Code
divs = LOAD '/data/NYSE_dividends' AS (exchange, stock_symbol, date, dividends);
grped = GROUP divs BY stock_symbol;
DUMP grped;
avged = FOREACH grped GENERATE group, AVG(divs.dividends);
STORE avged INTO 'avged';
ls avged
Running ls avged in the grunt shell shows output like the following:
grunt> ls avged
hdfs://cxln1.c.thelab-240901.internal:8020/user/vagdevi4768/avged/_SUCCESS<r 3> 0
hdfs://cxln1.c.thelab-240901.internal:8020/user/vagdevi4768/avged/part-v001-o000-r-00000<r 3> 1863
Then we can view the contents of the file part-v001-o000-r-00000 as follows:
cat avged/part-v001-o000-r-00000
Description
grunt> divs = LOAD '/data/NYSE_dividends' AS (exchange, stock_symbol, date, dividends);
grunt> describe divs
divs: {exchange: bytearray,stock_symbol: bytearray,date: bytearray,dividends: bytearray}
grunt>
After loading, divs represents a dataset in which each row has four columns: exchange, stock_symbol, date and dividends. Because no types were declared in the AS clause, every column defaults to bytearray.
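The bytearray defaults shown by describe can be replaced with explicit types in the AS clause. The following is a minimal sketch, assuming that dividends holds decimal numbers and that the other columns are best treated as plain text. Those type choices are assumptions, not something the exercise specifies.

```pig
-- Same LOAD as above, but with a type declared for each field.
-- The chararray/float choices are assumptions about the data.
divs = LOAD '/data/NYSE_dividends' AS (exchange:chararray, stock_symbol:chararray, date:chararray, dividends:float);

-- Show the schema that Pig now associates with divs.
DESCRIBE divs;
```

With declared types, Pig casts each field as it is loaded, so later operators see typed values rather than raw bytes.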
grunt> grped = GROUP divs BY stock_symbol;
grunt> describe grped;
grped: {group: bytearray,divs: {(exchange: bytearray,stock_symbol: bytearray,date: bytearray,dividends: bytearray)}}
grped contains rows with two fields each. The first, group, is the stock_symbol value that the rows were grouped by. The second, divs, is a bag holding all the rows of divs that share that stock_symbol.
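One way to inspect the grouped structure without printing every row is to count the tuples in each bag and to dump only a couple of groups. This is a rough sketch. The alias names sizes and few are made up for illustration.

```pig
divs = LOAD '/data/NYSE_dividends' AS (exchange, stock_symbol, date, dividends);
grped = GROUP divs BY stock_symbol;

-- One row per stock_symbol with the size of its bag.
-- COUNT skips tuples whose first field is null.
sizes = FOREACH grped GENERATE group, COUNT(divs);
DUMP sizes;

-- Print only two groups, each with its full bag of divs rows.
few = LIMIT grped 2;
DUMP few;
```

Each line printed by DUMP few should show a group value followed by a bag of the original four-field rows, which matches the schema reported by describe grped.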
grunt> avged = FOREACH grped GENERATE group, AVG(divs.dividends);
grunt> describe avged
avged: {group: bytearray,double}
grunt>
avged has the same number of rows as grped, one per stock_symbol. The second column now holds a single value per row: the average of the dividends in that group's bag, computed by the AVG function.
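In the describe output for avged, the second field has a type but no name. Fields produced by FOREACH can be named with AS, which makes them easier to refer to in later statements. The sketch below assumes the names symbol and avg_dividend, which are illustrative and not part of the exercise.

```pig
divs = LOAD '/data/NYSE_dividends' AS (exchange, stock_symbol, date, dividends);
grped = GROUP divs BY stock_symbol;

-- Name both output fields instead of relying on the defaults.
avged = FOREACH grped GENERATE group AS symbol, AVG(divs.dividends) AS avg_dividend;

-- The schema should now list both fields by name.
DESCRIBE avged;
```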