Pig & Pig Latin


Pig - Calculate Average Dividend - Hands-on





Code

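-- Load the dividend records; no types are declared, so every field defaults to bytearray.
divs = LOAD '/data/NYSE_dividends' AS (exchange, stock_symbol, date, dividends);
-- Collect all records that share a stock symbol into one bag per symbol.
grped = GROUP divs BY stock_symbol;
-- Print the grouped relation to the console.
DUMP grped;
-- Compute the average dividend for each symbol.
avged = FOREACH grped GENERATE  group,  AVG(divs.dividends);
-- Write the result to the HDFS directory 'avged', then list that directory.
STORE avged INTO 'avged';
ls avged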

Running ls avged prints a listing similar to the following:

grunt> ls avged
hdfs://cxln1.c.thelab-240901.internal:8020/user/vagdevi4768/avged/_SUCCESS<r 3>     0
hdfs://cxln1.c.thelab-240901.internal:8020/user/vagdevi4768/avged/part-v001-o000-r-00000<r 3>       1863

To view the contents of the file part-v001-o000-r-00000, run:

cat avged/part-v001-o000-r-00000

Description

grunt> divs = LOAD '/data/NYSE_dividends' AS (exchange, stock_symbol, date, dividends);
grunt> describe divs
divs: {exchange: bytearray,stock_symbol: bytearray,date: bytearray,dividends: bytearray}
grunt>

After loading, divs represents a dataset in which each row has four columns: exchange, stock_symbol, date, and dividends. Because the LOAD statement declares no types, each column defaults to bytearray.
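
The describe output above reports every field as bytearray. One way to get typed columns is to declare types in the AS clause. The following is a minimal sketch, assuming the file is in the default tab-delimited format and that the first three fields are text while dividends is numeric; the relation name typed_divs is a hypothetical name used only for illustration.

```pig
-- Assumption: exchange, stock_symbol and date are text; dividends is numeric.
typed_divs = LOAD '/data/NYSE_dividends'
    AS (exchange:chararray, stock_symbol:chararray, date:chararray, dividends:float);
-- The schema should now show the declared types instead of bytearray.
DESCRIBE typed_divs;
```

With declared types, later operations such as AVG work on a numeric field directly instead of relying on an implicit cast from bytearray.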

grunt> grped = GROUP divs BY stock_symbol;
grunt> describe grped;
grped: {group: bytearray,divs: {(exchange: bytearray,stock_symbol: bytearray,date: bytearray,dividends: bytearray)}}

Each row of grped has two fields. The first, group, holds the stock symbol the records were grouped by; the second, divs, is a bag containing every divs record for that symbol.
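
To get a feel for this structure without printing every group, one option is to look at a small sample and at the size of each bag. This is only a sketch that builds on the grped relation defined above; sample_grps and counts are hypothetical names.

```pig
-- Keep only a couple of groups so the console output stays readable.
sample_grps = LIMIT grped 2;
DUMP sample_grps;

-- Count how many dividend records fall into each symbol's bag.
counts = FOREACH grped GENERATE group, COUNT(divs);
DUMP counts;
```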

grunt> avged = FOREACH grped GENERATE  group,  AVG(divs.dividends);
grunt> describe avged
avged: {group: bytearray,double}
grunt>

avged has the same number of rows as grped; only the second column differs, with each bag of dividends reduced to a single value by the AVG function.
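
The schema reported for avged leaves the averaged column unnamed. A possible variation, shown here as a sketch rather than as part of the original exercise, names both output fields with AS; named_avg and avg_dividend are hypothetical names.

```pig
-- Same computation as avged, but with explicit field names in the output schema.
named_avg = FOREACH grped GENERATE group AS stock_symbol, AVG(divs.dividends) AS avg_dividend;
DESCRIBE named_avg;
```

Named fields make it possible to refer to the average by name in later statements, for example when sorting or filtering the result.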

