Pig - Calculate Average Dividend - Hands-on


Code

``````
divs = LOAD '/data/NYSE_dividends' AS (exchange, stock_symbol, date, dividends);
grped = GROUP divs BY stock_symbol;
DUMP grped;
avged = FOREACH grped GENERATE  group,  AVG(divs.dividends);
STORE avged INTO 'avged';
ls avged
``````

Running `ls avged` shows output similar to the following:

``````
grunt> ls avged
hdfs://cxln1.c.thelab-240901.internal:8020/user/vagdevi4768/avged/_SUCCESS<r 3>     0
hdfs://cxln1.c.thelab-240901.internal:8020/user/vagdevi4768/avged/part-v001-o000-r-00000<r 3>       1863
``````

We can then view the contents of the file `part-v001-o000-r-00000` with `cat`:

``````
cat avged/part-v001-o000-r-00000
``````

Description

``````
grunt> divs = LOAD '/data/NYSE_dividends' AS (exchange, stock_symbol, date, dividends);
grunt> describe divs
divs: {exchange: bytearray,stock_symbol: bytearray,date: bytearray,dividends: bytearray}
grunt>
``````

After loading, `divs` represents a dataset in which each row has four columns: `exchange`, `stock_symbol`, `date`, and `dividends`. Because no types were declared in the `AS` clause, every column defaults to `bytearray`.

``````
grunt> grped = GROUP divs BY stock_symbol;
grunt> describe grped;
grped: {group: bytearray,divs: {(exchange: bytearray,stock_symbol: bytearray,date: bytearray,dividends: bytearray)}}
``````

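`grped` contains one row per distinct `stock_symbol`, and each row has two fields. The first, `group`, is the grouping key (the stock symbol). The second, `divs`, is a bag holding all the rows of `divs` that share that symbol.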
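To make the shape of `grped` concrete, here is a minimal Python sketch of what `GROUP ... BY` does conceptually. The sample rows and the helper name `group_by_symbol` are made up for illustration and are not taken from `/data/NYSE_dividends`. Pig runs the grouping as a distributed job rather than in memory like this.

```python
from collections import defaultdict

# Hypothetical sample rows: (exchange, stock_symbol, date, dividends)
SAMPLE_DIVS = [
    ("NYSE", "AAA", "2009-01-15", 0.25),
    ("NYSE", "BBB", "2009-02-10", 0.50),
    ("NYSE", "AAA", "2009-04-15", 0.75),
]


def group_by_symbol(rows):
    """Mimic GROUP divs BY stock_symbol: one (group, bag) pair per symbol."""
    bags = defaultdict(list)
    for row in rows:
        # The key is stock_symbol (index 1); the whole row goes into the bag.
        bags[row[1]].append(row)
    return sorted(bags.items())


for group, bag in group_by_symbol(SAMPLE_DIVS):
    print(group, bag)
```

Each printed line corresponds to one row of `grped`: the key first, then every original row that carries that key.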

``````
grunt> avged = FOREACH grped GENERATE  group,  AVG(divs.dividends);
grunt> describe avged
avged: {group: bytearray,double}
grunt>
``````

`avged` has the same number of rows as `grped`. The only difference is that the bag in the second column is reduced to a single value by the `AVG` function.
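The combined effect of the `GROUP` and `FOREACH ... GENERATE` steps can be sketched in plain Python as follows. This is only an illustration of the semantics, assuming made-up sample rows and a hypothetical helper named `average_dividends`. It is not how Pig executes the job.

```python
from collections import defaultdict

# Hypothetical sample rows: (exchange, stock_symbol, date, dividends)
SAMPLE_DIVS = [
    ("NYSE", "AAA", "2009-01-15", 0.25),
    ("NYSE", "BBB", "2009-02-10", 0.50),
    ("NYSE", "AAA", "2009-04-15", 0.75),
]


def average_dividends(rows):
    """Mimic GROUP BY stock_symbol followed by AVG(divs.dividends)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _exchange, symbol, _date, dividend in rows:
        # The result is a double, matching the `describe avged` output.
        totals[symbol] += float(dividend)
        counts[symbol] += 1
    # One output entry per group, like one row of `avged` per row of `grped`.
    return {symbol: totals[symbol] / counts[symbol] for symbol in totals}


print(average_dividends(SAMPLE_DIVS))
```

The result has one entry per stock symbol, which mirrors how `avged` keeps one row for every row of `grped`.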