Pig & Pig Latin

Pig - Relational Operators - Load, Store and Dump

[Pig - Load]

The LOAD operator loads data from the file system.

  1. To load the NYSE_dividends dataset from HDFS, type

divs = LOAD '/data/NYSE_dividends';

Tab is the default separator when no separator is specified while loading the data. If no schema is given, Pig does not assign specific types up front: each field is treated as a bytearray and is cast as needed when it is used.

  2. We can explicitly define the separator using the PigStorage function. To load a comma-separated (CSV) file, type

divs = LOAD '/data/NYSE_dividends' USING PigStorage(',');

  3. We can also define field names and data types explicitly with an AS clause. For example, we can define name as chararray, stock_symbol as chararray, date as datetime and dividend as float.
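The separator and the schema can be given in the same statement. The following is a minimal sketch, assuming the file is comma-separated and holds the four columns in this order; the DESCRIBE statement is added here only to check the schema and is not part of the original lesson.

```pig
-- Assumption: '/data/NYSE_dividends' is comma-separated with four columns
-- in the order name, stock_symbol, date, dividend.
divs = LOAD '/data/NYSE_dividends' USING PigStorage(',')
       AS (name:chararray, stock_symbol:chararray, date:datetime, dividend:float);

-- Show the field names and types Pig has attached to the relation.
DESCRIBE divs;
```

Values that cannot be converted to the declared type are typically loaded as null, so it is worth checking a few rows after declaring a schema.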

[Pig - Store / Dump]

The STORE operator writes data to HDFS or another storage system.

The DUMP operator prints the contents of a relation on the screen. It is mainly used for debugging.
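As an illustration of both operators, here is a short sketch. The output path '/user/me/divs_out' is hypothetical and should be replaced with a directory you can write to.

```pig
divs = LOAD '/data/NYSE_dividends';

-- Print every tuple of divs to the console; practical only on small data.
DUMP divs;

-- Write divs out as comma-separated text.
-- '/user/me/divs_out' is a hypothetical path; the job fails if it already exists.
STORE divs INTO '/user/me/divs_out' USING PigStorage(',');
```

DUMP and STORE are the statements that actually trigger execution; a LOAD on its own only defines the relation.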


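-- Default: tab-separated fields, no schema
divs = LOAD '/data/NYSE_dividends';
-- Comma-separated fields
divs = LOAD '/data/NYSE_dividends' USING PigStorage(',');
-- Explicit field names and types
divs = LOAD '/data/NYSE_dividends' AS (name: chararray, stock_symbol: chararray, date: datetime, dividend: float);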
