[Pig - Modes]
Pig can be run in two modes:
MapReduce mode - In this mode, Pig loads and processes data stored in HDFS. Pig translates Pig Latin statements into MapReduce jobs, which run on the Hadoop cluster to perform the processing. This is the recommended mode in a production environment.
Local mode - In this mode, Pig accesses files stored on the local file system. Data processing happens on the local machine. This mode is generally used for testing locally and speeding up development.
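As a minimal sketch, the mode is chosen with Pig's `-x` option at launch time. The shell function below, `pig_launch_cmd`, is a hypothetical helper (not part of Pig). It only prints the launch command for each mode, so it runs even on a machine without Pig installed:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pig): print the command that starts
# the Grunt shell in the requested execution mode.
pig_launch_cmd() {
  local mode="${1:-mapreduce}"   # MapReduce is Pig's default mode
  case "$mode" in
    mapreduce) echo "pig -x mapreduce" ;;  # same as running plain `pig`
    local)     echo "pig -x local" ;;      # reads and writes the local file system
    *)
      echo "unsupported mode: $mode" >&2
      return 1
      ;;
  esac
}

pig_launch_cmd          # prints: pig -x mapreduce
pig_launch_cmd local    # prints: pig -x local
```

Because MapReduce is the default, typing just `pig` behaves like `pig -x mapreduce`.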
[Pig - MapReduce Mode] [Screencast pig_modes.mp4]
Let's launch Pig on CloudxLab. Log in to the CloudxLab Linux console and type pig at the command prompt. By default, Pig is launched in MapReduce mode. After Pig launches successfully, the Grunt shell appears. The Grunt shell of Apache Pig is mainly used to write Pig Latin scripts. We can also work with HDFS from the Grunt shell by running fs commands, kill jobs, and execute Pig scripts.
To list the files stored in your home directory in HDFS, type ls. Press "Control-D" to exit the shell.
[Pig - Local Mode]
Let's launch Pig in local mode on CloudxLab. Type "pig -x local". Now type ls: this time it lists the files in your home directory on the local file system rather than in HDFS.
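Local mode is handy for trying out a script on a small sample before running it against HDFS. The sketch below assumes a Bash shell; the temporary directory, the file names `people.csv` and `people.pig`, and the sample rows are hypothetical. It writes a tiny comma-separated input file and a two-line Pig Latin script, then prints the command that would run the script in local mode:

```shell
#!/usr/bin/env bash
# Sketch: set up a tiny local-mode test. The directory, file names and
# sample rows are hypothetical; nothing here touches HDFS.
set -euo pipefail

workdir="$(mktemp -d)"

# Sample input on the local file system
printf 'alice,30\nbob,25\n' > "$workdir/people.csv"

# Minimal Pig Latin script: load the CSV and print its rows
cat > "$workdir/people.pig" <<EOF
people = LOAD '$workdir/people.csv' USING PigStorage(',') AS (name:chararray, age:int);
DUMP people;
EOF

# Command to run the script in local mode (requires pig on PATH)
echo "pig -x local $workdir/people.pig"
```

On a machine with Pig installed, running the printed command should print each row as a tuple, such as (alice,30), among Pig's log output.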