Pig & Pig Latin

1 / 48
Pig - Introduction
 

In this video, we will learn Pig.

[Pig - Introduction]

Pig is an abstraction over MapReduce. Pig is used to analyze large data by the way of data flows.

To analyze data, Pig provides a high-level language known as Pig Latin. Pig has a component known as Pig Engine. It converts Pig Latin into MapReduce jobs.

Pig can be used with and without Hadoop. When running on Hadoop, Pig makes use of HDFS and MapReduce

[Pig - Why do we need it?]

Before Pig, programmers not proficient with Java had to struggle to work with Hadoop and to write MapReduce tasks. Pig Latin makes writing and maintaining code easier by providing many relational operations such as 'order by', 'group by' and 'Joins' which are hard to write in MapReduce.

[Pig - Use Cases]

Pig can be used for analyzing data, in Interactive processing, batch processing, and ETL - Extract, Transform and Load.

[Slide Pig - Philosophy]

  • Pigs eat anything. Pig can process structured, semi-structured and unstructured data

  • Pigs live anywhere. Pig can be used with or without Hadoop to analyze data.

  • Pigs are domestic animals. We can control the data flow and write our own custom functions in Java and Python.

  • Pigs fly. Pig is designed for performance by providing many built-in optimizations.

[Pig Latin - A Data Flow Language]

Pig scripts are written in Pig Latin. Pig Latin allows us to describe how data should flow, read, be processed and stored to multiple outputs. It also helps in writing complex workflows like performing multiple joins etc.