Understanding Big Data Stack - Apache Hadoop and Spark


What is Apache Hadoop?


Hadoop was created by Doug Cutting while he was building an open-source search engine called Nutch; he was later joined by Mike Cafarella. Hadoop was based on three papers published by Google: the Google File System, Google MapReduce, and Google BigTable.

It is named after a toy elephant belonging to Doug Cutting's son.

Hadoop is released under the Apache License, which means you can use it anywhere without having to worry about licensing fees.

It is a powerful, popular, and well-supported framework for handling Big Data.

Started as a single project, Hadoop is now an umbrella of projects. All of the projects under the Apache Hadoop umbrella share three characteristics:

1. Distributed - They can utilize multiple machines in order to solve a problem.
2. Scalable - If needed, it should be easy to add more machines.
3. Reliable - If some of the machines fail, the system should still work fine.

These are the three criteria for a project or component to sit under the Apache Hadoop umbrella.
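The MapReduce model from the Google paper mentioned earlier is the core idea behind Hadoop's processing layer. Below is a minimal single-machine sketch of the map, shuffle, and reduce flow for counting words. Note this is plain Java, not the actual Hadoop API, and the class and method names are purely illustrative; a real Hadoop job would distribute these phases across many machines.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch of the map -> shuffle -> reduce flow that Hadoop
// runs at scale. This is NOT the Hadoop API; it runs on one machine only.
public class WordCountSketch {

    // Map phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phase: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data with hadoop", "hadoop handles big data");
        // In real Hadoop, each line could be mapped on a different machine.
        List<Map.Entry<String, Integer>> pairs = lines.stream()
                .flatMap(l -> map(l).stream())
                .collect(Collectors.toList());
        System.out.println(reduce(pairs));
    }
}
```

The key point of the split is that map calls are independent of each other (distributed), adding machines just spreads the lines further (scalable), and a failed map task can simply be re-run on another machine (reliable).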

Hadoop is written in Java so that it can run on all kinds of devices.
