Course on
Big Data with Hadoop

Learn HDFS, ZooKeeper, Hive, HBase, NoSQL, Oozie, Flume and Sqoop From Industry Experts

(9,025 Learners)

  25+ hours training

  Projects & Lab

  24x7 Support

  Compatible with Hortonworks and Cloudera Certifications

About the Course

As humans, we are immersed in data in our everyday lives. According to IBM, the world's data doubles every two years. The value that data holds can only be understood once we start to identify patterns and trends in it. Conventional computing approaches break down when data becomes this large.

There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.

In this course, you will learn Hadoop to drive better business decisions and solve real-world problems.

1 course

Learn from industry experts. Follow the suggested order or choose your own.

Projects & Lab

Apply the skills you learn on a distributed cluster to solve real-world problems.


Highlight your new skills on your resume or LinkedIn.

1:1 Mentoring

Subscribe to 1:1 mentoring sessions and get guidance from industry leaders and professionals.

Best-in-class Support

24×7 support and forum access to answer all your queries throughout your learning journey.


Compatible with: CCP Data Engineer, CCA Data Analyst, HDPCD - Hortonworks Certified Developer
Learning Path


Big Data with Hadoop

About the Course

Hardware and Software requirements
The course requires a good internet connection (1 Mbps or more) and a browser to watch the videos and do hands-on work in the lab. We've configured all the tools in the lab so that you can focus on learning and practicing in a real-world cluster.

What is Big Data?

Why Now?

Big Data Use Cases

Various Solutions

Overview of Hadoop Ecosystem

Spark Ecosystem Walkthrough


Understanding the CloudxLab

CloudxLab Hands-On

Hadoop & Spark Hands-on

Quiz and Assessment

Basics of Linux - Quick Hands-On

Understanding Regular Expressions

Quiz and Assessment

Setting up VM (optional)

Why do we need it?

Understanding Data Model


Quiz & Assessment

How does election happen - Paxos Algorithm?

Use cases

When not to use

Quiz & Assessment

Why HDFS or Why not existing file systems?

Understanding the architecture


Advanced HDFS Concepts (HA, Federation)


Hands-on with HDFS (Upload, Download, SetRep)

Quiz & Assessment

Data Locality (Rack Awareness)

Computing - Why not existing tools?

MapReduce 1.0

Resource Management: YARN Architecture

Advanced Concepts - Speculative Execution


Why MapReduce?

Understanding MapReduce Framework


Example 0 - Word Frequency Problem - Without MR

Example 1 - Only Mapper - Image Resizing

Example 2 - Word Frequency Problem

Example 3 - Temperature Problem

Example 4 - Multiple Reducer

Example 5 - Java MapReduce Walkthrough


Example 6 - Secondary Sorting (Word Recommendation)

Example 7 - Partitioner

Concept - Associative & Commutative


Example 8 - Combiner

Example 9 - Hadoop Streaming

Example 10 - Adv. Problem Solving - Anagrams

Example 11 - Adv. Problem Solving - Same DNA

Example 12 - Adv. Problem Solving - Similar DNA

Example 13 - Joins - Voting

Limitations of MapReduce


Why Pig?

Basic Structure of Pig Latin

Getting Started

Example - NYSE Stock Exchange

Concept - Lazy Evaluation

Why Hive?

Hive Architecture Overview

Getting Started

Loading Data in Hive (Tables)

Example: Movielens Data Processing

Advanced Concepts: Views

Connecting Tableau and HiveServer 2

Connecting Microsoft Excel and HiveServer 2

Project: Sentiment Analysis of Twitter Data

Advanced - Partition Tables

Understanding HCatalog & Impala


Case Study: The days before NoSQL

What is NoSQL?

CAP Theorem

HBase Architecture - Region Servers etc

HBase Data Model - Column Family Orientation

Getting Started - Create table, Adding Data

Adv Example - Google Links Storage

Concept - Bloom Filter

Comparison of NoSQL Databases


Sqoop Overview

Import From MySQL to HDFS, Hive, HBase

Exporting to MySQL from HDFS

Concept - Unbounded Dataset Processing or Stream Processing

Flume Overview: Agents - Source, Sink, Channel

Example 1 - Data from a Local Network Service into HDFS

Example 2 - Extracting Twitter Data


Example 3 - Creating workflow with Oozie


Earn your certificate

Our course is exhaustive, and the certificate we award is proof that you have taken a big leap in the Big Data domain.

Differentiate yourself

The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.

Share your achievement

Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.

 Course Certificate Sample
Self-paced Learning
Learn at your pace

Instructor-led Trainings

26 Feb, Mon, Tue, Wed, Thu, Fri
9:30 a.m. - 11:30 a.m. America/New_York

199 349
29 Apr, Sun, Sat
10:30 a.m. - 11:30 a.m. America/New_York

199 349
Course Creators
Sandeep Giri

Founder at CloudxLab, Past- Amazon, InMobi, D.E.Shaw
Course Developer
Abhinav Singh

Co-Founder at CloudxLab, Past- Byjus
Course Developer
Rohit Gupta

CIO - MedGenome, PhD University of Minnesota-Twin Cities
Course Advisor
Jatin Shah

LinkedIn, Yahoo, Yale CS Ph.D.
Course Advisor



40 reviews
(4.9 out of 5)

A must-have for practicing and perfecting Hadoop. To set it up on a PC you need a very high-end configuration, and the setup will be a pseudo-node setup. For better understanding I recommend CloudxLab.


They are great. They take care of all the Big Data technologies (Hadoop, Spark, Hive, etc.) so you do not have to worry about installing and running them correctly on your PC. Plus, they have fantastic customer support. Even when I have had problems debugging my own programs, they have answered me with the correct solution in a few hours, and all of this for a more than reasonable price. I personally recommend it to everyone :)


I have been using CloudxLab for the last 3 months for learning Hadoop and Spark, and I can vouch for it.

It’s a platform where you can learn from the tutorial videos and then practice in the lab they provide on the cloud. The study materials are well planned, and I would be lying if I said it's not great.
The video lectures explain the technical stuff in very simple ways, which makes it easier to grasp the concepts. Also, the customer service is great.
So, thumbs up for the team associated with CloudxLab.
To conclude, I would just say that if you are willing to learn Big Data related stuff, I strongly recommend CloudxLab.


I think I can give some points on this. I have been using CloudxLab for more than a year… my intention is continuous learning.
For students and professionals changing technologies:
In general, with Big Data and Hadoop: (a) you can learn on your personal PC, but that needs a minimum configuration of 12 GB RAM with good processing speed, and jobs will still take longer to run because the machine acts as a single node. (b) If you try to install each and every component, it takes a hell of a lot of admin work, and if something goes wrong, you have to invest a lot of time in debugging.
The main advantages of using CloudxLab:
a) You get a 6-node production cluster with all components installed; with just a username and password, you can start working on it.
b) You have almost all the access.
c) A good number of components are installed.
d) You can play around with each of them with 5 GB of test data.
e) So far I haven't experienced any downtime.
f) You can practice in your college lab, in your free time.
g) Good email support on the technical side.
h) They have a couple of test data sets; I use my own.
i) The vi and nano editors are supported.
j) Some of the components I remember are HDFS, MapReduce2, YARN, Tez, ZooKeeper, Falcon, Storm, Kafka, Spark, Jupyter Notebook, Hive, HBase, Pig, Sqoop, Oozie, Flume, Accumulo, Ambari.


I have been using CloudxLab for some time, and based on my usage experience I can say that they have done a fabulous job.

The first problem anyone faces while learning Big Data technologies is running the VMs on his/her laptop. VMs require a good amount of dedicated RAM, so most of the time we end up spending on a hardware upgrade. But even after an upgrade, the requirement of a cluster is never met. The examples we try always run on a single-node setup.

To try this on a production-like cluster setup we have something like AWS, but there is a good amount of cost involved in that. Also, they keep the credit card details with them, which I feel not everyone feels safe sharing.

And this is where I feel CloudxLab seems to be a better bet. Their pricing is very competitive compared to the other offerings, and it doesn’t require any specific hardware. Any desktop/laptop with any configuration and an internet connection is good for getting started.

No need to do any setup. Their clusters are fully loaded with all the latest Big Data packages. You can access them from anywhere.

The only thing you need to concentrate on is your learning :)

Hope this helps anyone who is looking for an option beyond VMs.

It will take 2-3 months with 6-8 hours of effort per week.
The instructors for this course are industry experts having years of experience in mentoring students across the world.
We understand that you might need the course material for a longer duration to make the most of your subscription. You will get lifetime access to the course material so that you can refer to it anytime.
At the end of the course, you will work on a real-time project. You will receive a problem statement along with a data set to work on in CloudxLab. Once you are done with the project (it will be reviewed by an expert), you will be awarded a certificate which you can share on LinkedIn.
We will provide 90 days of access to CloudxLab so that you learn by practicing in a real-time environment.
Yes. Java is generally required for understanding MapReduce. MapReduce is a programming paradigm for writing your logic in the form of Mapper and Reducer functions. We provide a self-paced course on Java for free. As soon as you sign up, it will be available in your account section.
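To give a feel for what Mapper and Reducer functions look like, here is a minimal sketch of the word-frequency problem in plain Python. It is an illustration under stated assumptions, not Hadoop's Java API and not the course's own code: the function names (`mapper`, `shuffle`, `reducer`, `word_frequency`) are hypothetical, and the `shuffle` step only stands in for the grouping that the framework performs between the two phases on a real cluster.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key; a local stand-in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word, counts):
    """Combine all counts emitted for one word into a single total."""
    return word, sum(counts)


def word_frequency(lines):
    """Run the map, shuffle, and reduce phases in a single local process."""
    mapped = chain.from_iterable(mapper(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_frequency(["big data with hadoop", "big data with spark"]))
```

The point of the split is that each `mapper` call sees only one line and each `reducer` call sees only one word, which is what lets a framework spread the work across many machines.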
For the self-paced course, we provide a 100% fee refund if the request is raised within 7 days of the enrolment date. Thereafter, no refund is provided.
For the instructor-led course, we provide a 100% refund if no more than 1 live session has been conducted, and a 50% refund if 2-4 live sessions have been conducted. If 5 or more live sessions have been conducted, no refund is provided.

Have more questions? Please contact us at