Course on
Big Data with Hadoop

Learn HDFS, ZooKeeper, Hive, HBase, NoSQL, Oozie, Flume and Sqoop From Industry Experts

(9,025 Learners)

  50+ hours training

  Projects & Lab

  24x7 Support

  Compatible with Hortonworks and Cloudera Certifications

About the Course

As humans, we are immersed in data in our everyday lives. According to IBM, the amount of data on the planet doubles every two years. The value that data holds can only be unlocked when we start to identify patterns and trends in it. Traditional computing approaches stop working when data grows this large.

There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch a career in big data.

In this course, you will learn Hadoop to drive better business decisions and solve real-world problems.

1 course

Learn from industry experts. Follow the suggested order or choose your own.

Projects & Lab

Apply the skills you learn on a distributed cluster to solve real-world problems.


Highlight your new skills on your resume or LinkedIn.

1:1 Mentoring

Subscribe to 1:1 mentoring sessions and get guidance from industry leaders and professionals.

Best-in-class Support

24×7 support and forum access to answer all your queries throughout your learning journey.
Learning Path


Big Data with Hadoop

About the Course

Hardware and Software requirements
The course requires a good internet connection (1 Mbps or more) and a browser to watch the videos and do hands-on work in the lab. We've configured all the tools in the lab so that you can focus on learning and practicing on a real-world cluster.

What is Big Data?

Why Now?

Big Data Use Cases

Various Solutions

Overview of Hadoop Ecosystem

Spark Ecosystem Walkthrough


Understanding CloudxLab

CloudxLab Hands-On

Hadoop & Spark Hands-on

Quiz and Assessment

Basics of Linux - Quick Hands-On

Understanding Regular Expressions

Quiz and Assessment
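To give a flavor of the regular-expressions topic above, here is a small, illustrative Python sketch (the course itself practices these patterns on the Linux command line; the sample strings are made up for illustration):

```python
import re

# Extract dates of the form YYYY-MM-DD from free text.
log_line = "job started 2024-01-15, retried 2024-01-16, done"
dates = re.findall(r"\d{4}-\d{2}-\d{2}", log_line)
print(dates)  # ['2024-01-15', '2024-01-16']

# Validate a simple hostname-like token: lowercase letter first,
# then letters, digits, or hyphens.
print(bool(re.fullmatch(r"[a-z][a-z0-9-]*", "namenode-01")))  # True
```

The same patterns work with command-line tools such as grep, which is how the Linux module exercises them.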

Setting up VM (optional)

Why Do We Need ZooKeeper?

Understanding Data Model


Quiz & Assessment

How Does Leader Election Happen? The Paxos Algorithm

Use cases

When not to use

Quiz & Assessment
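The ZooKeeper data model covered above is a filesystem-like tree of "znodes", each of which can hold a small data payload and child znodes. A toy, ZooKeeper-free Python sketch of that hierarchical model (purely illustrative; real applications use a client library such as Kazoo):

```python
# Toy model of ZooKeeper's hierarchical namespace: every znode has a
# data payload and children, like a file that can also be a directory.
class ZNode:
    def __init__(self, data=b""):
        self.data = data
        self.children = {}

root = ZNode()

def create(path, data=b""):
    """Create a znode; parent znodes must already exist."""
    node = root
    parts = [p for p in path.split("/") if p]
    for part in parts[:-1]:
        node = node.children[part]
    node.children[parts[-1]] = ZNode(data)

def get(path):
    """Read a znode's data payload by walking the tree."""
    node = root
    for part in [p for p in path.split("/") if p]:
        node = node.children[part]
    return node.data

create("/app")
create("/app/config", b"replication=3")
print(get("/app/config"))  # b'replication=3'
```

In real ZooKeeper, znodes additionally carry version numbers and watches, which is what makes them useful for coordination.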

Why HDFS? Why Not Existing File Systems?

Understanding the architecture


Advanced HDFS Concepts (HA, Federation)


Hands-on with HDFS (Upload, Download, SetRep)

Quiz & Assessment

Data Locality (Rack Awareness)

Computing - Why not existing tools?

MapReduce 1.0

Resource Management: YARN Architecture

Advanced Concepts - Speculative Execution


Why MapReduce?

Understanding MapReduce Framework


Example 0 - Word Frequency Problem - Without MR

Example 1 - Only Mapper - Image Resizing

Example 2 - Word Frequency Problem

Example 3 - Temperature Problem

Example 4 - Multiple Reducer

Example 5 - Java MapReduce Walkthrough


Example 6 - Secondary Sorting (Word Recommendation)

Example 7 - Partitioner

Concept - Associative & Commutative


Example 8 - Combiner

Example 9 - Hadoop Streaming

Example 10 - Adv. Problem Solving - Anagrams

Example 11 - Adv. Problem Solving - Same DNA

Example 12 - Adv. Problem Solving - Similar DNA

Example 13 - Joins - Voting

Limitations of MapReduce
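As a taste of the word-frequency examples above, here is a minimal mapper/reducer pair in the style of Hadoop Streaming (covered in Example 9), simulated locally in plain Python, with no cluster required:

```python
from collections import defaultdict

# Word frequency in MapReduce style: the mapper emits (word, 1) pairs,
# the framework groups pairs by key (the "shuffle"), and the reducer
# sums each group. Simulated locally; on a cluster, Hadoop Streaming
# would feed these functions via stdin/stdout.
def mapper(line):
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    return word, sum(counts)

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Shuffle phase: group mapper output by key.
groups = defaultdict(list)
for line in lines:
    for word, one in mapper(line):
        groups[word].append(one)

result = dict(reducer(w, c) for w, c in groups.items())
print(result["the"], result["fox"])  # 3 2
```

Because summing is associative and commutative (the concept behind Example 8), the same reducer could also run as a combiner on each mapper's local output.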


Why Pig?

Basic Structure of Pig Latin

Getting Started

Example - NYSE Stock Exchange

Concept - Lazy Evaluation
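Pig's lazy evaluation means a script only builds an execution plan; nothing runs until a sink such as DUMP or STORE pulls results. A small Python analogy using generators (this is not Pig itself, just the same idea in miniature):

```python
# Lazy evaluation in miniature: building the pipeline does no work;
# records flow only when a "sink" (like Pig's DUMP/STORE) pulls them.
steps = []  # records which operations actually executed

def load(records):
    for r in records:
        steps.append(("load", r))
        yield r

def filter_big(stream):
    for r in stream:
        steps.append(("filter", r))
        if r > 10:
            yield r

pipeline = filter_big(load([4, 25, 7, 30]))  # nothing has run yet
assert steps == []                           # plan only, no execution

result = list(pipeline)                      # the "DUMP": now it runs
print(result)  # [25, 30]
```

Deferring execution this way lets Pig inspect the whole plan and optimize it before any data is read.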

Why Hive?

Hive Architecture Overview

Getting Started

Loading Data in Hive (Tables)

Example: Movielens Data Processing

Advanced Concepts: Views

Connecting Tableau and HiveServer 2

Connecting Microsoft Excel and HiveServer 2

Project: Sentiment Analysis of Twitter Data

Advanced - Partitioned Tables

Understanding HCatalog & Impala


Case Study: The days before NoSQL

What is NoSQL?

CAP Theorem

HBase Architecture - Region Servers etc

HBase Data Model - Column-Family Orientation

Getting Started - Create table, Adding Data

Advanced Example - Google Links Storage

Concept - Bloom Filter

Comparison of NoSQL Databases
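The Bloom filter concept listed above can be sketched in a few lines of Python. HBase uses Bloom filters so a read can safely skip store files that cannot contain a given row; this toy version with a bit array and k hashes is for illustration only:

```python
import hashlib

# Toy Bloom filter: a bit array plus k hash functions. Membership tests
# can yield false positives but never false negatives -- which is why a
# "no" answer lets HBase skip a file without risk of missing data.
class BloomFilter:
    def __init__(self, size=1024, k=3):
        self.size = size
        self.k = k
        self.bits = [False] * size

    def _positions(self, item):
        # Derive k bit positions by salting a cryptographic hash.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("row-42")
print(bf.might_contain("row-42"))   # True (guaranteed for added items)
print(bf.might_contain("row-999"))  # False with overwhelming probability
```

Production implementations size the bit array and k from the expected item count and target false-positive rate; the structure itself is this simple.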


Sqoop Overview

Import From MySQL to HDFS, Hive, HBase

Exporting to MySQL from HDFS

Concept - Unbounded Dataset Processing (Stream Processing)

Flume Overview: Agents - Source, Sink, Channel

Example 1 - Data from Local network service into HDFS

Example 2 - Extracting Twitter Data
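The unbounded-dataset idea above boils down to processing records as they arrive rather than after the whole dataset is loaded. A toy Python sketch using a generator as the stand-in for an endless source (in practice, a Flume source and channel would deliver the events):

```python
# Stream processing in miniature: the source yields events one at a
# time, and the consumer maintains a running aggregate instead of
# waiting for "all" the data -- which, for an unbounded stream, never
# arrives. The finite list here is a stand-in for an endless feed.
def event_source(events):
    for e in events:
        yield e

running_total = 0
count = 0
for value in event_source([5, 3, 8, 1]):
    # Each event updates state incrementally, as a stream job would.
    running_total += value
    count += 1
print(running_total, count)  # 17 4
```

Contrast this with the batch model of MapReduce, where a job reads a complete, bounded input before producing output.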


Example 3 - Creating workflow with Oozie


Earn your certificate

Our course is exhaustive, and the certificate awarded by us is proof that you have taken a big leap in the Big Data domain.

Differentiate yourself

The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.

Share your achievement

Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.

 Course Certificate Sample
Self-paced Learning

Learn at your pace

99 149

High-quality videos, slides, hands-on examples, quizzes, automated assessments, case studies, real-world projects

Lifetime access to cutting-edge self-paced learning content

90 days of lab access for hands-on practice

24x7 support to answer your queries

Earn certificate in Big Data with Hadoop

Enroll Now
Sandeep Giri

Founder at CloudxLab

Past - Amazon, InMobi, tBits Global, D.E.Shaw

For the last 15 years, Sandeep has been building products and churning through large amounts of data for various product companies. He has all-around experience in software development and big data analysis.

Apart from digging into data and technologies, Sandeep enjoys conducting interviews and explaining difficult concepts in simple ways.

Course Creators
Abhinav Singh

Co-Founder at CloudxLab, Past- Byjus
Course Developer
Abhishek Agarwal

Sales & Marketing Analyst - CloudxLab
Program Manager
Jatin Shah

LinkedIn, Yahoo, Yale CS Ph.D.
Course Advisor


It will take 2-3 months with 6-8 hours of effort per week.
We understand that you might need the course material for a longer duration to make the most of your subscription. You will get lifetime access to the course material so that you can refer to it anytime.
At the end of the course, you will work on a real-world project. You will receive a problem statement along with a dataset to work on in CloudxLab. Once you are done with the project (it will be reviewed by an expert), you will be awarded a certificate, which you can share on LinkedIn.
We will provide 90 days of access to CloudxLab so that you can learn by practicing in a real-world environment.
Yes. Java is generally required for understanding MapReduce. MapReduce is a programming paradigm in which you write your logic in the form of mapper and reducer functions. We provide a self-paced course on Java for free; as soon as you sign up, it will be available in your account section.
For the self-paced course, we provide a 100% fee refund if the request is raised within 7 days of the enrolment date. Thereafter, no refund is provided.
For the instructor-led course, we provide a 100% refund if no more than 1 live session has been conducted, and a 50% refund if 2-4 live sessions have been conducted. If 5 or more live sessions have been conducted, no refund will be provided.

Have more questions? Please contact us at