Course on Big Data with Hadoop & Spark

Learn From Industry Experts With 1:1 Mentoring & Live QnA Sessions

Why learn Big Data?

As humans, we are immersed in data in our everyday lives. According to IBM, the amount of data in the world doubles every two years. The value of that data becomes apparent only when we can identify patterns and trends in it, and conventional computing approaches stop working once data grows very large.

There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.

Program Syllabus


Duration: 2-3 Months


Introduction

  • What is Big Data?
  • Why Now?
  • Big Data Use Cases
  • Various Solutions
  • Overview of Hadoop Ecosystem
  • Spark Ecosystem Walkthrough
  • Quiz

Foundation & Environment

  • Understanding the CloudxLab
  • CloudxLab Hands-On
  • Hadoop & Spark Hands-on
  • Quiz and Assessment
  • Basics of Linux - Quick Hands-On
  • Understanding Regular Expressions
  • Quiz and Assessment
  • Setting up VM (optional)


ZooKeeper

  • Why do we need it?
  • Understanding Data Model
  • Hands-On
  • Quiz & Assessment
  • How does leader election happen? - Paxos Algorithm
  • Use cases
  • When not to use
  • Quiz & Assessment


HDFS

  • Why HDFS or Why not existing file systems?
  • Understanding the architecture
  • Quiz
  • Advanced HDFS Concepts (HA, Federation)
  • Quiz
  • Hands-on with HDFS (Upload, Download, SetRep)
  • Quiz & Assessment
  • Data Locality (Rack Awareness)

Data Formats & Management

  • InputFormat and InputSplit
  • JSON
  • XML
  • Avro
  • How to store many small files - SequenceFile?
  • Parquet
  • Protocol Buffers
  • Comparing Compressions
  • Understanding Row Oriented and Column Oriented Formats - RCFile?


YARN

  • Computing - Why not existing tools?
  • MapReduce 1.0
  • Resource Management: YARN Architecture
  • Advanced Concepts - Speculative Execution
  • Quiz

MapReduce Basics

  • Why MapReduce?
  • Understanding MapReduce Framework
  • Quiz
  • Example 0 - Word Frequency Problem - Without MR
  • Example 1 - Only Mapper - Image Resizing
  • Example 2 - Word Frequency Problem
  • Example 3 - Temperature Problem
  • Example 4 - Multiple Reducer
  • Example 5 - Java MapReduce Walkthrough
  • Quiz

MapReduce Advanced

  • Example 6 - Secondary Sorting (Word Recommendation)
  • Example 7 - Partitioner
  • Concept - Associative & Commutative
  • Quiz
  • Example 8 - Combiner
  • Example 9 - Hadoop Streaming
  • Example 10 - Adv. Problem Solving - Anagrams
  • Example 11 - Adv. Problem Solving - Same DNA
  • Example 12 - Adv. Problem Solving - Similar DNA
  • Example 13 - Joins - Voting
  • Limitations of MapReduce
  • Quiz

Analyzing Data with Pig

  • Why Pig?
  • Basic Structure of Pig Latin
  • Getting Started
  • Example - NYSE Stock Exchange
  • Concept - Lazy Evaluation

Processing Data with Hive

  • Why Hive?
  • Hive Architecture Overview
  • Getting Started
  • Loading Data in Hive (Tables)
  • Example: Movielens Data Processing
  • Advanced Concepts: Views
  • Connecting Tableau and HiveServer 2
  • Connecting Microsoft Excel and HiveServer 2
  • Project: Sentiment Analysis of Twitter Data
  • Advanced - Partition Tables
  • Understanding HCatalog & Impala
  • Quiz

NoSQL and HBase

  • Case Study: The days before NoSQL
  • What is NoSQL?
  • CAP Theorem
  • HBase Architecture - Region Servers etc
  • HBase Data Model - Column-Family Orientation
  • Getting Started - Create table, Adding Data
  • Adv Example - Google Links Storage
  • Concept - Bloom Filter
  • Comparison of NoSQL Databases
  • Quiz

Importing Data with Sqoop and Flume, and Workflows with Oozie

  • Sqoop Overview
  • Import From MySQL to HDFS, Hive, HBase
  • Exporting to MySQL from HDFS
  • Concept - Unbounded Dataset Processing or Stream Processing
  • Flume Overview: Agents - Source, Sink, Channel
  • Example 1 - Data from Local network service into HDFS
  • Example 2 - Extracting Twitter Data
  • Quiz
  • Example 3 - Creating workflow with Oozie

Scala Basics

  • Introduction to Scala
  • Accessing Scala using CloudxLab
  • Getting Started: Interactive, Compilation, SBT
  • Types, Variables & Values
  • Functions
  • Collections
  • Classes
  • Parameters
  • More Features
  • Quiz and Assessment

Spark Basics

  • What is Apache Spark?
  • Why Spark?
  • Using the Spark Shell on CloudxLab
  • Example 1 - Performing Word Count
  • Understanding Spark Cluster Modes on YARN
  • RDDs (Resilient Distributed Datasets)
  • General RDD Operations: Transformations & Actions
  • RDD Lineage
  • RDD Persistence Overview
  • Distributed Persistence

Writing and Deploying Spark Applications

  • Creating the SparkContext
  • Building a Spark Application (Scala, Java, Python)
  • The Spark Application Web UI
  • Configuring Spark Properties
  • Running Spark on Cluster
  • RDD Partitions
  • Executing Parallel Operations
  • Stages and Tasks
  • Project: Churning the logs of NASA Kennedy Space Center WWW server

Common Patterns in Spark Data Processing

  • Common Spark Use Cases
  • Example 1 - Data Cleaning (Movielens)
  • Example 2 - Understanding Spark Streaming
  • Understanding Kafka
  • Example 3 - Spark Streaming from Kafka
  • Iterative Algorithms in Spark
  • Project: Real-time analytics of orders in an e-commerce company

DataFrames and Spark SQL

  • Spark SQL and the SQL Context
  • Creating DataFrames
  • Transforming and Querying DataFrames
  • Saving DataFrames
  • DataFrames and RDDs
  • Comparing Spark SQL, Impala, and Hive-on-Spark

Machine Learning with Spark

  • GraphX: Graph Processing and Analysis
  • Understanding Machine Learning
  • MLlib Example: k-means
  • SparkR Example


How will you benefit?

Skill Enhancement

Develop skills and competencies to excel and stand out in the Big Data domain

Career Growth

Get better roles and better packages

What do I get?

  • Lifetime access to course material

    Get lifetime access to course videos so that you can learn at your own pace.

  • 90 days of CloudxLab access

    Learn by practicing in a real-time distributed environment.

  • QnA sessions

    Subscribe to weekly QnA webinars with a group of our course learners.

  • Best-in-class support

    24x7 email support to answer your queries. Get answers to your queries within one business day.

  • Training by professionals

    Learn from best-in-class industry professionals with years of experience working with Big Data.

  • Verified certificate

    Receive a verified certificate and share it on LinkedIn.

  • 1:1 Mentoring

    Subscribe to 1:1 mentoring sessions and get guidance from industry leaders and professionals.


Learning with CloudxLab means getting exactly where you want to be in your career.

Self-Paced Learning

Program Syllabus

Master cutting-edge skills sought by leading companies


90 Days of CloudxLab access


24x7 email support to answer your queries


Earn Certificate In Big Data with Hadoop & Apache Spark

$ 399   $ 199

Self-Paced Learning + Live QnA

Program Syllabus

Master cutting-edge skills sought by leading companies


90 Days of CloudxLab access


24x7 email support to answer your queries


Earn Certificate In Big Data with Hadoop & Apache Spark

Live QnA sessions

40 hours of live QnA sessions
$ 799   $ 499

Prerequisites and Requirements

  • Basics of SQL. You should know the basics of SQL and databases. If you know how to filter rows in SQL, you should be able to follow the course.

  • Basics of programming. If you understand loops in any programming language, and you can create a directory and view the contents of a file from the command line, you will be able to grasp the concepts of this course even if you have not touched programming in the last 10 years. In addition, we will provide video classes on the basics of Python and Scala.


Common questions and answers

  • How much time will it take to complete the course?

    It will take 2-3 months with 6-8 hours of effort per week.

  • How do QnA sessions work?

    Once the course is launched, we will schedule weekly QnA webinars, each covering a particular course topic, with a group of our course learners.

    It is a premium service: you can pay for a single topic in which you have doubts, or for all the topics in one go. We will provide more information on this premium service after the course is launched.

  • How does 1:1 Mentoring work?

    We offer our learners mentoring sessions with industry leaders and professionals, so you can get 1:1 help with any questions you may have, whether they are technical, job-related, or anything else.

    It is a paid service, available exclusively to learners enrolled in the course. We will provide subscription details after the course is launched.

  • What is the certification process?

    At the end of the course, you will work on a real-time project. You will receive a problem statement along with a data set to work on in CloudxLab. Once your project has been reviewed and approved by an expert, you will be awarded a certificate that you can share on LinkedIn.

  • How will the practicals or hands-on sessions be conducted?

    We will provide 90 days of access to CloudxLab so that you can learn by practicing in a real-time environment.

  • I am not from a Java Background. Can I take this course?

    Yes. Java is generally required for understanding MapReduce, a programming paradigm in which you write your logic as mapper and reducer functions. We provide a free self-paced course on Java. As soon as you sign up, it will be available in your account section.

  • I have some more questions. Can I talk to someone?

    Absolutely! Please contact us at

Program Leads

Sandeep Giri, Course Instructor
Abhinav Singh, Course Developer
Benjamin Bertincourt, Course Advisor
Jatin Shah, Course Advisor
Amit Upadhyay, Course Advisor
Ratnaker Pandey, Course Advisor