
Big Data with Hadoop Training Online Course (With Lab Access)

Learn HDFS, ZooKeeper, Hive, HBase, NoSQL, Oozie, Flume and Sqoop From Industry Experts

3,768 Ratings       9,025 learners

  30+ hours training

  90 days of Lab

  24x7 Support

Projects

  Compatible with Hortonworks, Cloudera Certifications

About the Course

As humans, we are immersed in data in our everyday lives. According to IBM, the world's data doubles every two years. The value that data holds can only be understood once we start to identify patterns and trends in it. Conventional computing approaches stop working when data becomes huge.

There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.

In this course, you will learn Hadoop to drive better business decisions and solve real-world problems.



1 course

Learn from industry experts. Follow the suggested order or choose your own.

Projects & Lab

Apply the skills you learn on a distributed cluster to solve real-world problems.

Certificate

Highlight your new skills on your resume or LinkedIn.

Best-in-class Support

24×7 support and forum access to answer all your queries throughout your learning journey.

Certifications

Compatible with: CCP Data Engineer, CCA Data Analyst, HDPCD - Hortonworks Certified Developer

Enrollment

Self-Paced Learning

Start Immediately
Learn at your pace
90 days lab

Learning Path

Course

Big Data with Hadoop

About the Course


This course is a part of the Specialization Course in Big Data with Hadoop and Spark

What is Big Data?

Why Now?

Big Data Use Cases

Various Solutions

Overview of Hadoop Ecosystem

Spark Ecosystem Walkthrough

Quiz

Understanding the CloudxLab

CloudxLab Hands-On

Hadoop & Spark Hands-on

Quiz and Assessment

Basics of Linux - Quick Hands-On

Understanding Regular Expressions

Quiz and Assessment

Setting up VM (optional)

Why do we need it?

Understanding Data Model

Hands-On

Quiz & Assessment

How does election happen - Paxos Algorithm?

Use cases

When not to use

Quiz & Assessment

Why HDFS or Why not existing file systems?

Understanding the architecture

Quiz

Advanced HDFS Concepts (HA, Federation)

Quiz

Hands-on with HDFS (Upload, Download, SetRep)

Quiz & Assessment

Data Locality (Rack Awareness)

Computing - Why not existing tools?

MapReduce 1.0

Resource Management: YARN Architecture

Advanced Concepts - Speculative Execution

Quiz

Why MapReduce?

Understanding MapReduce Framework

Quiz

Example 0 - Word Frequency Problem - Without MR

Example 1 - Only Mapper - Image Resizing

Example 2 - Word Frequency Problem

Example 3 - Temperature Problem

Example 4 - Multiple Reducer

Example 5 - Java MapReduce Walkthrough

Quiz

Example 6 - Secondary Sorting (Word Recommendation)

Example 7 - Partitioner

Concept - Associative & Commutative

Quiz

Example 8 - Combiner

Example 9 - Hadoop Streaming

Example 10 - Adv. Problem Solving - Anagrams

Example 11 - Adv. Problem Solving - Same DNA

Example 12 - Adv. Problem Solving - Similar DNA

Example 13 - Joins - Voting

Limitations of MapReduce

Quiz

Why Pig?

Basic Structure of Pig Latin

Getting Started

Example - NYSE Stock Exchange

Concept - Lazy Evaluation

Why Hive?

Hive Architecture Overview

Getting Started

Loading Data in Hive (Tables)

Example: Movielens Data Processing

Advanced Concepts: Views

Connecting Tableau and HiveServer 2

Connecting Microsoft Excel and HiveServer 2

Project: Sentiment Analysis of Twitter Data

Advanced - Partition Tables

Understanding HCatalog & Impala

Quiz

Case Study: The days before NoSQL

What is NoSQL?

CAP Theorem

HBase Architecture - Region Servers etc

HBase Data Model - Column-Family Orientation

Getting Started - Create table, Adding Data

Adv Example - Google Links Storage

Concept - Bloom Filter

Comparison of NoSQL Databases

Quiz

Sqoop Overview

Import From MySQL to HDFS, Hive, HBase

Exporting to MySQL from HDFS

Concept - Unbounded Dataset Processing or Stream Processing

Flume Overview: Agents - Source, Sink, Channel

Example 1 - Data from Local network service into HDFS

Example 2 - Extracting Twitter Data

Quiz

Example 3 - Creating workflow with Oozie

Projects


1. Sentiment analysis of "Iron Man 3" movie using Hive and visualizing the sentiment data using BI tools such as Tableau


2. Process the NSE (National Stock Exchange) data using Hive for various insights


3. Analyze MovieLens data using Hive


Certificate


Earn your certificate

Our Specialization is exhaustive, and the certificate we award is proof that you have taken a big leap in the Big Data domain.


Differentiate yourself

The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.


Share your achievement

Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.

Course Creators

Sandeep Giri
Founder at CloudxLab
Past: Amazon, InMobi, D.E.Shaw
Course Developer

Abhinav Singh
Co-Founder at CloudxLab
Past: Byjus
Course Developer

Jatin Shah
Ex-LinkedIn, Yahoo, Yale CS Ph.D.
IIT-B
Course Advisor

Reviews

(4.9 out of 5)
...

I started learning 3 months ago and I have gained a lot of information and practical experience. I completed the “Big Data with Spark” course and the learning journey really exceeded my expectations.

The course structure and topics were great, well organized and comprehensive; even the basics of Linux were covered in a very simple way. There were always exercises and hands-on sessions that built better understanding. The lab environment and the online tools provided were also a great help, letting you practice everything without having to install anything on your PC except a web browser.

In addition, attending the live sessions each weekend was a real joy. Our instructor, Sandeep Giri, besides having great experience and knowledge, was generous, helpful and patient in answering all the attendees' questions, often working through more examples and hands-on exercises or even searching the documentation and trying new things. I gained a lot from other attendees’ questions and the way Sandeep responded to them.

Taking this course was a great experience. I’m going for more courses in Big Data and Machine Learning with CloudxLab, and I recommend it to all my friends and colleagues who are looking for better learning.

...

A must-have for practicing and perfecting Hadoop. To set it up on a PC you need a very high-end configuration, and the setup will only be a pseudo-node setup. For better understanding I recommend CloudxLab.

...

This course is suitable for everyone. As a product manager, I had not done hands-on coding for quite some time. Python was completely new to me. However, Sandeep Giri gave us a crash course in Python and then introduced us to Machine Learning. Also, the CloudxLab environment was very useful: you just log in and start practising coding and playing with the things you have learnt. A good mix of theory and practical exercises, and specifically the sequence of starting straight away with a project and then going deeper, was a very good way of teaching. I would recommend this course to all.

...

They are great. They take care of all the Big Data technologies (Hadoop, Spark, Hive, etc.) so you do not have to worry about installing and running them correctly on your PC. Plus, they have fantastic customer support. Even when I have had problems debugging my own programs, they have answered me with the correct solution in a few hours, and all of this for a more than reasonable price. I personally recommend it to everyone :)

...

The Machine Learning courses at CloudxLab, especially the Artificial Intelligence for Managers course, are excellent. I have attended some of the courses and was able to understand them, as Sandeep Giri sir taught the AI course from scratch and related it to our day-to-day life…

He even takes free sessions to help students and provides career guidance.

His courses are worth it, and even just by watching his YouTube videos anyone can easily crack an AI interview.

FAQ

In self-paced learning, you will get:

  • Lifetime access to the self-paced course including videos, assessments, quizzes, and projects
  • Recordings of the previous batch of instructor-led training
  • 24x7 support using the discussion forum

  1. Basics of SQL. You should know the basics of SQL and databases. If you know how filters work in SQL, you should be able to follow the course.

  2. Basic programming know-how. If you understand 'loops' in any programming language, and if you are able to create a directory and see what's inside a file from the command line, you are good to get the concepts of this course even if you have not really touched programming for the last 10 years! In addition, we will be providing video classes on the basics of Python and Scala.

The instructors for this course are industry experts having years of experience in mentoring students across the world.

It will take 2-3 months with 6-8 hours of effort per week.

We understand that you might need the course material for a longer duration to make the most of your subscription. You will get lifetime access (for as long as the company is operational) to the course material so that you can refer to it anytime.

In this course, you will work on real-time projects. You will receive a problem statement along with a data set to work on in CloudxLab. Once you complete the projects, which are reviewed by an expert, you will be awarded a certificate that you can share on LinkedIn.

You can watch the course preview at https://youtu.be/dXCx4anEcgU.

Yes. Java is generally required for understanding MapReduce. MapReduce is a programming paradigm in which you write your logic in the form of mapper and reducer functions. We provide a self-paced course on Java for free. As soon as you sign up, it will be available in your account section.
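To give a feel for the mapper and reducer idea, here is a minimal sketch of the word frequency problem. It is written in plain Python and runs in a single process, so it only illustrates the paradigm and is not the Hadoop API or the course's own code. The names `mapper`, `reducer` and `run_job` are made up for this illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce step: sum all the counts collected for one word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Simulate map -> shuffle (group by key) -> reduce in memory."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)  # shuffle: group values by key
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data with hadoop", "big data with spark"]))
```

On a real cluster the framework, not your code, performs the grouping step between the map and reduce phases and spreads the work across machines.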

The course requires a good internet connection (1 Mbps or more) and a browser to watch videos and do hands-on work in the lab. We've configured all the tools in the lab so that you can focus on learning and practicing in a real-world cluster.

For the self-paced course, we provide a 100% fee refund if the request is raised within 7 days of the enrollment date. Thereafter, no refund is provided.

For the instructor-led course, we provide a 100% refund if no more than 1 live session has been conducted, and a 50% refund if 2-4 live sessions have been conducted. If 5 or more live sessions have been conducted, no refund is provided.

Yes, you can renew your subscription anytime. Please choose your desired plan for the lab and make the payment to renew your subscription.
Related Courses
Spark Developer
Big Data with Spark
Big Data with Hadoop and Spark

Have more questions? Please contact us at reachus@cloudxlab.com