About the Course

This Data Science Certification Program is a self-paced online course, so you can learn on your own schedule and at your own convenience.

The program includes more than 220 hours of learning across 5 courses (Big Data with Hadoop, Big Data with Spark, Python, Machine Learning, and Deep Learning).

The course also comes with exclusive lab access, so you can gain the hands-on experience needed to solve real-world problems.

Upon successfully completing the course, you will receive a certificate from CloudxLab, which you can use to advance your career and find better opportunities.

Program Highlights

  • Certificate of Completion by CloudxLab

  • 220+ Hours of Self-Paced Learning

  • Work on 26+ projects to get hands-on experience

  • Timely Doubt Resolution

  • Best In Class Curriculum

  • Cloud Lab Access

Start your Career as a Data Scientist


What is the certificate like?

  • Why CloudxLab?

    CloudxLab is a team of developers, engineers, and educators passionate about building innovative products that make learning fun, engaging, and lifelong. We are a highly motivated team that builds fresh and lasting learning experiences for our users. Powered by our innovation processes, we provide a gamified environment where learning is fun and constructive. From creative design to intuitive apps, we create a seamless learning experience for our users. We upskill engineers in deep tech and make them employable and future-ready.

Hands-on Learning

  • Gamified Learning Platform

  • Auto-assessment Tests

  • No Installation Required


Mentors / Faculty

Sandeep Giri

Founder at CloudxLab

Past: Amazon, InMobi, D.E.Shaw

Abhinav Singh

Co-Founder at CloudxLab

Past: Byjus

Jatin Shah

Yale CS, Ph.D. IIT-Bombay

Past: LinkedIn, Yahoo



Python for Machine Learning

1. Programming Tools and Foundational Concepts
1.1 Introduction to Linux
1.2 Introduction to Python
1.3 Hands-on using Jupyter on CloudxLab
1.4 Overview of Linear Algebra
1.5 Introduction to NumPy and Pandas

Course on Big Data with Hadoop

1. Introduction
1.1 Big Data Introduction
1.2 Distributed Systems
1.3 Big Data Use Cases
1.4 Various Solutions
1.5 Overview of Hadoop Ecosystem
1.6 Spark Ecosystem Walkthrough
1.7 Quiz
2. Foundation & Environment
2.1 Understanding CloudxLab
2.2 Getting Started - Hands-on
2.3 Hadoop & Spark Hands-on
2.4 Quiz and Assessment
2.5 Basics of Linux - Quick Hands-on
2.6 Understanding Regular Expressions
2.7 Quiz and Assessment
2.8 Setting up VM (optional)
3. ZooKeeper
3.1 ZooKeeper - Race Condition
3.2 ZooKeeper - Deadlock
3.3 Hands-on
3.4 Quiz & Assessment
3.5 How does election happen - Paxos Algorithm?
3.6 Use cases
3.7 When not to use
3.8 Quiz & Assessment
4. HDFS
4.1 Why HDFS or Why not existing file systems?
4.2 HDFS - NameNode & DataNodes
4.3 Quiz
4.4 Advanced HDFS Concepts (HA, Federation)
4.5 Quiz
4.6 Hands-on with HDFS (Upload, Download, SetRep)
4.7 Quiz & Assessment
4.8 Data Locality (Rack Awareness)
5. YARN
5.1 YARN - Why not existing tools?
5.2 YARN - Evolution from MapReduce 1.0
5.3 Resource Management: YARN Architecture
5.4 Advanced Concepts - Speculative Execution
5.5 Quiz
6. MapReduce Basics
6.1 MapReduce - Understanding Sorting
6.2 MapReduce - Overview
6.3 Quiz
6.4 Example 0 - Word Frequency Problem - Without MR
6.5 Example 1 - Only Mapper - Image Resizing
6.6 Example 2 - Word Frequency Problem
6.7 Example 3 - Temperature Problem
6.8 Example 4 - Multiple Reducers
6.9 Example 5 - Java MapReduce Walkthrough
6.10 Quiz
7. MapReduce Advanced
7.1 Writing MapReduce Code Using Java
7.2 Building MapReduce project using Apache Ant
7.3 Concept - Associative & Commutative
7.4 Quiz
7.5 Example 8 - Combiner
7.6 Example 9 - Hadoop Streaming
7.7 Example 10 - Adv. Problem Solving - Anagrams
7.8 Example 11 - Adv. Problem Solving - Same DNA
7.9 Example 12 - Adv. Problem Solving - Similar DNA
7.10 Example 13 - Joins - Voting
7.11 Limitations of MapReduce
7.12 Quiz
8. Analyzing Data with Pig
8.1 Pig - Introduction
8.2 Pig - Modes
8.3 Getting Started
8.4 Example - NYSE Stock Exchange
8.5 Concept - Lazy Evaluation
9. Processing Data with Hive
9.1 Hive - Introduction
9.2 Data Types
9.3 Getting Started
9.4 Loading Data in Hive (Tables)
9.5 Example: Movielens Data Processing
9.6 Advanced Concepts: Views
9.7 Connecting Tableau and HiveServer 2
9.8 Connecting Microsoft Excel and HiveServer 2
9.9 Project: Sentiment Analysis of Twitter Data
9.10 Advanced - Partition Tables
9.11 Understanding HCatalog & Impala
9.12 Quiz
10. NoSQL and HBase
10.1 NoSQL - Scaling Out / Up
10.2 NoSQL - ACID Properties and RDBMS Story
10.3 CAP Theorem
10.4 HBase Architecture - Region Servers etc.
10.5 HBase Data Model - Column-Family Orientation
10.6 Getting Started - Create table, Adding Data
10.7 Adv Example - Google Links Storage
10.8 Concept - Bloom Filter
10.9 Comparison of NoSQL Databases
10.10 Quiz
11. Importing Data with Sqoop, Flume, and Oozie
11.1 Sqoop - Introduction
11.2 Sqoop Import - MySQL to HDFS
11.3 Exporting to MySQL from HDFS
11.4 Concept - Unbounded Dataset Processing or Stream Processing
11.5 Flume Overview: Agents - Source, Sink, Channel
11.6 Example 1 - Data from Local network service into HDFS
11.7 Example 2 - Extracting Twitter Data
11.8 Quiz
11.9 Example 3 - Creating workflow with Oozie

Course on Big Data with Spark

1. Introduction
1.1 Apache Spark ecosystem walkthrough
1.2 Spark Introduction - Why Spark?
1.3 Quiz
2. Scala Basics
2.1 Scala - Quick Introduction - Access Scala on CloudxLab
2.2 Scala - Quick Introduction - Variables and Methods
2.3 Getting Started: Interactive, Compilation, SBT
2.4 Types, Variables & Values
2.5 Functions
2.6 Collections
2.7 Classes
2.8 Parameters
2.9 More Features
2.10 Quiz and Assessment
3. Spark Basics
3.1 Apache Spark ecosystem walkthrough
3.2 Spark Introduction - Why Spark?
3.3 Using the Spark Shell on CloudxLab
3.4 Example 1 - Performing Word Count
3.5 Understanding Spark Cluster Modes on YARN
3.6 RDDs (Resilient Distributed Datasets)
3.7 General RDD Operations: Transformations & Actions
3.8 RDD lineage
3.9 RDD Persistence Overview
3.10 Distributed Persistence
4. Writing and Deploying Spark Applications
4.1 Creating the SparkContext
4.2 Building a Spark Application (Scala, Java, Python)
4.3 The Spark Application Web UI
4.4 Configuring Spark Properties
4.5 Running Spark on Cluster
4.6 RDD Partitions
4.7 Executing Parallel Operations
4.8 Stages and Tasks
5. Common Patterns in Spark Data Processing
5.1 Common Spark Use Cases
5.2 Example 1 - Data Cleaning (Movielens)
5.3 Example 2 - Understanding Spark Streaming
5.4 Understanding Kafka
5.5 Example 3 - Spark Streaming from Kafka
5.6 Iterative Algorithms in Spark
5.7 Project: Real-time analytics of orders in an e-commerce company
6. Data Formats & Management
6.1 InputFormat and InputSplit
6.2 JSON
6.3 XML
6.4 Avro
6.5 How to store many small files - SequenceFile?
6.6 Parquet
6.7 Protocol Buffers
6.8 Comparing Compressions
6.9 Understanding Row Oriented and Column Oriented Formats - RCFile?
7. DataFrames and Spark SQL
7.1 Spark SQL - Introduction
7.2 Spark SQL - Dataframe Introduction
7.3 Transforming and Querying DataFrames
7.4 Saving DataFrames
7.5 DataFrames and RDDs
7.6 Comparing Spark SQL, Impala, and Hive-on-Spark
8. Machine Learning with Spark
8.1 Machine Learning Introduction
8.2 Applications Of Machine Learning
8.3 MLlib Example: k-means
8.4 SparkR Example

Course on Machine Learning

1. Introduction to Statistics
1.1 Statistical Inference
1.2 Probability Distribution
1.3 Normality
1.4 Measures of Central Tendency
1.5 Normal Distribution
2. Machine Learning Applications and Landscape
2.1 Introduction to Machine Learning
2.2 Machine Learning Application
2.3 Introduction to AI
2.4 Different types of Machine Learning - Supervised, Unsupervised
3. Building an End-to-End Machine Learning Project
3.1 Machine Learning Projects Checklist
3.2 Get the data
3.3 Launch, monitor, and maintain the system
3.4 Explore the data to gain insights
3.5 Prepare the data for Machine Learning algorithms
3.6 Explore many different models and short-list the best ones
3.7 Fine-tune model
4. Classification
4.1 Training a Binary Classifier
4.2 Multiclass, Multilabel, and Multioutput Classification
4.3 Performance Measures
4.4 Confusion Matrix
4.5 Precision and Recall
4.6 Precision/Recall Tradeoff
4.7 The ROC Curve
5. Training Models
5.1 Linear Regression
5.2 Gradient Descent
5.3 Polynomial Regression
5.4 Learning Curves
5.5 Regularized Linear Models
5.6 Logistic Regression
6. Support Vector Machines
6.1 Linear SVM Classification
6.2 Nonlinear SVM Classification
6.3 SVM Regression
7. Decision Trees
7.1 Training and Visualizing a Decision Tree
7.2 Making Predictions
7.3 Estimating Class Probabilities
7.4 The CART Training Algorithm
7.5 Gini Impurity or Entropy
7.6 Regularization Hyperparameters
7.7 Instability
8. Ensemble Learning and Random Forests
8.1 Voting Classifiers
8.2 Bagging and Pasting
8.3 Random Patches and Random Subspaces
8.4 Random Forests
8.5 Boosting and Stacking
9. Dimensionality Reduction
9.1 The Curse of Dimensionality
9.2 Main Approaches for Dimensionality Reduction
9.3 PCA
9.4 Kernel PCA
9.5 LLE
9.6 Other Dimensionality Reduction Techniques

Course on Deep Learning

1. Introduction to Artificial Neural Networks
1.1 From Biological to Artificial Neurons
1.2 Implementing MLPs using Keras with TensorFlow Backend
1.3 Fine-Tuning Neural Network Hyperparameters
2. Training Deep Neural Networks
2.1 The Vanishing / Exploding Gradients Problems
2.2 Reusing Pretrained Layers
2.3 Faster Optimizers
2.4 Avoiding Overfitting Through Regularization
2.5 Practical Guidelines to Train Deep Neural Networks
3. Custom Models and Training with TensorFlow
3.1 A Quick Tour of TensorFlow
3.2 Customizing Models and Training Algorithms
3.3 TensorFlow Functions and Graphs
4. Loading and Preprocessing Data with TensorFlow
4.1 Introduction to the Data API
4.2 TFRecord Format
4.3 Preprocessing the Input Features
4.4 TF Transform
4.5 The TensorFlow Datasets (TFDS) Project
5. Convolutional Neural Networks
5.1 The Architecture of the Visual Cortex
5.2 Convolutional Layer
5.3 Pooling Layer
5.4 CNN Architectures
5.5 Classification with Keras
5.6 Transfer Learning with Keras
5.7 Object Detection
5.8 YOLO
6. Recurrent Neural Networks
6.1 Recurrent Neurons and Layers
6.2 Basic RNNs in TensorFlow
6.3 Training RNNs
6.4 Deep RNNs
6.5 Forecasting a Time Series
6.6 LSTM Cell
6.7 GRU Cell
7. Natural Language Processing
7.1 Introduction to Natural Language Processing
7.2 Creating a Quiz Using TextBlob
7.3 Finding Related Posts with scikit-learn
7.4 Generating Shakespearean Text Using Character RNN
7.5 Sentiment Analysis
7.6 Encoder-Decoder Network for Neural Machine Translation
7.7 Attention Mechanisms
7.8 Recent Innovations in Language Models
8. Autoencoders and GANs
8.1 Efficient Data Representations
8.2 Performing PCA with an Undercomplete Linear Autoencoder
8.3 Stacked Autoencoders
8.4 Unsupervised Pretraining Using Stacked Autoencoders
8.5 Denoising Autoencoders
8.6 Sparse Autoencoders
8.7 Variational Autoencoders
8.8 Generative Adversarial Networks
9. Reinforcement Learning
9.1 Learning to Optimize Rewards
9.2 Policy Search
9.3 Introduction to OpenAI Gym
9.4 Neural Network Policies
9.5 Evaluating Actions: The Credit Assignment Problem
9.6 Policy Gradients
9.7 Markov Decision Processes
9.8 Temporal Difference Learning and Q-Learning
9.9 Deep Q-Learning Variants
9.10 The TF-Agents Library





Data Science Specialization
(lifetime course access)
90 days Cloud Lab



Data Science Specialization
(lifetime course access)
180 days Cloud Lab



Data Science Specialization
(lifetime course access)
365 days Cloud Lab


Subscribe to CloudxLab Premium

34 /mo

17 /mo

  • 180 days Cloud Lab access
  • 6 months of access to all CloudxLab self-paced courses
  • Earn Industry-relevant Certificates
  • Placement Assistance
  • Cancel Anytime
Explore CloudxLab Pro

Get Access to ALL Courses with One Single Subscription.


Frequently Asked Questions

What are the prerequisites and requirements for this course?

This course is for engineers, product managers, and anyone who wants to learn. We cover the foundations of linear algebra, calculus, and statistical inference wherever required so that you can learn the concepts effectively. No prerequisites or prior programming knowledge are required.

What is the refund policy?

If you are unhappy with the product for any reason, let us know within 7 days of purchasing or upgrading your account, and we'll cancel your account and issue a full refund. Please contact us to request a refund within the stipulated time. We will be sorry to see you go, though!

Do I need to install any software before starting this course?

No, we will provide you with access to our online lab and BootML so that you do not have to install anything on your local machine.

What is the validity of the course material?

We understand that you might need the course material for a longer duration to make the most of your subscription. You will get lifetime access to the course material, so you can refer to it anytime.

Do we have to pay separately for the lab?

No, lab access is included in the course price.

How do I view the course after getting access to it?

Please log in with your Gmail ID and access your course under "My Courses".

What do I need to fulfill to get the CloudxLab certificate for the course?

You should complete 100% of the course along with all the given projects in order to be eligible for the certificate.

Kindly note that there is no deadline for CloudxLab courses.

Can I get a certificate for the projects completed?

We have created a set of Guided Projects on our platform. You may complete these guided projects and earn a certificate for free.