
Data Science Certification Program by CloudxLab

Learn Python, NumPy, Pandas, Scikit-learn, HDFS, ZooKeeper, Hive, HBase, NoSQL, Oozie, Flume, Sqoop, Spark, Spark RDD, Spark Streaming, Kafka, SparkR, SparkSQL, MLlib, Regression, Clustering, Classification, SVM, Random Forests, Decision Trees, Dimensionality Reduction, TensorFlow 2, Keras, Convolutional & Recurrent Neural Networks, Autoencoders, Reinforcement Learning and More

4,168 Ratings        13,500+ learners

  220+ Hours of Online Self-Paced Training

  270 Days of Lab

  Timely Doubt Resolution

  26+ Projects

About the Course

This Data Science Certification Program is a self-paced online course, so you can learn on your own schedule and at your own convenience.

The program offers 220+ hours of learning across 5 courses (Big Data with Hadoop, Big Data with Spark, Python, Machine Learning, and Deep Learning).

It also comes with our exclusive lab access, so you gain the hands-on experience needed to solve real-world problems.

Upon successfully completing the course, you will receive a certificate from CloudxLab, which you can use to progress in your career and find better opportunities.

5 Courses

Big Data with Hadoop, Big Data with Spark, Python, Machine Learning, Deep Learning

Cloud Lab

Apply the skills you learn on a distributed cluster to solve real-world problems


Work on 26+ projects to get hands-on experience

Best-in-class Support

Timely doubt resolution through the discussion forum, with the help of an international community of peers.


Highlight your new skills on your resume or LinkedIn.

Free Trial
  • Access to all CloudxLab self-paced courses
  • Real-time cluster access for 3 days
  • No access to third-party courses and instructor-led trainings
Subscription for 1 month
  • Unlimited Access to all CloudxLab self-paced courses
  • Real-time cluster access
  • Earn Industry-relevant Certificates
  • No access to third-party courses and instructor-led trainings
  • Access to Job Portal
Subscription for 6 months
  • Unlimited Access to all CloudxLab self-paced courses
  • Real-time cluster access
  • Earn Industry-relevant Certificates
  • No access to third-party courses and instructor-led trainings
  • Access to Job Portal

Note: In case of a coupon code, discounts will be applicable only on the first EMI

Learning Path

Course 1

Python for Machine Learning

1.1 Introduction to Linux
1.2 Introduction to Python
1.3 Hands-on using Jupyter on CloudxLab
1.4 Overview of Linear Algebra
1.5 Introduction to NumPy & Pandas

Course 2

Big Data with Hadoop

1.1 Big Data Introduction
1.2 Distributed systems
1.3 Big Data Use Cases
1.4 Various Solutions
1.5 Overview of Hadoop Ecosystem
1.6 Spark Ecosystem Walkthrough
1.7 Quiz
2.1 Understanding the CloudxLab
2.2 Getting Started - Hands on
2.3 Hadoop & Spark Hands-on
2.4 Quiz and Assessment
2.5 Basics of Linux - Quick Hands-On
2.6 Understanding Regular Expressions
2.7 Quiz and Assessment
2.8 Setting up VM (optional)
3.1 ZooKeeper - Race Condition
3.2 ZooKeeper - Deadlock
3.3 Hands-On
3.4 Quiz & Assessment
3.5 How does election happen - Paxos Algorithm?
3.6 Use cases
3.7 When not to use
3.8 Quiz & Assessment
4.1 Why HDFS or Why not existing file systems?
4.2 HDFS - NameNode & DataNodes
4.3 Quiz
4.4 Advanced HDFS Concepts (HA, Federation)
4.5 Quiz
4.6 Hands-on with HDFS (Upload, Download, SetRep)
4.7 Quiz & Assessment
4.8 Data Locality (Rack Awareness)
5.1 YARN - Why not existing tools?
5.2 YARN - Evolution from MapReduce 1.0
5.3 Resource Management: YARN Architecture
5.4 Advanced Concepts - Speculative Execution
5.5 Quiz
6.1 MapReduce - Understanding Sorting
6.2 MapReduce - Overview
6.3 Quiz
6.4 Example 0 - Word Frequency Problem - Without MR
6.5 Example 1 - Only Mapper - Image Resizing
6.6 Example 2 - Word Frequency Problem
6.7 Example 3 - Temperature Problem
6.8 Example 4 - Multiple Reducer
6.9 Example 5 - Java MapReduce Walkthrough
6.10 Quiz
7.1 Writing MapReduce Code Using Java
7.2 Building MapReduce project using Apache Ant
7.3 Concept - Associative & Commutative
7.4 Quiz
7.5 Example 8 - Combiner
7.6 Example 9 - Hadoop Streaming
7.7 Example 10 - Adv. Problem Solving - Anagrams
7.8 Example 11 - Adv. Problem Solving - Same DNA
7.9 Example 12 - Adv. Problem Solving - Similar DNA
7.10 Example 13 - Joins - Voting
7.11 Limitations of MapReduce
7.12 Quiz
8.1 Pig - Introduction
8.2 Pig - Modes
8.3 Getting Started
8.4 Example - NYSE Stock Exchange
8.5 Concept - Lazy Evaluation
9.1 Hive - Introduction
9.2 Hive - Data Types
9.3 Getting Started
9.4 Loading Data in Hive (Tables)
9.5 Example: Movielens Data Processing
9.6 Advanced Concepts: Views
9.7 Connecting Tableau and HiveServer 2
9.8 Connecting Microsoft Excel and HiveServer 2
9.9 Project: Sentiment Analysis of Twitter Data
9.10 Advanced - Partition Tables
9.11 Understanding HCatalog & Impala
9.12 Quiz
10.1 NoSQL - Scaling Out / Up
10.2 NoSQL - ACID Properties and RDBMS Story
10.3 CAP Theorem
10.4 HBase Architecture - Region Servers etc
10.5 HBase Data Model - Column-Family Orientation
10.6 Getting Started - Create table, Adding Data
10.7 Adv Example - Google Links Storage
10.8 Concept - Bloom Filter
10.9 Comparison of NoSQL Databases
10.10 Quiz
11.1 Sqoop - Introduction
11.2 Sqoop Import - MySQL to HDFS
11.3 Exporting to MySQL from HDFS
11.4 Concept - Unbounded Dataset Processing or Stream Processing
11.5 Flume Overview: Agents - Source, Sink, Channel
11.6 Example 1 - Data from Local network service into HDFS
11.7 Example 2 - Extracting Twitter Data
11.8 Quiz
11.9 Example 3 - Creating workflow with Oozie

Course 3

Big Data with Spark

1.1 Apache Spark ecosystem walkthrough
1.2 Spark Introduction - Why Spark?
1.3 Quiz
2.1 Scala - Quick Introduction - Access Scala on CloudxLab
2.2 Scala - Quick Introduction - Variables and Methods
2.3 Getting Started: Interactive, Compilation, SBT
2.4 Types, Variables & Values
2.5 Functions
2.6 Collections
2.7 Classes
2.8 Parameters
2.9 More Features
2.10 Quiz and Assessment
3.1 Apache Spark ecosystem walkthrough
3.2 Spark Introduction - Why Spark?
3.3 Using the Spark Shell on CloudxLab
3.4 Example 1 - Performing Word Count
3.5 Understanding Spark Cluster Modes on YARN
3.6 RDDs (Resilient Distributed Datasets)
3.7 General RDD Operations: Transformations & Actions
3.8 RDD lineage
3.9 RDD Persistence Overview
3.10 Distributed Persistence
4.1 Creating the SparkContext
4.2 Building a Spark Application (Scala, Java, Python)
4.3 The Spark Application Web UI
4.4 Configuring Spark Properties
4.5 Running Spark on Cluster
4.6 RDD Partitions
4.7 Executing Parallel Operations
4.8 Stages and Tasks
5.1 Common Spark Use Cases
5.2 Example 1 - Data Cleaning (Movielens)
5.3 Example 2 - Understanding Spark Streaming
5.4 Understanding Kafka
5.5 Example 3 - Spark Streaming from Kafka
5.6 Iterative Algorithms in Spark
5.7 Project: Real-time analytics of orders in an e-commerce company
6.1 InputFormat and InputSplit
6.2 JSON
6.3 XML
6.4 AVRO
6.5 How to store many small files - SequenceFile?
6.6 Parquet
6.7 Protocol Buffers
6.8 Comparing Compressions
6.9 Understanding Row Oriented and Column Oriented Formats - RCFile?
7.1 Spark SQL - Introduction
7.2 Spark SQL - DataFrame Introduction
7.3 Transforming and Querying DataFrames
7.4 Saving DataFrames
7.5 DataFrames and RDDs
7.6 Comparing Spark SQL, Impala, and Hive-on-Spark
8.1 Machine Learning Introduction
8.2 Applications Of Machine Learning
8.3 MLlib Example: k-means
8.4 SparkR Example

Course 4

Machine Learning

1. Statistical Inference
2. Probability Distribution
3. Normality
4. Measures of Central Tendency
5. Normal Distribution
1. Introduction to Machine Learning
2. Machine Learning Application
3. Introduction to AI
4. Different types of Machine Learning - Supervised, Unsupervised
1. Machine Learning Projects Checklist
2. Get the data
3. Launch, monitor, and maintain the system
4. Explore the data to gain insights
5. Prepare the data for Machine Learning algorithms
6. Explore many different models and short-list the best ones
7. Fine-tune model
1. Training a Binary Classifier
2. Multiclass, Multilabel and Multioutput Classification
3. Performance Measures
4. Confusion Matrix
5. Precision and Recall
6. Precision/Recall Tradeoff
7. The ROC Curve
1. Linear Regression
2. Gradient Descent
3. Polynomial Regression
4. Learning Curves
5. Regularized Linear Models
6. Logistic Regression
1. Linear SVM Classification
2. Nonlinear SVM Classification
3. SVM Regression
1. Training and Visualizing a Decision Tree
2. Making Predictions
3. Estimating Class Probabilities
4. The CART Training Algorithm
5. Gini Impurity or Entropy
6. Regularization Hyperparameters
7. Instability
1. Voting Classifiers
2. Bagging and Pasting
3. Random Patches and Random Subspaces
4. Random Forests
5. Boosting and Stacking
1. The Curse of Dimensionality
2. Main Approaches for Dimensionality Reduction
3. PCA
4. Kernel PCA
5. LLE
6. Other Dimensionality Reduction Techniques

Course 5

Deep Learning

1. From Biological to Artificial Neurons
2. Implementing MLPs using Keras with TensorFlow Backend
3. Fine-Tuning Neural Network Hyperparameters
1. The Vanishing / Exploding Gradients Problems
2. Reusing Pretrained Layers
3. Faster Optimizers
4. Avoiding Overfitting Through Regularization
5. Practical Guidelines to Train Deep Neural Networks
1. A Quick Tour of TensorFlow
2. Customizing Models and Training Algorithms
3. TensorFlow Functions and Graphs
1. Introduction to the Data API
2. TFRecord Format
3. Preprocessing the Input Features
4. TF Transform
5. The TensorFlow Datasets (TFDS) Project
1. The Architecture of the Visual Cortex
2. Convolutional Layer
3. Pooling Layer
4. CNN Architectures
5. Classification with Keras
6. Transfer Learning with Keras
7. Object Detection
1. Recurrent Neurons and Layers
2. Basic RNNs in TensorFlow
3. Training RNNs
4. Deep RNNs
5. Forecasting a Time Series
6. LSTM Cell
7. GRU Cell
1. Introduction to Natural Language Processing
2. Creating a Quiz Using TextBlob
3. Finding Related Posts with scikit-learn
4. Generating Shakespearean Text Using Character RNN
5. Sentiment Analysis
6. Encoder-Decoder Network for Neural Machine Translation
7. Attention Mechanisms
8. Recent Innovations in Language Models
1. Efficient Data Representations
2. Performing PCA with an Undercomplete Linear Autoencoder
3. Stacked Autoencoders
4. Unsupervised Pretraining Using Stacked Autoencoders
5. Denoising Autoencoders
6. Sparse Autoencoders
7. Variational Autoencoders
8. Generative Adversarial Networks
1. Learning to Optimize Rewards
2. Policy Search
3. Introduction to OpenAI Gym
4. Neural Network Policies
5. Evaluating Actions: The Credit Assignment Problem
6. Policy Gradients
7. Markov Decision Processes
8. Temporal Difference Learning and Q-Learning
9. Deep Q-Learning Variants
10. The TF-Agents Library


1. Process the NYSE

Process the NYSE (New York Stock Exchange) data using Hive for various insights.

2. Sentiment analysis

Sentiment analysis of the movie "Iron Man 3" using Hive, with the sentiment data visualized in BI tools such as Tableau.

3. MovieLens Project

Analyze MovieLens data using Hive

4. Spark MLlib

Generate movie recommendations using Spark MLlib

5. Spark GraphX

Derive the importance of various Twitter handles using Spark GraphX.

6. Churn the logs

Churn the logs of the NASA Kennedy Space Center WWW server using Spark to find useful business and DevOps metrics.

7. Spark application

Write an end-to-end Spark application, from writing code on your local machine to deploying it to the cluster.

8. Analytics Dashboard

Real-time analytics dashboard for an e-commerce company using Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO and Highcharts

9. Analyze Emails

Churn the mail activity from various individuals in an open source project development team.

10. Predict bike rental demand

Build a model to predict bike demand from past data.

11. Noise removal from the images

Build a model that takes a noisy image as an input and outputs the clean image.

12. Predict which passengers survived in the Titanic shipwreck

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. In this project, you build a model to predict which passengers survived the tragedy.

13. Build a spam classifier

Build a model to classify email as spam or ham. First, download examples of spam and ham from Apache SpamAssassin’s public datasets and then train a model to classify email.

14. Build an Image Classifier on the Fashion MNIST dataset

Classify images from the Fashion MNIST dataset using scikit-learn and Python.

15. Deploy Machine Learning models to Production using Flask

Learn how to deploy a machine learning model as a web application using the Flask framework.

16. Build an Image Classifier on the Fashion MNIST dataset

Classify images from the Fashion MNIST dataset using TensorFlow 2, Matplotlib, and Python.

17. Training from Scratch vs Transfer Learning

Learn how to train a neural network from scratch to classify data using TensorFlow 2, and how to reuse the weights of an already trained model to classify a different set of data.

18. Working with Custom Loss Function

Create a custom loss function in Keras with TensorFlow 2 backend.

19. Image Classification with Pre-trained Keras models

Learn how to access pre-trained models from Keras in TensorFlow 2 (here, a pre-trained ResNet model) to classify images.

20. Build cats classifier using transfer learning

In this project, you will build a basic neural network that classifies whether a given image is of a cat, using transfer learning with Python and Keras.

21. Mask R-CNN with OpenCV for Object Detection

Learn how to read a pre-trained TensorFlow model for object detection using OpenCV.

22. Art Generation Project

Use TensorFlow 2 to generate an image that is an artistic blend of a content image and style image using Neural Style Transfer.

23. NYSE Stock Closing Price Prediction using TensorFlow 2 & Keras

Predict stock market closing prices for a firm using GRU, a state-of-the-art deep learning algorithm for sequential data, with Keras and Python.

24. Sentiment Analysis using IMDB dataset

Create a sentiment analysis model with the IMDB dataset using TensorFlow 2.

25. Autoencoders for Fashion MNIST

Learn how to practically implement the autoencoder, stacking an encoder and decoder using TensorFlow 2, and depict reconstructed output images by the autoencoder model using the Fashion MNIST dataset.

26. Deploy Image Classification Pre-trained Keras model using Flask

Learn how to deploy a deep learning model as a web application using the Flask framework.



Earn your certificate

Our course is exhaustive, and the certificate we award is proof that you have taken a big leap in data science.

Differentiate yourself

The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.

Share your achievement

Highlight your skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.

Course Creators
Sandeep Giri

Founder at CloudxLab
Past: Amazon, InMobi, D.E.Shaw
Course Developer
Abhinav Singh

Co-Founder at CloudxLab
Past: Byjus
Course Developer
 Jatin Shah

Ex-LinkedIn, Yahoo, Yale CS Ph.D.
Course Advisor


(4.7 out of 5)

This course is suitable for everyone. Me being a product manager had not done hands-on coding since quite some time. Python was completely new to me. However, Sandeep Giri gave us a crash course to Python and then introduced us to Machine Learning. Also, the CloudxLab’s environment was very useful to just log in and start practising coding and playing with things learnt. A good mix of theory and practical exercises and specifically the sequence of starting straight away with a project and then going deeper was a very good way of teaching. I would recommend this course to all.


The machine learning courses at CloudxLab, especially the Artificial Intelligence for the manager course, are excellent. I have attended some of the courses and was able to follow them, as Sandeep Giri sir teaches the AI course from scratch and relates it to our day-to-day life…

He even takes free sessions to help students and provides career guidance.

His courses are worthwhile, and even just by watching the YouTube videos anyone can easily crack the AI interview.


This is one of the best-designed courses, very informative and well paced. The killer feature of the machine/deep learning courses from CloudxLab is the live sessions with access to labs for hands-on practice! With that, it becomes easy to follow any discourse, even if one misses the live sessions (read that as me!). Sandeep (course instructor) has loads of patience and his way of explaining things is just remarkable. I might have better comments to add here once I learn more! Great job, guys!


It has been a wonderful learning experience with CXL. This is one of the courses that will probably stay with me for a significant amount of time. The platform provides a unique opportunity to try hands-on simultaneously with the coursework in an almost real-life coding example. Besides, learning to use algebra, tech system and Git is a good refresher for anyone planning to start or stay in technology. The course covers the depth and breadth of ML topics. I specifically like the MNIST example and the depth to which it goes in explaining each and every line of code. Would definitely recommend the instructor-led course.


I took both the machine learning and deep learning courses at CloudxLab. I came into the first part of the course with some knowledge of machine learning, but the class really helped me understand some of the topics a lot more clearly. I think the best part of the class is the instructor, Sandeep. He is very knowledgeable and does a really good job explaining topics that can be nebulous at times. Another favorite part of the course is the online labs. I would watch the 3hr lecture the next day, and then work on the labs. The labs reinforce the lectures with questions and coding assignments. There is also a message board and a Slack channel. I preferred using Slack, but I think you get a quicker response if you use the message board. As far as the deep learning portion of the course, it was all new to me, but I was building CNN and RNN models using TensorFlow after each 3hr lecture. Overall, I was very pleased with the course. I am hoping that CloudxLab will put together an advanced class focusing more on deploying models to the cloud, working with pipelines, DevOps etc…


I found the ML&DL course very well structured with ample examples and hands on exercises. Sandeep was very patient in answering questions and he made the training sessions very interactive. I would recommend this training to all who plan to take a dive into the world of machine and deep learning.


I have thoroughly enjoyed both the ML and DL courses from CloudXLab and will look forward to reviewing the videos/material at a later time. I’ve been to many meetups and paid sessions on ML /DL and this course beats most of them on the depth of topics and certainly breadth of topics. I’ve not taken any online courses (Andrew Ng, for example) to their conclusion, so I won’t draw a conclusion there. For an instructor-led, interactive course, I would expect to pay many times more for a class (ML and DL) such as this in the US. The instructor is easy to understand, has extensive experience, and truly cares about the student knowing the material.


A very well structured instructor-led course. The instructor was very thorough, and always willing to answer questions and clarify coursework, no matter how minor. The course described the theory of machine/deep learning well, but also followed through with very thorough examples to demonstrate the practical implementations of the theory. This leads nicely into the student exercises, which served to solidify the instructor's teachings and encourage experimentation. The resources provided for students were exceptional and presented in a very user-friendly format.

My only complaint is that the course went quite overtime, but I also appreciate Sandeep's dedication to quality and ensuring that he finished teaching us everything adequately.


I have been using CloudxLab for Machine Learning and based on experience I can say that they have done a fabulous job in training and certification process which makes the user so interactive with faculty and software intuitive.


This course is for engineers, product managers, and anyone who wants to learn. We will cover the foundations of linear algebra, calculus and statistical inference wherever required so that you can learn the concepts effectively. No prerequisites or programming knowledge are required.

If you are unhappy with the product for any reason, let us know within 7 days of purchasing or upgrading your account, and we'll cancel your account and issue a full refund. Please contact us at to request a refund within the stipulated time. We will be sorry to see you go though!

No, we will provide you with access to our online lab and BootML so that you do not have to install anything on your local machine.

We understand that you might need the course material for a longer duration to make the most of your subscription. You will get lifetime access to the course material so that you can refer to it anytime.

No, the lab is available within the course price.

Please log in at with your Gmail Id and access your course under "My Courses".

You should complete 100% of the course along with all the given projects in order to be eligible for the certificate.

Kindly note that there is no deadline for CloudxLab courses.

We have created a set of Guided Projects on our platform. You may complete these guided projects and earn the certificate for free.

Have more questions? Please contact us at