E&ICT Academy, IIT Roorkee

An initiative of Ministry of Electronics and Information Technology (MeitY) Govt. of India

Data Science Certification Program by E&ICT, IIT Roorkee

Learn Python, NumPy, Pandas, Scikit-learn, HDFS, ZooKeeper, Hive, HBase, NoSQL, Oozie, Flume, Sqoop, Spark, Spark RDD, Spark Streaming, Kafka, SparkR, SparkSQL, MLlib, Regression, Clustering, Classification, SVM, Random Forests, Decision Trees, Dimensionality Reduction, TensorFlow, Convolutional & Recurrent Neural Networks, Autoencoders, Reinforcement Learning and more

4,168 Ratings | 13,500+ learners

  200+ Hours of Online Self-Paced Training

  270 Days of Lab

  24x7 Support

  20+ Projects

About E&ICT, IIT Roorkee

The Electronics & ICT Academy program is sponsored by the Ministry of Electronics and Information Technology, Govt. of India.

The E&ICT Academy IIT Roorkee conducts short courses/FDPs in the emerging areas to enrich & upgrade subject knowledge and technical skills benefiting faculty, working professionals and Govt. employees.

The trained beneficiaries are expected to create a cascading effect, transforming the competencies and standards in the parent institutes/organizations.

E&ICT Academy, IIT Roorkee, supported by the Ministry of Electronics and Information Technology (MeitY) with CloudxLab as its industry partner, is conducting a training program in Data Science.

The E&ICT courses lay special emphasis on hands-on learning with participation from industry experts. These programs also enable the participants and institutes to build industry connects, upgrade lab facilities and create opportunities for collaboration.

E&ICT courses are at par with QIP for recognition/credits.

As of now the E&ICT Academy, IIT Roorkee has conducted 91 courses and trained over 5,000 beneficiaries.

For more details, please visit the official website of the E&ICT Academy, IIT Roorkee.

About the Course

This Data Science Certification Program is a self-paced online course, so you can study on your own schedule and at your own convenience.

This course has over 200 hours of video content and consists of 5 courses: Big Data with Hadoop, Big Data with Spark, Python, Machine Learning, and Deep Learning.

The course also comes with exclusive lab access, giving you the hands-on experience needed to solve real-world problems.

Upon successfully completing the course, you will receive a certificate from E&ICT, IIT Roorkee, which you can use to progress in your career and find better opportunities.

5 Courses

Big Data with Hadoop, Big Data with Spark, Python, Machine Learning, Deep Learning

Cloud Lab

Apply the skills you learn on a distributed cluster to solve real-world problems


Work on about 20 projects to get hands-on experience

Best-in-class Support

24x7 Support. Discussion forum to answer all your queries throughout your learning journey


Highlight your new skills on your resume or LinkedIn. Certificate issued by E&ICT, IIT Roorkee.
Course + Lab + Certificate

270 Days Lab

199 716
Course + Lab + Certificate

330 Days Lab

219 796
Learning Path

Course 1

Python for Machine Learning

1. Introduction to Linux

2. Introduction to Python

3. Hands-on using Jupyter on CloudxLab

4. Overview of Linear Algebra

5. Introduction to NumPy & Pandas

6. Quizzes, gamified assessments & projects

Course 2

Big Data with Hadoop

1.1 Big Data Introduction
1.2 Distributed systems
1.3 Big Data Use Cases
1.4 Various Solutions
1.5 Overview of Hadoop Ecosystem
1.6 Spark Ecosystem Walkthrough
1.7 Quiz
2.1 Understanding the CloudxLab
2.2 Getting Started - Hands on
2.3 Hadoop & Spark Hands-on
2.4 Quiz and Assessment
2.5 Basics of Linux - Quick Hands-On
2.6 Understanding Regular Expressions
2.7 Quiz and Assessment
2.8 Setting up VM (optional)
3.1 ZooKeeper - Race Condition
3.2 ZooKeeper - Deadlock
3.3 Hands-On
3.4 Quiz & Assessment
3.5 How does election happen - Paxos Algorithm?
3.6 Use cases
3.7 When not to use
3.8 Quiz & Assessment
4.1 Why HDFS or Why not existing file systems?
4.2 HDFS - NameNode & DataNodes
4.3 Quiz
4.4 Advanced HDFS Concepts (HA, Federation)
4.5 Quiz
4.6 Hands-on with HDFS (Upload, Download, SetRep)
4.7 Quiz & Assessment
4.8 Data Locality (Rack Awareness)
5.1 YARN - Why not existing tools?
5.2 YARN - Evolution from MapReduce 1.0
5.3 Resource Management: YARN Architecture
5.4 Advanced Concepts - Speculative Execution
5.5 Quiz
6.1 MapReduce - Understanding Sorting
6.2 MapReduce - Overview
6.3 Quiz
6.4 Example 0 - Word Frequency Problem - Without MR
6.5 Example 1 - Only Mapper - Image Resizing
6.6 Example 2 - Word Frequency Problem
6.7 Example 3 - Temperature Problem
6.8 Example 4 - Multiple Reducer
6.9 Example 5 - Java MapReduce Walkthrough
6.10 Quiz
7.1 Writing MapReduce Code Using Java
7.2 Building MapReduce project using Apache Ant
7.3 Concept - Associative & Commutative
7.4 Quiz
7.5 Example 8 - Combiner
7.6 Example 9 - Hadoop Streaming
7.7 Example 10 - Adv. Problem Solving - Anagrams
7.8 Example 11 - Adv. Problem Solving - Same DNA
7.9 Example 12 - Adv. Problem Solving - Similar DNA
7.10 Example 12 - Joins - Voting
7.11 Limitations of MapReduce
7.12 Quiz
8.1 Pig - Introduction
8.2 Pig - Modes
8.3 Getting Started
8.4 Example - NYSE Stock Exchange
8.5 Concept - Lazy Evaluation
9.1 Hive - Introduction
9.2 Hive - Data Types
9.3 Getting Started
9.4 Loading Data in Hive (Tables)
9.5 Example: Movielens Data Processing
9.6 Advanced Concepts: Views
9.7 Connecting Tableau and HiveServer 2
9.8 Connecting Microsoft Excel and HiveServer 2
9.9 Project: Sentiment Analysis of Twitter Data
9.10 Advanced - Partition Tables
9.11 Understanding HCatalog & Impala
9.12 Quiz
10.1 NoSQL - Scaling Out / Up
10.2 NoSQL - ACID Properties and RDBMS Story
10.3 CAP Theorem
10.4 HBase Architecture - Region Servers etc
10.5 HBase Data Model - Column-Family Orientation
10.6 Getting Started - Create table, Adding Data
10.7 Adv Example - Google Links Storage
10.8 Concept - Bloom Filter
10.9 Comparison of NoSQL Databases
10.10 Quiz
11.1 Sqoop - Introduction
11.2 Sqoop Import - MySQL to HDFS
11.3 Exporting to MySQL from HDFS
11.4 Concept - Unbounded Dataset Processing or Stream Processing
11.5 Flume Overview: Agents - Source, Sink, Channel
11.6 Example 1 - Data from Local network service into HDFS
11.7 Example 2 - Extracting Twitter Data
11.8 Quiz
11.9 Example 3 - Creating workflow with Oozie

Course 3

Big Data with Spark

1.1 Apache Spark ecosystem walkthrough
1.2 Spark Introduction - Why Spark?
1.3 Quiz
2.1 Scala - Quick Introduction - Access Scala on CloudxLab
2.2 Scala - Quick Introduction - Variables and Methods
2.3 Getting Started: Interactive, Compilation, SBT
2.4 Types, Variables & Values
2.5 Functions
2.6 Collections
2.7 Classes
2.8 Parameters
2.9 More Features
2.10 Quiz and Assessment
3.1 Apache Spark ecosystem walkthrough
3.2 Spark Introduction - Why Spark?
3.3 Using the Spark Shell on CloudxLab
3.4 Example 1 - Performing Word Count
3.5 Understanding Spark Cluster Modes on YARN
3.6 RDDs (Resilient Distributed Datasets)
3.7 General RDD Operations: Transformations & Actions
3.8 RDD lineage
3.9 RDD Persistence Overview
3.10 Distributed Persistence
4.1 Creating the SparkContext
4.2 Building a Spark Application (Scala, Java, Python)
4.3 The Spark Application Web UI
4.4 Configuring Spark Properties
4.5 Running Spark on Cluster
4.6 RDD Partitions
4.7 Executing Parallel Operations
4.8 Stages and Tasks
5.1 Common Spark Use Cases
5.2 Example 1 - Data Cleaning (Movielens)
5.3 Example 2 - Understanding Spark Streaming
5.4 Understanding Kafka
5.5 Example 3 - Spark Streaming from Kafka
5.6 Iterative Algorithms in Spark
5.7 Project: Real-time analytics of orders in an e-commerce company
6.1 InputFormat and InputSplit
6.2 JSON
6.3 XML
6.4 AVRO
6.5 How to store many small files - SequenceFile?
6.6 Parquet
6.7 Protocol Buffers
6.8 Comparing Compressions
6.9 Understanding Row Oriented and Column Oriented Formats - RCFile?
7.1 Spark SQL - Introduction
7.2 Spark SQL - Dataframe Introduction
7.3 Transforming and Querying DataFrames
7.4 Saving DataFrames
7.5 DataFrames and RDDs
7.6 Comparing Spark SQL, Impala, and Hive-on-Spark
8.1 Machine Learning Introduction
8.2 Applications Of Machine Learning
8.3 MLlib Example: k-means
8.4 SparkR Example

Course 4

Machine Learning

1. Introduction to Statistics

Statistical Inference, Types of Variables, Probability Distribution, Normality, Measures of Central Tendency, Normal Distribution

2. Machine Learning Applications & Landscape

Introduction to Machine Learning, Machine Learning Application, Introduction to AI, Different types of Machine Learning - Supervised, Unsupervised, Reinforcement

3. Building end-to-end Machine Learning Project

Machine Learning Projects Checklist, Frame the problem and look at the big picture, Get the data, Explore the data to gain insights, Prepare the data for Machine Learning algorithms, Explore many different models and short-list the best ones, Fine-tune model, Present the solution, Launch, monitor, and maintain the system

4. Classification

Training a Binary Classifier, Performance Measures, Confusion Matrix, Precision and Recall, Precision/Recall Tradeoff, The ROC Curve, Multiclass Classification, Multilabel Classification, Multioutput Classification

5. Training Models

Linear Regression, Gradient Descent, Polynomial Regression, Learning Curves, Regularized Linear Models, Logistic Regression

6. Support Vector Machines

Linear SVM Classification, Nonlinear SVM Classification, SVM Regression

7. Decision Trees

Training and Visualizing a Decision Tree, Making Predictions, Estimating Class Probabilities, The CART Training Algorithm, Gini Impurity or Entropy, Regularization Hyperparameters, Regression, Instability

8. Ensemble Learning and Random Forests

Voting Classifiers, Bagging and Pasting, Random Patches and Random Subspaces, Random Forests, Boosting, Stacking

9. Dimensionality Reduction

The Curse of Dimensionality, Main Approaches for Dimensionality Reduction, PCA, Kernel PCA, LLE, Other Dimensionality Reduction Techniques

10. Quizzes, gamified assessments & projects

Course 5

Deep Learning

1. Introduction to Deep Learning

Deep Learning Applications, Artificial Neural Network, TensorFlow Demo, Deep Learning Frameworks

2. Up and Running with TensorFlow

Installation, Creating Your First Graph and Running It in a Session, Managing Graphs, Lifecycle of a Node Value, Linear Regression with TensorFlow, Implementing Gradient Descent, Feeding Data to the Training Algorithm, Saving and Restoring Models, Visualizing the Graph and Training Curves Using TensorBoard, Name Scopes, Modularity, Sharing Variables

3. Introduction to Artificial Neural Networks

From Biological to Artificial Neurons, Training an MLP with TensorFlow’s High-Level API, Training a DNN Using Plain TensorFlow, Fine-Tuning Neural Network Hyperparameters

4. Training Deep Neural Nets

Vanishing / Exploding Gradients Problems, Reusing Pretrained Layers, Faster Optimizers, Avoiding Overfitting Through Regularization, Practical Guidelines

5. Convolutional Neural Networks

The Architecture of the Visual Cortex, Convolutional Layer, Pooling Layer, CNN Architectures

6. Recurrent Neural Networks

Recurrent Neurons, Basic RNNs in TensorFlow, Training RNNs, Deep RNNs, LSTM Cell, GRU Cell, Natural Language Processing

7. Autoencoders

Efficient Data Representations, Performing PCA with an Undercomplete Linear Autoencoder, Stacked Autoencoders, Unsupervised Pretraining Using Stacked Autoencoders, Denoising Autoencoders, Sparse Autoencoders, Variational Autoencoders

8. Reinforcement Learning

Learning to Optimize Rewards, Policy Search, Introduction to OpenAI Gym, Neural Network Policies, Evaluating Actions: The Credit Assignment Problem, Policy Gradients, Markov Decision Processes, Temporal Difference Learning and Q-Learning, Learning to Play Ms. Pac-Man Using Deep Q-Learning

9. Quizzes, gamified assessments & projects



1. Analyze Emails

Analyze the mail activity of various individuals in an open source project development team.

2. Predict the median housing prices in California

The Machine Learning course starts with this end-to-end project. Learn data manipulation, visualization and cleaning techniques using Python libraries such as Pandas, Scikit-Learn and Matplotlib.

3. Classify handwritten digits in MNIST dataset

The MNIST dataset is considered the "Hello World!" of Machine Learning. Write your first classification logic. Starting with binary classification, learn multiclass, multilabel and multioutput classification and different error analysis techniques.

4. Remove noise from images

Build a model that takes a noisy image as input and outputs the clean image.

5. Predict the class of flower in the Iris dataset

The Iris dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. The three classes in this dataset are Setosa, Versicolor, and Virginica. Learn Decision Trees, the CART algorithm and ensemble methods. Then use a Random Forest classifier to make predictions.

6. Predict which passengers survived in the Titanic shipwreck

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. In this project, you build a model to predict which passengers survived the tragedy.

7. Predict bike rental demand

Build a model to predict the demand for bikes given past data.

8. Build a spam classifier

Build a model to classify email as spam or ham. First, download examples of spam and ham from Apache SpamAssassin’s public datasets and then train a model to classify email.

9. Build a cat classifier using a neural network

In this project, you will build a basic neural network to classify whether a given image shows a cat or not.

10. Classify large images using Inception v3

Download images of various animals and then download the latest pretrained Inception v3 model. Run the model to classify downloaded images and display the top five predictions for each image, along with the estimated probability.

11. Classify clothes using TensorFlow

Build a model to classify clothes into various categories in the Fashion MNIST dataset.

12. Predict the hourly rain gauge total

This is a time series prediction task: you are given snapshots of polarimetric radar values and asked to predict the hourly rain gauge total.

13. Sentiment analysis

Sentiment analysis of the "Iron Man 3" movie using Hive, with the sentiment data visualized using BI tools such as Tableau.

14. Process the NSE

Process the NSE (National Stock Exchange) data using Hive for various insights

15. MovieLens Project

Analyze MovieLens data using Hive

16. Spark MLlib

Generate movie recommendations using Spark MLlib

17. Spark GraphX

Derive the importance of various handles on Twitter using Spark GraphX

18. Analyze the logs

Analyze the logs of the NASA Kennedy Space Center WWW server using Spark to find useful business and DevOps metrics

19. Spark application

Write an end-to-end Spark application, from writing code on your local machine to deploying it to the cluster

20. Analytics Dashboard

Real-time analytics dashboard for an e-commerce company using Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO and Highcharts
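
As a rough illustration of Project 5 above (predicting the class of flower in the Iris dataset), the following minimal sketch trains a Random Forest classifier with scikit-learn. It assumes scikit-learn is installed and uses the copy of the dataset bundled with the library. The 80/20 split, the number of trees, the random seed and the sample measurements are illustrative choices, not the course's actual solution.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the Iris dataset bundled with scikit-learn (no download needed).
iris = load_iris()
X, y = iris.data, iris.target

# Count the instances per class name (3 classes of 50 instances each).
class_counts = Counter(str(iris.target_names[label]) for label in y)
print("Instances per class:", dict(class_counts))

# Hold out 20% of the data for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train a Random Forest, an ensemble of decision trees.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate accuracy on the held-out test set.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Predict the class of one flower from its four measurements in cm
# (sepal length, sepal width, petal length, petal width).
sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_name = str(iris.target_names[clf.predict(sample)[0]])
print("Predicted class:", predicted_name)
```

Stratifying the split keeps all three classes equally represented in the small test set, which makes the accuracy estimate less sensitive to how the 150 instances happen to be divided.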



Earn your certificate

Our course is exhaustive, and the certificate we award is proof that you have taken a big leap in Hadoop, Spark, Machine Learning and Deep Learning.

Differentiate yourself

The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.

Share your achievement

Highlight your skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.

Certification guideline

Complete at least 60% of the course topics along with the following projects: Analyze Emails from the Python course, Sentiment Analysis from the Hadoop course, Log Parsing from the Spark course, any 3 projects from Machine Learning and any 2 projects from Deep Learning. All of these requirements must be met within 330 days of the course enrollment date to be eligible for the certificate.

Course Creators

Sandeep Giri
Founder at CloudxLab
Past: Amazon, InMobi, D.E.Shaw
Course Developer

Sanjeev Manhas
Associate Professor, IIT Roorkee
Course Advisor

R. Balasubramanian
IIT Roorkee
Course Developer

Partha Pratim Roy
Assistant Professor, IIT Roorkee
Course Developer

Abhinav Singh
Co-Founder at CloudxLab
Past: Byjus
Course Developer

Jatin Shah
Ex-LinkedIn, Yahoo, Yale CS Ph.D.
Course Advisor


(4.9 out of 5)

This course is suitable for everyone. Being a product manager, I had not done hands-on coding for quite some time, and Python was completely new to me. However, Sandeep Giri gave us a crash course in Python and then introduced us to Machine Learning. Also, CloudxLab's environment was very useful: I could just log in and start practising coding and playing with the things I had learnt. A good mix of theory and practical exercises, and specifically the sequence of starting straight away with a project and then going deeper, was a very good way of teaching. I would recommend this course to all.


The machine learning courses at CloudxLab, especially the Artificial Intelligence for the Manager course, are excellent. I have attended some of the courses and was able to understand them, as Sandeep Giri sir taught the AI course from scratch and related it to our day-to-day life…

He even takes free sessions to help students and provides career guidance.

His courses are worthwhile, and even just by watching the YouTube videos anyone can easily crack an AI interview.


This is one of the best-designed courses, very informative and well paced. The killer feature of the machine/deep learning courses from CloudxLab is the live session with access to labs for hands-on practice! With that, it becomes easy to follow any discourse, even if one misses the live sessions (read that as me!). Sandeep (the course instructor) has loads of patience and his way of explaining things is just remarkable. I might have better comments to add here once I learn more! Great job, guys!


It has been a wonderful learning experience with CXL. This is one of the courses that will probably stay with me for a significant amount of time. The platform provides a unique opportunity to try hands-on simultaneously with the coursework in an almost real-life coding example. Besides, learning to use algebra, tech system and Git is a good refresher for anyone planning to start or stay in technology. The course covers the depth and breadth of ML topics. I specifically like the MNIST example and the depth to which it goes in explaining each and every line of code. Would definitely recommend the instructor-led course.


I took both the machine learning and deep learning courses at CloudXLab. I came into the first part of the course with some knowledge of machine learning, but the class really helped me understand some of the topics a lot more clearly. I think the best part of the class is the instructor, Sandeep. He is very knowledgeable and does a really good job explaining topics that can be nebulous at times. Another favorite part of the course is the online labs. I would watch the 3hr lecture the next day, and then work on the labs. The labs reinforce the lectures with questions and coding assignments. There is also a message board and a Slack channel. I preferred using Slack, but I think you get a quicker response if you use the message board. As for the deep learning portion of the course, it was all new to me, but I was building CNN and RNN models using TensorFlow after each 3hr lecture. Overall, I was very pleased with the course. I am hoping that CloudxLab will put together an advanced class focusing more on deploying models to the cloud, working with pipelines, DevOps etc…


I found the ML&DL course very well structured with ample examples and hands on exercises. Sandeep was very patient in answering questions and he made the training sessions very interactive. I would recommend this training to all who plan to take a dive into the world of machine and deep learning.


I have thoroughly enjoyed both the ML and DL courses from CloudXLab and will look forward to reviewing the videos/material at a later time. I’ve been to many meetups and paid sessions on ML /DL and this course beats most of them on the depth of topics and certainly breadth of topics. I’ve not taken any online courses (Andrew Ng, for example) to their conclusion, so I won’t draw a conclusion there. For an instructor-led, interactive course, I would expect to pay many times more for a class (ML and DL) such as this in the US. The instructor is easy to understand, has extensive experience, and truly cares about the student knowing the material.


A very well structured instructor-led course. The instructor was very thorough, and always willing to answer questions and clarify coursework, no matter how minor. The course described the theory of machine/deep learning well, but also followed through with very thorough examples to demonstrate the practical implementations of the theory. This led nicely into the student exercises, which served to solidify the instructor's teachings and encourage experimentation. The resources provided for students were exceptional and presented in a very user-friendly format.

My only complaint is that the course went quite overtime, but I also appreciate Sandeep's dedication to quality and to ensuring that he finished teaching us everything adequately.


I have been using CloudxLab for Machine Learning, and based on my experience I can say that they have done a fabulous job with the training and certification process, which makes it easy to interact with the faculty and makes the software intuitive.


You need to complete at least 60% of the topics from the course. You also need to complete projects - Analyse emails from Python, Sentiment Analysis (Hive) from Hadoop, Log Parsing from Spark, 3 mandatory projects from Machine Learning, and 2 mandatory projects from Deep Learning. All the above requirements need to be met within 330 days from the course enrollment date to be eligible for the certificate from E&ICT Academy, IIT Roorkee.

No, we will provide you with access to our online lab and BootML so that you do not have to install anything on your local machine.

Please note that you will not get a certificate from E&ICT Academy, IIT Roorkee if your deadline to earn it has passed. In that case, you have two options:

  1. Complete 100% of the topics and earn the certificate from CloudxLab.
  2. Purchase the course again and finish the formalities to earn the E&ICT Academy, IIT Roorkee certificate within the new deadlines. Contact us for the exclusive discount code for purchasing the course again.

Please note that although course access is valid for a lifetime, you must finish the formalities within the stipulated deadlines to earn the E&ICT Academy, IIT Roorkee certificate. You can renew the lab at any time.

It is a self-paced course. You will get access to videos, quizzes, hands-on assessments and projects. If you have any doubts during your learning journey, you can post them on the discussion forum, where our experts and community will assist you.

Please log in with your Gmail ID and access your course under "My Courses".

This course is for engineers, product managers and anyone who wants to learn. We will cover the foundations of linear algebra, calculus and statistical inference wherever required so that you can learn the concepts effectively. No prerequisites or programming knowledge are required.

As this specialization course comprises 5 sub-courses, we award a total of 5 certificates on completion of the whole specialization course. This includes a certificate for the whole specialization course as well.

For the self-paced course, we provide a 100% fee refund if the request is raised within 7 days of the enrollment date. Please contact us to request a refund within the stipulated time. Thereafter, no refund is provided.

Please mail us your required projects. After you submit them, our course experts will review them and forward your details to E&ICT, IIT Roorkee, which will issue your certificate.

(Under the current COVID-19 situation, E&ICT Academy is working with a limited workforce. Since the certificate is issued by E&ICT, IIT Roorkee, the concerned authority will take a bit longer than usual to check your work and update you.)

Contact Details

Electronics & ICT Academy
Electronics & Communication Department
Indian Institute of Technology
Roorkee 247667, Uttarakhand, India


#215, The Arcade, Brigade Metropolis,
Mahadevpura, Bangalore
India - 560048
Phone: 080 - 4920 2224

Have more questions? Please contact us.