90 Days


Self-Paced Online


IIT Roorkee


About the Course

Computing systems have fueled the growth of AI. Improvements in deep-learning algorithms have gone hand-in-hand with improvements in hardware accelerators. Our ability to train increasingly complex AI models and achieve low-power, real-time inference depends on the capabilities of computing systems.

In recent years, the metrics used for optimizing and evaluating AI algorithms have diversified: along with accuracy, there is increasing emphasis on metrics such as energy efficiency and model size. Given this, researchers working on deep learning can no longer afford to ignore the computing system. Instead, knowledge of the potential and limitations of computing systems can provide invaluable guidance in designing the most efficient and accurate algorithms.

This course aims to inform students, practitioners and researchers in deep-learning algorithms about the potential and limitations of various processor architectures for accelerating deep-learning algorithms. At the same time, it seeks to motivate, and even challenge, engineers and professionals in the architecture domain to optimize processors according to the needs of deep-learning algorithms.

Course contents: This course discusses AI acceleration on various computing systems, such as FPGAs, mobile/desktop GPUs, smartphones, ASICs, DSPs and CPUs. It explains the architecture of several commercial AI accelerators, viz., Microsoft's Brainwave, Qualcomm's Hexagon DSP, NVIDIA's desktop GPUs and Tensor Cores, NVIDIA's Jetson GPU, Intel's Xeon Phi, Intel Habana Labs' Goya and Gaudi, Google's Tensor Processing Unit (TPU) versions 1 to 4, Cerebras' Wafer Scale Engine, Alibaba's HanGuang processor, Groq's Tensor Streaming Processor (TSP), Untether's TsunAImi processor and Graphcore's Intelligence Processing Unit (IPU). Further, it discusses how Facebook optimizes AI services in its data centers and how it optimizes its mobile app. The course also discusses several research-grade accelerators, such as memristor-based accelerators. Overall, the course covers AI accelerators at levels ranging from smartphones and desktops to servers and data centers.

Apart from performance and energy metrics, this course also discusses hardware reliability and security techniques for deep-learning algorithms and accelerators. A few real-life applications that benefit from AI accelerators are reviewed, such as autonomous driving and brain implants. The course draws from recent research papers to showcase the state of the art in these fields. To make the course self-contained, a reasonable amount of background is presented on both computer architecture and CNNs.

This course is at the intersection of deep learning algorithms, computer architecture, and chip design, and thus, is expected to be beneficial for a broad range of learners.

Program Highlights

PG Certificate from IIT Roorkee

Certificate of Completion by IIT Roorkee

1 Week Immersion Program

Learn from Experts

Learn from IIT Roorkee professors and Industry Experts

Placement Eligibility Test

Proctored exams on deep-learning models, with an opportunity to get placed

Hands-On Project

Guided Projects

Get hands-on experience with our Guided Projects

Timely Doubt Resolution

Get access to a community of learners via our discussion forum

Access to Cloud Lab

The lab comes pre-installed with all the software you will need to learn and practice.



  • Why IIT Roorkee?

    IIT Roorkee is ranked first among all the IITs and 20th globally in citations per faculty. Established in 1847, it is one of the oldest technical institutions in Asia. IIT Roorkee fosters a very strong entrepreneurial culture, and some of its alumni are highly successful entrepreneurs in the new-age digital economy.

  • Why CloudxLab?

    CloudxLab is a team of developers, engineers, and educators passionate about building innovative products that make learning fun, engaging, and lifelong. We are a highly motivated team who build fresh and lasting learning experiences for our users. Powered by our innovation processes, we provide a gamified environment where learning is fun and constructive. From creative design to intuitive apps, we create a seamless learning experience for our users. We upskill engineers in deep tech, making them employable and future-ready.



Rankings:

  • First among the IITs in the ‘Citations per Faculty’ parameter (QS World Rankings)
  • Ranked Engineering College (India Today 2020)
  • Ranked among the IITs (NIRF 2020)
  • Ranked among the Best Global Universities in India (QS World Rankings)

Hands-on Learning


  • Gamified Learning Platform
    Making learning fun and sustainable

  • Auto-assessment Tests
    Learn by writing code and executing it on the lab

  • No Installation Required
    The lab comes pre-installed with the required software and is accessible from anywhere



Prof. Sparsh Mittal

Faculty at ECE Dept and Center for AI and DS
IIT Roorkee

Dr. Sparsh Mittal is an assistant professor in the ECE Department at IIT Roorkee, India, and a joint faculty member at the Center for AI and DS at IIT Roorkee. He received his B.Tech. degree from IIT Roorkee, India, and his Ph.D. degree from Iowa State University (ISU), USA. He has worked as a post-doctoral research associate at Oak Ridge National Lab (ORNL), USA, and as an assistant professor in CSE at IIT Hyderabad. He was the topper of his graduating B.Tech. batch, and his B.Tech. project received the best project award. He has received a fellowship from ISU and a performance award from ORNL.

He has published more than 100 papers at top venues, and his research has been covered by technical websites such as InsideHPC, HPCWire, Phys.org, and ScientificComputing. He is an associate editor of Elsevier's Journal of Systems Architecture. He has given invited talks at the ISC Conference in Germany, New York University, the University of Michigan and Xilinx (Hyderabad). In Stanford's list of the world's top researchers in the field of Computer Hardware & Architecture, he was ranked number 107 for his whole career and number 3 for the year 2019 alone.



Sandeep Giri

Founder at CloudxLab

Past: Amazon, InMobi, D.E.Shaw


Abhinav Singh

Co-Founder at CloudxLab

Past: Byjus


Praveen Pavithran

Co-Founder at Yatis

Past: YourCabs, Cypress Semiconductor


Foundation Courses

1. Programming Tools and Foundational Concepts
1. Getting Started with Linux
2. Getting Started with Git
3. Python Foundations
4. Machine Learning Prerequisites (including NumPy, Pandas and Linear Algebra)
5. Getting Started with SQL
6. Statistics Foundations

Foundations for Accelerators for Deep Learning

Approximate Computing
1. Motivation for and Background on Approximate Computing
2. Approximate Computing Techniques
Multiprocessing, Multithreading and Vectorization
1. Multiprocessing, Multithreading and Vectorization
2. Very Long Instruction Word (VLIW) Architecture
GPU Architecture and Optimizations
1. GPU Architecture
2. Optimizations to GPU Global Memory and Shared Memory
3. GPU Tensor Cores
Floating-Point, Fixed-Point and Integer Formats
1. Floating-point, fixed-point and integer number-representation systems
Roofline Model and Arithmetic Intensity
1. Roofline model and arithmetic intensity
Energy Management Approaches
1. DVS and DVFS
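The roofline model listed above fits in a few lines of code: a kernel's attainable throughput is the minimum of the peak compute rate and the memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). A minimal sketch, with hypothetical hardware numbers chosen purely for illustration:

```python
# Roofline model: a kernel's attainable performance is capped either by
# compute or by memory bandwidth, depending on its arithmetic intensity
# (FLOPs per byte of memory traffic). Hardware numbers are hypothetical.
PEAK_GFLOPS = 1000.0   # peak compute throughput, GFLOP/s
PEAK_BW_GBS = 100.0    # peak memory bandwidth, GB/s

def attainable_gflops(arithmetic_intensity):
    """Roofline: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(PEAK_GFLOPS, PEAK_BW_GBS * arithmetic_intensity)

# A kernel doing 2 FLOPs per 8 bytes moved (intensity 0.25) is memory-bound:
print(attainable_gflops(0.25))   # 25.0 GFLOP/s, far below peak
# A kernel with intensity 50 sits past the "ridge point" and is compute-bound:
print(attainable_gflops(50.0))   # 1000.0 GFLOP/s
```

The ridge point of this hypothetical machine is PEAK_GFLOPS / PEAK_BW_GBS = 10 FLOPs/byte: kernels below it benefit most from memory optimizations, kernels above it from compute optimizations.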

Optimization Techniques for Deep Learning

Common Optimizations
1. Quick Intro to CNNs
2. Architectural Characteristics of CNN Layers
3. Memory and Compute Optimizations for CNNs, such as tiling, loop optimizations, batching, quantization and pruning
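Quantization, one of the memory optimizations mentioned above, can be illustrated with a minimal symmetric per-tensor int8 scheme (a simplified sketch, not the exact method used by any particular framework):

```python
# Symmetric per-tensor int8 quantization: all values share one scale
# derived from the largest magnitude. Illustrative sketch only.
def quantize_int8(weights):
    """Map floats to int8 values in [-127, 127] with a single scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
print(q)                       # [50, -127, 2, 100]
w_hat = dequantize(q, s)
# Reconstruction error is bounded by half a quantization step (scale / 2):
print(max(abs(a - b) for a, b in zip(w, w_hat)) <= s / 2)   # True
```

Storing int8 values instead of 32-bit floats shrinks the model roughly 4x, at the cost of the bounded rounding error shown above.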
Cache Blocking
1. Introduction to Cache Blocking
2. Blocking of Matrix Transpose
3. Blocking of Matrix Multiplication and Convolution
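The loop structure of blocked (tiled) matrix multiplication can be sketched as follows; a real kernel would choose the block size so that tiles fit in cache, whereas this pure-Python version only shows the tiling pattern:

```python
# Cache blocking (tiling) of matrix multiplication: the three outer loops
# walk over tiles, the three inner loops multiply one (block x block) tile
# pair, so each tile of A, B and C is reused while it is cache-resident.
def matmul_blocked(A, B, block=2):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # multiply one tile of A by one tile of B
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + block, n)):
                            C[i][j] += a * B[k][j]
    return C

print(matmul_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

The block size is a tunable parameter: each inner-loop tile touches only block² elements of each matrix, which is the property that keeps the working set in cache.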
Four Convolution Strategies: Direct, GEMM, FFT and Winograd
1. Four types of convolution strategies: understanding their compute and memory characteristics, pros and cons, and the deep-learning frameworks that use these strategies
2. Comparison of architectural characteristics of the Caffe/cuDNN/fbfft/cuda-convnet2 frameworks
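The GEMM strategy can be sketched via "im2col": each receptive field is unrolled into a row so that convolution becomes a single matrix multiplication. A minimal single-channel, stride-1 sketch:

```python
# GEMM convolution via im2col: unroll every k x k receptive field of the
# input into a row, then convolution is one (num_patches x k*k) matmul
# with the flattened kernel.
def im2col(x, k):
    """x: HxW input, k: kernel size; returns one row per output pixel."""
    H, W = len(x), len(x[0])
    rows = []
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            rows.append([x[i + di][j + dj] for di in range(k) for dj in range(k)])
    return rows

def conv2d_gemm(x, w):
    k = len(w)
    flat_w = [w[di][dj] for di in range(k) for dj in range(k)]
    # one dot product per output pixel == a single GEMM with the kernel
    return [sum(a * b for a, b in zip(row, flat_w)) for row in im2col(x, k)]

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, 1]]
print(conv2d_gemm(x, w))   # [6, 8, 12, 14]: each output is x[i][j] + x[i+1][j+1]
```

The memory characteristic discussed in this module is visible here: im2col duplicates overlapping input pixels across rows, trading extra memory traffic for the high arithmetic intensity of a GEMM.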
Model-Size-Aware and System-Aware Pruning of CNNs
1. Model-size-aware pruning of CNNs; example: the Deep Compression technique
2. Hardware-platform-aware pruning of CNNs; example: the Scalpel technique
3. Accuracy, performance and model size achieved by model-size-aware and architecture-aware pruning
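The core idea behind magnitude-based pruning (used, for example, as one stage of Deep Compression) is to zero out the weights with the smallest absolute values. The sketch below is a simplified illustration, not the exact published algorithm:

```python
# Magnitude pruning: pick a threshold so that the requested fraction of
# weights (the ones with smallest |w|) is zeroed out; the rest survive.
def prune_by_magnitude(weights, sparsity):
    """Zero roughly the fraction `sparsity` of smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    # threshold = magnitude of the n_prune-th smallest weight (ties may
    # prune a few extra weights; real implementations handle this per layer)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1] if n_prune else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.1]
print(prune_by_magnitude(w, 0.5))   # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

After pruning, the surviving weights are typically fine-tuned to recover accuracy, and the zeros are stored in a sparse format to realize the model-size reduction.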
MLPerf Benchmark for Evaluating DNN Accelerators
1. Intro to the MLPerf Inference Benchmark

Accelerator architectures for DNNs

Deep Learning on Systolic Array and Tensor Processing Unit (TPU) v1 to v4
1. Introduction to Systolic Arrays for Matrix Multiplication
2. Distinct Characteristics of Training and Inference
3. Architectures of TPU v1, v2, v3 and v4, and comparison between them
4. Comparison of CPU, TPU and GPU
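The systolic-array dataflow behind the TPU's matrix unit can be mimicked in a toy simulation: operands are injected with a skew so that matching elements of A and B meet at processing element (i, j) on cycle i + j + k. This is a functional sketch of the timing, not a hardware model:

```python
# Toy output-stationary systolic array: A streams in from the left,
# B from the top, skewed so that A[i][k] and B[k][j] arrive at PE(i, j)
# on the same cycle (t = i + j + k); each PE does one MAC per cycle.
def systolic_matmul(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for t in range(3 * n - 2):            # cycles to drain an n x n array
        for i in range(n):
            for j in range(n):
                k = t - i - j             # operand index reaching PE(i, j)
                if 0 <= k < n:
                    C[i][j] += A[i][k] * B[k][j]  # one MAC at this PE
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

The point of the structure is that operands move only between neighboring PEs each cycle, so an n x n array sustains n² MACs per cycle without re-reading operands from memory.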
Deep Learning on FPGA and Microsoft’s Brainwave Architecture
1. Deep-learning techniques on FPGAs; efficacy of FPGAs for binarized neural networks (BNNs)
2. Microsoft’s Brainwave architecture for deep learning and the optimizations used, such as pinning of parameters in on-chip memory
Deep Learning on CPU
1. Deep Learning on CPU: scope and motivation; pros and cons of using CPUs in deep learning; opportunities for CPUs in deep learning, such as low- and medium-parallelism workloads, mobile platforms, etc.
2. Deep Learning on CPU: Case Studies
3. Deep Learning in Heterogeneous Computing: case studies (big.LITTLE-style asymmetric multicore processors (AMPs); wearable-handheld collaborative computing; mobile+cloud collaborative computing; CPU+GPU heterogeneous computing on mobile devices)
Deep Learning on GPU
1. Deep Learning on GPU: Case Studies
Deep Learning on Intel Habana’s Goya and Gaudi Processors
1. Intel Habana’s Goya Processor for Inference
2. Intel Habana’s Gaudi Processor for Training
Deep Learning on Intel’s Xeon Phi
1. Introduction to Intel’s Xeon Phi
2. Deep Learning on Xeon Phi: Case Studies
Deep Learning on Mobile GPU
1. Introduction to NVIDIA Jetson and comparison of architectural parameters of Jetson (TK1, TX1, TX2) with Intel UP, Raspberry Pi, DSPs and FPGAs
2. Deep Learning on NVIDIA Jetson: study of some real-life applications mapped to the Jetson platform, e.g., driver-drowsiness detection, pill-image recognition, local processing of a CNN on a drone, drone racing, classifying weeds from drone imagery, detecting foot ulcers using a CNN, identifying faces of suspects, etc.
Deep Learning on Smartphone
1. Background on pipelining, superscalar and out-of-order execution
2. Background on big.LITTLE-style asymmetric multicore processors (AMPs)
3. Optimizing Facebook’s machine-learning-based app on smartphones: challenges and opportunities in running the Facebook app on smartphones of varied configurations (architecture/compute/memory capacity)
4. Accelerator-Level Parallelism
Deep Learning on Qualcomm’s Hexagon DSP
1. Qualcomm’s Hexagon DSP
Deep Learning on Cerebras Wafer Scale Engine
1. Cerebras’ Wafer Scale Engine (WSE) for Deep Learning
Deep Learning on Graphcore Intelligence Processing Unit (IPU)
1. Graphcore’s Intelligence Processing Unit (IPU) for Deep Learning
2. Comparison of TPU, GPU, IPU and WSE
Deep Learning on Memristors
1. Introduction to memristors and processing-in-memory using memristors
2. Memristor-based deep-learning accelerators (accelerator designs; techniques for reducing analog overheads; pruning techniques; reliability techniques)
Comparative Evaluation of CPU/GPU/FPGA/ASIC for Accelerating Autonomous Driving
1. Comparative evaluation of CPU, GPU, FPGA and ASIC for accelerating autonomous driving

Optimizing DNN Training

DNN Training on a Single GPU
1. Overcoming memory limitations of the GPU during training by virtualizing GPU memory
2. Background on stacked DRAM caches
3. Optimizing DNN training on GPUs with stacked DRAM memory: case study
Distributed Training of DNNs on Multiple GPU Nodes
1. Background on distributed training of DNNs
2. Case study on distributed training of DNNs


Security of DNN Algorithms and Accelerators
1. Motivation for and background on hardware security of DNNs
2. Side-channel attacks on DNN algorithms/accelerators
3. Fault-injection attacks on DNN algorithms/accelerators


Reliability of DNN Algorithms and Accelerators
1. DNN reliability: motivation, metrics and characteristics
2. Key idea of DNN reliability techniques
3. DNN reliability techniques
4. Reliability impact of errors in early and late layers of a CNN; resilience of convolution and fully-connected layers

Neural Branch Predictor

Neural Branch Predictor and Its Application in Brain Implants
1. Quick intro to branch predictors
2. Perceptron (neural) branch predictor
3. Perceptron branch predictor in a brain implant: case study
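The perceptron branch predictor covered above (in the style of Jiménez and Lin's design) predicts "taken" when the dot product of a weight vector with the global branch history is non-negative, and trains on mispredictions or low-confidence predictions. A minimal sketch; the threshold value is illustrative:

```python
# Perceptron branch predictor sketch: history bits are encoded as
# +1 (taken) / -1 (not taken); weights[0] acts as the bias weight.
THRESHOLD = 4   # training threshold (illustrative value)

def predict(weights, history):
    """Return (raw output y, predicted taken?)."""
    y = weights[0] + sum(w * h for w, h in zip(weights[1:], history))
    return y, y >= 0

def train(weights, history, taken, y):
    """Update weights on a misprediction or a low-confidence prediction."""
    t = 1 if taken else -1
    if (y >= 0) != taken or abs(y) <= THRESHOLD:
        weights[0] += t
        for i, h in enumerate(history):
            weights[i + 1] += t * h
    return weights

weights = [0, 0, 0]          # bias + 2 history weights
history = [1, -1]            # last two branches: taken, not-taken
y, guess = predict(weights, history)
weights = train(weights, history, True, y)   # actual outcome: taken
print(weights)               # [1, 1, -1]
```

Because each weight tracks the correlation of the branch with one history bit, the predictor can exploit much longer histories than a table-based two-bit scheme of the same storage budget.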

Placement Assistance

Placement Eligibility Test

We have 300+ recruitment partners who will interview you based on your performance in the PET

Profile Building Sessions

Sessions will be conducted to guide you on creating the perfect resume and professional profile to get noticed by recruiters

Career Guidance Webinars

Career guidance webinars from seasoned industry experts

Apply Now

Application Process

  • Step 1. Submit the application form and SOP (Statement of Purpose)
    Register by filling in the application form

  • Step 2. Review of the application
    The admission team will review the application and respond with the application status within 48 hours

  • Step 3. Join the program
    Confirmation of your seat is subject to payment

Certification Guideline

You will be required to complete 100% of the course content within 90 days of enrollment to be eligible for the certificate.


The candidate should have an idea of what deep learning is, especially the basics of CNNs and RNNs. A background in computer architecture or embedded systems is preferred, although not mandatory.


    1. 15% scholarships are available for students, the unemployed, women from STEM backgrounds, IIT alumni and CloudxLab alumni
    2. 10% scholarship is available for those clearing the scholarship test

PS: Details on availing the scholarship will be sent after application submission, and only one scholarship is applicable per learner

No Cost EMI at


Or Program Fee 459

  • 3 Months Program
  • 90 Days of Online Lab Access
  • 24*7 Support
  • Certificate from IIT Roorkee
Apply Now»



Frequently Asked Questions

What are the prerequisites for this course?

The candidate should have an idea of what deep learning is, especially the basics of CNNs and RNNs. A background in computer architecture or embedded systems is preferred, although not mandatory.

What are the expected career options after pursuing this course?

Someone who has successfully completed this course is expected to be able to solve problems more efficiently using some of the latest technologies in the industry. Learners who have completed this course will be a perfect fit for the VLSI, semiconductor, and similar industries.

What is your refund policy?

If you are unhappy with the product for any reason, let us know within 7 days of purchasing or upgrading your account, and we'll cancel your account and issue a full refund. Please contact us at reachus@cloudxlab.com to request a refund within the stipulated time. We will be sorry to see you go though!

Do I need to install any software before starting this course?

No, we will provide you with access to our online lab and BootML so that you do not have to install anything on your local machine.

What is the validity of course material?

We understand that you might need the course material for a longer duration to make the most out of your subscription. You will get lifetime access to the course material so that you can refer to it anytime.