Duration: 90 Days

Format: Self-Paced Online

Certificate: IIT Roorkee

About the Course

Computing systems have fueled the growth of AI. Improvements in deep-learning algorithms have gone hand in hand with improvements in hardware accelerators. Our ability to train increasingly complex AI models and to achieve low-power, real-time inference depends on the capabilities of computing systems.

In recent years, the metrics used for optimizing and evaluating AI algorithms have diversified: along with accuracy, there is increasing emphasis on metrics such as energy efficiency and model size. Given this, researchers working on deep learning can no longer afford to ignore the computing system. Rather, knowledge of the potential and limitations of computing systems can provide invaluable guidance in designing the most efficient and accurate algorithms.

This course aims to inform students, practitioners and researchers in deep learning about the potential and limitations of various processor architectures for accelerating deep-learning algorithms. At the same time, it seeks to motivate and even challenge engineers and professionals in the architecture domain to optimize processors according to the needs of deep-learning algorithms.

This course discusses acceleration of AI algorithms on various computing systems such as FPGAs, mobile GPUs, smartphones, ASICs (e.g., Google's TPU) and CPUs. We primarily focus on CNNs and also cover recurrent neural networks. Apart from performance and energy metrics, the course also discusses hardware reliability and security issues and techniques for deep-learning algorithms and accelerators. We also draw from recent research papers to showcase the state of the art in these fields.

This course sits at the intersection of deep-learning algorithms, computer architecture and chip design, and is thus expected to benefit a broad audience.

Upon successfully completing the course, you will receive a certificate from IIT Roorkee, which you can use to progress in your career and find better opportunities.

Program Highlights

  • Certificate of Completion by IIT Roorkee

  • Self-Paced Online

  • Cloud Lab Access

  • Timely Doubt Resolution

  • Best In Class Curriculum

Certificate

What is the certificate like?

  • Why IIT Roorkee?

    IIT Roorkee ranks first among all the IITs and 20th globally in citations per faculty. Established in 1847, it is one of the oldest technical institutions in Asia.
    IIT Roorkee fosters a very strong entrepreneurial culture. Some of its alumni are highly successful entrepreneurs in the new-age digital economy.

  • Why CloudxLab?

    CloudxLab (CxL) has been a pioneer in the edtech space for the past few years. Founded in 2015 by Sandeep Giri, an alumnus of IIT Roorkee, CxL has successfully transformed thousands of students' careers by offering world-class certification courses in big data, machine learning and artificial intelligence.

    Some of the unique features of CxL are an exclusive gamified learning environment through its lab, highest-rated faculty, excellent student support and more.

Hands-on Learning

  • Gamified Learning Platform

  • Auto-assessment Tests

  • No Installation Required

Instructor

Prof. Sparsh Mittal

Faculty ECE Dept
IIT Roorkee

Dr. Sparsh Mittal is currently working as an assistant professor at IIT Roorkee, India. He received the B.Tech. degree from IIT Roorkee, India and the Ph.D. degree from Iowa State University (ISU), USA. He has worked as a Post-Doctoral Research Associate at Oak Ridge National Lab (ORNL), USA and as an assistant professor in the CSE department at IIT Hyderabad. He was the graduating topper of his B.Tech. batch, and his B.Tech. project received the best project award. He has received a fellowship from ISU and a performance award from ORNL.

He has published more than 100 papers at top venues, and his research has been covered by technical websites such as InsideHPC, HPCWire, Phys.org, and ScientificComputing. He is an associate editor of Elsevier's Journal of Systems Architecture. He has given invited talks at the ISC Conference in Germany, New York University, the University of Michigan and Xilinx (Hyderabad). In Stanford's list of the world's top researchers, in the field of Computer Hardware & Architecture, he was ranked number 107 (for whole career) and number 3 (for the year 2019 alone).

Mentors

Sandeep Giri

Founder at CloudxLab

Past: Amazon, InMobi, D.E.Shaw

Abhinav Singh

Co-Founder at CloudxLab

Past: Byjus

Praveen Pavithran

Co-Founder at Yatis

Past: YourCabs, Cypress Semiconductor

Curriculum

36+ Hours of Video
90 Days of Lab Access

Main topics

  • Commonly used optimization strategies in deep learning
    Examples: tiling, loop optimizations, batching, quantization, pruning
  • Model-size-aware and processor-architecture-aware pruning of DNNs
    Accuracy, performance and model size achieved by model-size-aware and architecture-aware pruning
  • Convolutional strategies: direct, FFT-based, Winograd-based and matrix-multiplication-based
    Understanding their compute and memory characteristics and pros and cons; the deep learning frameworks that use these strategies
  • Deep learning on FPGAs and case study of Microsoft's Brainwave
    Optimizing deep learning applications on FPGAs, clustering, etc.; efficacy of FPGAs for binarized neural networks (BNNs)
    Architecture of Microsoft's Brainwave and the optimizations used, such as pinning of parameters in the on-chip memory
  • Deep learning on an ASIC (especially Google's Tensor Processing Unit)
    Architecture of Google TPUv1/v2
    Qualitative comparison between Google's TPU and Microsoft's Brainwave
  • Deep learning on embedded systems (especially NVIDIA's Jetson platform)
    Comparison of architectural parameters of Jetson (TK1, TX1, TX2) with Intel UP, Raspberry Pi, DSP and FPGA
    Study of some real-life applications mapped to the Jetson platform, e.g., driver drowsiness detection, pill image recognition, local processing of CNN on a drone, drone racing, classifying weeds from drone imagery, detecting foot ulcers using a CNN, identifying faces of suspected people, etc.
  • Deep learning on edge devices (smartphones)
    Challenges and opportunities faced in running the Facebook app (which uses deep learning models) on smartphones of varied configurations (architecture/compute/memory capacity)
  • Deep learning on CPUs
    Pros and cons of using CPUs in deep learning
    Opportunities for CPUs in deep learning, such as low- and medium-parallelism workloads, mobile platforms, etc.
  • Case study: hardware/system challenges in autonomous driving
    Comparison of CPU/GPU/FPGA/ASIC in running CNN workloads used for autonomous driving
  • Accelerators for recurrent neural networks (RNNs)
    Unique architectural characteristics of RNNs compared to CNNs
    Acceleration of RNNs on FPGAs, ASICs, etc.
    Optimization techniques such as pipelining, parallelization, batching, pruning, low precision, etc.; exploiting the tradeoff between compute and memory
  • Understanding reliability of deep-learning accelerators and algorithms
    Reliability impact of errors in early and late layers of a CNN
    Resilience of convolution and fully-connected layers
    Techniques for designing resilient deep-learning accelerators
  • Understanding hardware security of deep-learning accelerators and algorithms
    Side-channel attacks
    Fault-injection attacks
    Defense mechanisms
  • Distributed training of DNNs
    Need for distributed training
    Challenges in and techniques for distributed training
    Case study: training AlexNet in minutes using massively parallel supercomputers/GPU clusters

Apply Now

Application Process

  1. Submit the application form and SOP (Statement of Purpose)
    Register by filling in the application form
  2. Review of the application
    The admission team will review the application and respond with the application status within 48 hours
  3. Join the program
    Seat confirmation is subject to payment

Prerequisites

Candidates should have a basic understanding of deep learning, especially the basics of CNNs and RNNs. A background in computer architecture or embedded systems is preferred, although not mandatory.

799

  • 36+ Hours of Video Content
  • Self-Paced Online Format
  • 90 Days of Online Lab Access
  • 24x7 Support
  • Certificate from IIT Roorkee

Frequently Asked Questions

What are the prerequisites for this course?

Candidates should have a basic understanding of deep learning, especially the basics of CNNs and RNNs. A background in computer architecture or embedded systems is preferred, although not mandatory.

What are the expected career options after pursuing this course?

Someone who has successfully completed this course is expected to be able to solve problems more efficiently using some of the latest technologies in the industry. Learners who have completed this course will be a good fit for the VLSI, semiconductor, or similar industries.

What is your refund policy?

If you are unhappy with the product for any reason, let us know within 7 days of purchasing or upgrading your account, and we'll cancel your account and issue a full refund. Please contact us at reachus@cloudxlab.com to request a refund within the stipulated time. We will be sorry to see you go though!

Do I need to install any software before starting this course?

No, we will provide you with access to our online lab and BootML so that you do not have to install anything on your local machine.

What is the validity of course material?

We understand that you might need the course material for a longer duration to make the most of your subscription. You will get lifetime access to the course material so that you can refer to it anytime.