Faculty Development Program for Spark Developer

The E&ICT Academy project is sponsored by the Ministry of Electronics and Information Technology, Govt. of India. The E&ICT courses lay special emphasis on hands-on learning, with participation from industry, in the emerging areas of the E&ICT domain. Our programs enable participants and institutes to build industry connections, upgrade lab facilities and create opportunities for collaboration. So far we have conducted 70 courses and successfully trained over 3000 beneficiaries.

E&ICT Academy IIT Roorkee, supported by the Ministry of Electronics and Information Technology (MeitY) and in collaboration with CloudxLab, is conducting a training program in Big Data with Spark.

As humans, we are immersed in data in our everyday lives. According to IBM, the world's data doubles every two years. The value that data holds can only be understood once we start to identify patterns and trends in it. Conventional computing approaches stop working when data becomes very large.

There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.

In this course, you will learn Spark to drive better business decisions and solve real-world problems.

Eligibility criteria - Any professional, faculty member of any government or private institution, or government employee.

1 course

Learn from industry experts. Follow the suggested order or choose your own.

Projects & Lab

Apply the skills you learn on a distributed cluster to solve real-world problems.

Certificate

Certification from E&ICT Academy IIT Roorkee

Best-in-class Support

24×7 support and forum access to answer all your queries throughout your learning journey.

Certifications

Compatible with Hortonworks Certified Developer (HDPCD): Spark

Enrollment

Faculty Development Program

31 May

Fri, Sat
(5 weeks)

6:30 p.m. - 9:30 p.m. America/Los_Angeles

90 days lab

15 ~~536~~

Sold Out

7 Jun

Fri, Sat
(5 weeks)

6:30 p.m. - 9:30 p.m. America/Los_Angeles

90 days lab

359 ~~536~~

Sold Out

Learning Path

Course

About the Course

1. Introduction

What is Big Data?

Why Now?

Big Data Use Cases

Various Solutions

Spark Ecosystem Walkthrough

Quiz

2. Foundation & Environment

Understanding the CloudxLab

CloudxLab Hands-On

Spark Hands-on

Quiz and Assessment

Basics of Linux - Quick Hands-On

Understanding Regular Expressions

Quiz and Assessment

Setting up VM (optional)

3. Recap of HDFS and YARN

As part of this session we will recap the sessions on Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN).

This is needed because most Spark applications use data from HDFS, and in most deployments Spark applications run on YARN clusters.

4. Scala Basics

Introduction to Scala

Accessing Scala using CloudxLab

Getting Started: Interactive, Compilation, SBT

Types, Variables & Values

Functions

Collections

Classes

Parameters

More Features

Quiz and Assessment

5. Spark Basics

What is Apache Spark?

Why Spark?

Using the Spark Shell and various ways of running Spark on CloudxLab

Example 1 - Performing Word Count

Understanding Spark Cluster Modes on YARN

RDDs (Resilient Distributed Datasets)

General RDD Operations: Transformations & Actions

RDD lineage

RDD Persistence Overview

Distributed Persistence

Learn operations on Key-Value Based RDD

Solving various problems using RDD

6. Writing and Deploying Spark Applications

Creating the SparkContext

Building a Spark Application (Scala, Java, Python)

The Spark Application Web UI

Configuring Spark Properties

Running Spark on Cluster

RDD Partitions

Executing Parallel Operations

Stages and Tasks

Project: Churning the logs of NASA Kennedy Space Center WWW server

7. Spark Advanced Operations

Using Accumulators & Creating Custom Accumulators

Using Broadcast variables

We will learn key performance considerations:

Level of Parallelism
Serialization Format
Memory Management
Hardware Provisioning

Understanding Caching & Persistence

We will learn data partitioning and re-partitioning techniques.

A project that applies the above optimization techniques.

We will learn how to create a custom partitioner.

8. Running Spark on Cluster

Understand the Spark Runtime Architecture and various components such as Driver, Executor, Cluster Manager etc.

Learn what happens internally when we launch a Spark application.

We will learn the two modes of Spark: Local and Cluster.

How to launch a program on YARN, AWS Cluster etc.

How to set up Spark in standalone mode.

Understand and demonstrate how to run the driver in various modes.

Learn how to package the dependencies of your code.

Understand how to use spark-submit and its various command line options.

9. Stream processing with Spark and Kafka

Common Spark Use Cases

Example 1 - Data Cleaning (Movielens)

Example 2 - Understanding Spark Streaming

Understanding Kafka

Example 3 - Spark Streaming from Kafka

Iterative Algorithms in Spark

Project: Real-time analytics of orders in an e-commerce company

10. DataFrames and Spark SQL

Spark SQL and the SQL Context

Creating DataFrames

Transforming and Querying DataFrames

Saving DataFrames

Solving problems with DataFrames and RDDs

Comparing Spark SQL, Impala, and Hive-on-Spark

Understanding and loading various input formats: JSON, XML, Avro, SequenceFile, Parquet, Protocol Buffers.

Comparing Compressions

Understanding Row Oriented and Column Oriented Formats - RCFile

11. Machine Learning

Understanding Machine Learning

MLlib Example: Recommendations on MovieLens data

Understanding various Packages of MLlib

SparkR Example.

12. Graph Processing with Spark GraphX

Basics of Graph Processing: What graph processing means in real life, with examples. Which other frameworks provide graph computing?

GraphX Overview: What is GraphX? Understanding the functionalities and algorithms provided by GraphX, how GraphX works, and how it compares with other similar products.

Implementing PageRank using GraphX: We will learn the basics of PageRank, the algorithm that made Google. Then we will learn how to implement it using GraphX.
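As a rough illustration of the PageRank idea mentioned above, here is a minimal plain-Python sketch. It does not use GraphX or Spark, and it is not taken from the course material: the function name pagerank, the damping factor of 0.85, the iteration count and the three-page sample graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Iteratively estimate PageRank scores.

    links maps each page to the list of pages it links to
    (a hypothetical in-memory graph, not a GraphX structure).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with an equal share of rank for every page.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small base rank (the "random jump" term).
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped rank among its outgoing links.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


# Hypothetical three-page graph: a links to b and c, b to c, c to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In this sample graph, page c is linked from both a and b, so it ends up with a higher score than b, which is linked only from a. GraphX ships its own distributed PageRank, which the course covers; the sketch above only shows the underlying iteration on a single machine.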

Projects

1. Generate movie recommendations using Spark MLlib

2. Derive the importance of various handles at Twitter using Spark GraphX

3. Churn the logs of NASA Kennedy Space Center WWW server using Spark to find out useful business and devops metrics

4. Write an end-to-end Spark application, from writing code on your local machine to deploying it to the cluster

5. Build a real-time analytics dashboard for an e-commerce company using Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO and Highcharts

Certificate

Earn your certificate from E&ICT Academy IIT Roorkee.

Get a joint, verifiable certification from IIT Roorkee and CloudxLab after completing the FDP program.

Differentiate yourself

The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.

Share your achievement

Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.

Course Creators

Sandeep Giri

Founder at CloudxLab
Past: Amazon, InMobi, D.E.Shaw

Course Developer

Sanjeev Manhas

Associate Professor,
IIT Roorkee

Course Advisor

R. Balasubramanian

Professor,
IIT Roorkee

Course Developer

Partha Pratim Roy

Assistant Professor,
IIT Roorkee

Course Developer

Abhinav Singh

Co-Founder at CloudxLab
Past: Byjus

Course Developer

Jatin Shah

Ex-LinkedIn, Yahoo, Yale CS Ph.D.
IIT-B

Course Advisor

Reviews

(4.9 out of 5)

Peter Sabry

I have started learning 3 months ago and I really gained much info and practical experience. I completed the “Big Data with Spark” course and the learning journey really exceeded my expectations.

The course structure and topics were great, well organized and comprehensive, even the basics of Linux were covered in a very simple way. There were always exercises and hands-on that build better understanding, also the lab environment and provided online tools were great help and let you practice everything without having to install anything on your PC except the web browser.

In addition, for the live sessions, it was really a joy attending them each weekend, our instructor “Sandeep Giri”, besides his great experience and knowledge, he was generous, helpful and patient answering all attendees questions in such a way that he could go for more examples and hands-on or even searching the documentation and try new things, I gained much from other attendees’ questions and the way Sandeep responded to them.

This was a great experience having this course and I’m going for more courses in Big Data and Machine Learning with CloudxLab and I recommend it for all my friends and colleagues who look for better learning.

Kamal Upadhyay

This course is suitable for everyone. Me being a product manager had not done hands-on coding since quite some time. Python was completely new to me. However, Sandeep Giri gave us a crash course to Python and then introduced us to Machine Learning. Also, the CloudxLab’s environment was very useful to just log in and start practising coding and playing with things learnt. A good mix of theory and practical exercises and specifically the sequence of starting straight away with a project and then going deeper was a very good way of teaching. I would recommend this course to all.

Daya Paari

Must have for practicing and perfecting Hadoop. To set it up on a PC you need a very high-end configuration, and the setup will be a pseudo-node setup. For better understanding I recommend CloudxLab.

Satyajit Das

The machine learning courses at CloudxLab, especially the Artificial Intelligence for Managers course, are excellent. I have attended some of the courses and was able to understand them, as Sandeep Giri sir taught the AI course from scratch and related it to our day-to-day life.

He even takes free sessions to helps students and provides career guidance.

His courses are worthy and even just by watching YouTube video anyone can easily crack the AI interview.

Manolo Ramírez

They are great. They take care of all the Big Data technologies (Hadoop, Spark, Hive, etc.) so you do not have to worry about installing and running them correctly on your PC. Plus, they have fantastic customer support. Even when I have had problems debugging my own programs, they have answered me with the correct solution in a few hours, and all of this for a more than reasonable price. I personally recommend it to everyone :)

FAQ

How is the course on Faculty Development Program - Spark Developer different from Big Data with Spark course offered by CloudxLab?

The FDP is supported by the Ministry of Electronics and Information Technology (MeitY) in collaboration with CloudxLab. Learners are entitled to a joint, verifiable certification from IIT Roorkee and CloudxLab after completing the FDP, whereas completing the Big Data with Spark course on CloudxLab provides a certificate from CloudxLab only.

Do you provide recordings of these live sessions in case a learner is unable to attend the online session?

Yes, the recordings of the sessions will be available to you as part of your course on the "My Courses" page.

Do we need to attend all the live sessions?

Yes, it is mandatory to attend all the live sessions unless unforeseen circumstances arise at your end. We will consider these only if you send us a mail mentioning your absence from the session. As per the guidelines, you must secure at least 60% attendance in order to get a certificate from IIT Roorkee.

How will your support help us in case we fail to attend some sessions, and have any queries after going through the recorded session?

You can drop us a mail at discuss.cloudxlab.com mentioning your question, and our technical team will get back to you as soon as possible.

Are freelancers eligible for this course?

You may apply for the enrollment by submitting the required documents. Documents provided will be verified by concerned authorities, after which we will inform you of the status of your application. In case your documents are not approved by the authorities, you won't be eligible for the certificate recognized by IIT Roorkee.

How can I see the Course Preview?

You can watch the course preview at https://youtu.be/dXCx4anEcgU.

Contact Details

Electronics and ICT Academy

Electronics & ICT Academy, Electronics & Communication Department, Indian Institute of Technology Roorkee - 247667, Uttarakhand, India

CloudxLab

#215, The Arcade, Brigade Metropolis, Mahadevpura, Bangalore, India - 560048. Phone: 080 - 4920 2224

Electronics & ICT Academy

Indian Institute Of Technology Roorkee

An initiative of Ministry of Electronics and Information Technology (MeitY) Govt. of India

Faculty Development Program in Big Data with Spark

178 Ratings 566 learners

30+ hours training

90 days of Lab

24x7 Support

Projects

Compatible with Hortonworks, Cloudera Certifications
