Certification program from E&ICT Academy, IIT Roorkee in Big Data with Spark. Learn Spark, Spark RDD, Spark Streaming, Kafka, SparkR, SparkSQL, MLlib, and GraphX from industry experts.
The E&ICT Academy project is sponsored by the Ministry of Electronics and Information Technology, Govt. of India. The E&ICT courses lay special emphasis on hands-on learning, with participation from industry, in emerging areas of the E&ICT domain. Our programs enable participants and institutes to build industry connections, upgrade lab facilities, and create opportunities for collaboration. So far we have successfully conducted 70 courses and trained over 3,000 beneficiaries.
E&ICT Academy IIT Roorkee, supported by the Ministry of Electronics and Information Technology (MeitY), in collaboration with CloudxLab, is conducting a training program in Big Data with Spark.
As humans, we are immersed in data in our everyday lives. According to IBM, the data on this planet doubles every two years. The value that data holds can only be unlocked when we start to identify patterns and trends in it, and conventional computing approaches break down when data grows this large.
There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.
In this course, you will learn Spark to drive better business decisions and solve real-world problems.
Eligibility criteria - Any working professional, faculty of any government or private institution, or government employee.
What is Big Data?
Big Data Use Cases
Spark Ecosystem Walkthrough
Understanding the CloudxLab
Quiz and Assessment
Basics of Linux - Quick Hands-On
Understanding Regular Expressions
Quiz and Assessment
Setting up VM (optional)
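As a flavour of the regular-expressions topic above, here is a minimal sketch in plain Python that parses a web-server log line in the Apache Common Log Format, the same style of line you will later churn in the NASA log project. The sample line and field names are illustrative.

```python
import re

# A sample line in Apache Common Log Format (illustrative)
line = '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245'

# Named groups pull out the host, timestamp, request, status and byte count
pattern = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)$'
)

m = pattern.match(line)
print(m.group('host'))    # 199.72.81.55
print(m.group('status'))  # 200
```

The same pattern, once it works on one line, can be applied to millions of lines in parallel on the cluster.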
As part of this session, we will recap the sessions on the Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN).
This is needed because most Spark applications read their data from HDFS, and in most deployments, Spark applications run on YARN clusters.
Introduction to Scala
Accessing Scala using CloudxLab
Getting Started: Interactive, Compilation, SBT
Types, Variables & Values
Quiz and Assessment
What is Apache Spark?
Using the Spark Shell and the various ways of running Spark on CloudxLab
Example 1 - Performing Word Count
Understanding Spark Cluster Modes on YARN
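The word-count example above can be sketched without a cluster: the plain-Python version below runs the same flatMap-then-count logic that Spark's RDD API distributes across machines. The input lines are illustrative.

```python
from collections import Counter

lines = ["to be or not to be", "to see or not to see"]

# "flatMap": split every line into words, flattening into one list
words = [w for line in lines for w in line.split()]

# "map + reduceByKey": count occurrences of each word
counts = Counter(words)

# In Spark (Scala) this is roughly:
#   sc.textFile(path).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
print(counts["to"])  # 4
```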
RDDs (Resilient Distributed Datasets)
General RDD Operations: Transformations & Actions
RDD Persistence Overview
Learn operations on Key-Value Based RDD
Solving various problems using RDD
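A key idea in the RDD topics above is that transformations are lazy and only actions trigger computation. A rough plain-Python analogy uses generators, which also build a pipeline without evaluating it; this is an illustration of the concept, not Spark's implementation.

```python
data = [1, 2, 3, 4, 5]

# "Transformations" build a lazy pipeline; nothing is computed yet
mapped = (x * 2 for x in data)           # like rdd.map(lambda x: x * 2)
filtered = (x for x in mapped if x > 4)  # like .filter(lambda x: x > 4)

# An "action" forces evaluation of the whole pipeline
result = list(filtered)                  # like .collect()
print(result)  # [6, 8, 10]
```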
Creating the SparkContext
Building a Spark Application (Scala, Java, Python)
The Spark Application Web UI
Configuring Spark Properties
Running Spark on Cluster
Executing Parallel Operations
Stages and Tasks
Project: Churning the logs of NASA Kennedy Space Center WWW server
Using Accumulators & Creating Custom Accumulators
Using Broadcast variables
We will learn key performance considerations:
Understanding Caching & Persistence
Learn data partitioning and re-partitioning techniques.
A project applying the above optimization techniques.
Learn how to create a custom partitioner.
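Conceptually, a partitioner is just a function from a key to a partition index. The plain-Python sketch below mirrors what Spark's HashPartitioner does, so the partitioning idea is concrete before we build a custom one; the data is illustrative.

```python
def hash_partition(pairs, num_partitions):
    """Group (key, value) pairs into partitions by hash of the key,
    conceptually mirroring Spark's HashPartitioner."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = hash_partition(pairs, 2)
# All pairs with the same key land in the same partition, which is
# exactly what lets operations like reduceByKey avoid extra shuffles.
```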
Understand the Spark runtime architecture and its various components, such as the Driver, Executors, and the Cluster Manager.
Learn what happens internally when we launch a Spark application.
We will learn the two modes of Spark: Local and Cluster.
How to launch a program on YARN, an AWS cluster, etc.
How to set up Spark in standalone mode.
Understand and demonstrate how to run the driver in various modes.
Learn how to package the dependencies of your code.
Understand how to use spark-submit and its various command-line options.
Common Spark Use Cases
Example 1 - Data Cleaning (Movielens)
Example 2 - Understanding Spark Streaming
Example 3 - Spark Streaming from Kafka
Iterative Algorithms in Spark
Project: Real-time analytics of orders in an e-commerce company
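Spark Streaming processes incoming data as micro-batches over time windows. The windowing idea behind the real-time analytics project can be sketched in plain Python; the timestamped order events and window size here are illustrative, not the Streaming API.

```python
from collections import defaultdict

# (timestamp_in_seconds, order_amount) events, e.g. consumed from a Kafka topic
events = [(1, 100), (3, 250), (7, 80), (8, 120), (12, 60)]

WINDOW = 5  # tumbling window of 5 seconds

# Aggregate order amounts per window, keyed by the window's start time
totals = defaultdict(int)
for ts, amount in events:
    window_start = (ts // WINDOW) * WINDOW
    totals[window_start] += amount

print(dict(totals))  # {0: 350, 5: 200, 10: 60}
```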
Spark SQL and the SQL Context
Transforming and Querying DataFrames
Solving problems with DataFrames and RDDs
Comparing Spark SQL, Impala, and Hive-on-Spark
Understanding and loading various input formats: JSON, XML, Avro, SequenceFile, Parquet, Protocol Buffers.
Understanding row-oriented and column-oriented formats - RCFile
Understanding Machine Learning
MLlib Example: Recommendations on MovieLens data
Understanding various Packages of MLlib
Basics of Graph Processing: Covers what graph processing means in real life, with examples, and which other frameworks provide graph computation.
GraphX Overview: What is GraphX? Understanding the functionality and algorithms provided by GraphX, how GraphX works, and how it compares with other similar products.
Implementing PageRank using GraphX: We will learn the basics of PageRank - the algorithm that made Google - and then learn how to implement it using GraphX.
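To make the algorithm concrete before the GraphX implementation, here is a minimal PageRank iteration in plain Python, with the same contribute-and-aggregate steps; the four-page graph, damping factor, and iteration count are illustrative.

```python
# Adjacency list: page -> pages it links to (illustrative graph)
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

ranks = {page: 1.0 for page in links}
DAMPING = 0.85

for _ in range(20):
    # Each page sends its rank, split evenly, to the pages it links to
    contribs = {page: 0.0 for page in links}
    for page, outgoing in links.items():
        share = ranks[page] / len(outgoing)
        for target in outgoing:
            contribs[target] += share
    # Damped update: a little base rank plus the received contributions
    ranks = {page: (1 - DAMPING) + DAMPING * c for page, c in contribs.items()}

# "c" has the most incoming links, so it ends up with the highest rank
best = max(ranks, key=ranks.get)
print(best)  # c
```

GraphX expresses the same loop as message passing over the edges of a distributed graph.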
1. Generate movie recommendations using Spark MLlib
2. Derive the importance of various handles at Twitter using Spark GraphX
3. Churn the logs of NASA Kennedy Space Center WWW server using Spark to find out useful business and devops metrics
4. Write end-to-end Spark application starting from writing code on your local machine to deploying to the cluster
5. Build real-time analytics dashboard for an e-commerce company using Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO and Highcharts
Get a joint, verifiable certificate from IIT Roorkee and CloudxLab after completing the FDP program.
The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.
Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.
The FDP program is supported by the Ministry of Electronics and Information Technology (MeitY) in collaboration with CloudxLab. FDP learners are entitled to a joint, verifiable certificate from IIT Roorkee and CloudxLab after completing the program, whereas completing the Big Data with Spark course on CloudxLab alone provides a certificate from CloudxLab only.
Yes, the recordings of the sessions will be made available to you as part of your course under the "My Courses" page.
Yes, it is mandatory to attend all the live sessions. Exceptions for unforeseen circumstances at your end will be considered only if you drop us a mail explaining your absence from the session. As per the guidelines, you must secure at least 60% attendance to receive a certificate from IIT Roorkee.
You can post your question at discuss.cloudxlab.com, and our technical team will get back to you as soon as possible.
You may apply for enrollment by submitting the required documents. The documents will be verified by the concerned authorities, after which we will inform you of the status of your application. If your documents are not approved, you will not be eligible for the certificate recognized by IIT Roorkee.
Have more questions? Please contact us at email@example.com