Apache Spark and Scala Certification Training

Learn Spark, Spark RDD, Spark Streaming, Kafka, SparkR, SparkSQL, MLlib, and GraphX From Industry Experts

(9,025 Learners)

  25+ hours training

  Projects & Lab

  24x7 Support

  Compatible with Hortonworks and Cloudera Certifications

About the Course

As humans, we are immersed in data in our everyday lives. According to IBM, the amount of data in the world doubles every two years. The value of that data can be realized only once we start to identify patterns and trends in it. Conventional computing approaches stop working when the data becomes very large.

There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.

In this course, you will learn Spark to drive better business decisions and solve real-world problems.

1 course

Learn from industry experts. Follow the suggested order or choose your own.

Projects & Lab

Apply the skills you learn on a distributed cluster to solve real-world problems.


Highlight your new skills on your resume or LinkedIn.

1:1 Mentoring

Subscribe to 1:1 mentoring sessions and get guidance from industry leaders and professionals.

Best-in-class Support

24×7 support and forum access to answer all your queries throughout your learning journey.


Compatible with Hortonworks Certified Developer (HDPCD): Spark
Learning Path


About the Course

Hardware and Software requirements
The course requires a good internet connection (1 Mbps or more) and a browser to watch the videos and do the hands-on exercises in the lab. We've configured all the tools in the lab so that you can focus on learning and practicing in a real-world cluster.

What is Big Data?

Why Now?

Big Data Use Cases

Various Solutions

Spark Ecosystem Walkthrough


Understanding the CloudxLab

CloudxLab Hands-On

Spark Hands-on

Quiz and Assessment

Basics of Linux - Quick Hands-On

Understanding Regular Expressions

Quiz and Assessment

Setting up VM (optional)

InputFormat and InputSplit




How to store many small files - SequenceFile?


Protocol Buffers

Comparing Compressions

Understanding Row Oriented and Column Oriented Formats - RCFile?

Introduction to Scala

Accessing Scala using CloudxLab

Getting Started: Interactive, Compilation, SBT

Types, Variables & Values





More Features

Quiz and Assessment

What is Apache Spark?

Why Spark?

Using the Spark Shell on CloudxLab

Example 1 - Performing Word Count

Understanding Spark Cluster Modes on YARN

RDDs (Resilient Distributed Datasets)

General RDD Operations: Transformations & Actions

RDD lineage

RDD Persistence Overview

Distributed Persistence

Creating the SparkContext

Building a Spark Application (Scala, Java, Python)

The Spark Application Web UI

Configuring Spark Properties

Running Spark on Cluster

RDD Partitions

Executing Parallel Operations

Stages and Tasks

Project: Churning the logs of NASA Kennedy Space Center WWW server

Common Spark Use Cases

Example 1 - Data Cleaning (Movielens)

Example 2 - Understanding Spark Streaming

Understanding Kafka

Example 3 - Spark Streaming from Kafka

Iterative Algorithms in Spark

Project: Real-time analytics of orders in an e-commerce company

Spark SQL and the SQL Context

Creating DataFrames

Transforming and Querying DataFrames

Saving DataFrames

DataFrames and RDDs

Comparing Spark SQL, Impala, and Hive-on-Spark

GraphX: Graph Processing and Analysis

Understanding Machine Learning

MLlib Example: k-means

SparkR Example


Earn your certificate

Our course is exhaustive, and the certificate we award is proof that you have taken a big leap in the Big Data domain.

Differentiate yourself

The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.

Share your achievement

Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.

 Course Certificate Sample
Self-paced Learning

Learn at your pace

149 299

High-quality videos, slides, hands-on examples, quizzes, automated assessments, case studies, real-world projects

Lifetime access to cutting-edge self-paced learning content

90 days of lab access for hands-on practice

24x7 support to answer your queries

Earn certificate in Big Data with Apache Spark

Enroll Now
Online Instructor-led Training

Starts on 3 December | 7:30am - 10:30am PST | Sat, Sun | 25+ hours

299 399

High-quality videos, slides, hands-on examples, quizzes, automated assessments, case studies, real-world projects

Lifetime access to cutting-edge self-paced learning content

90 days of lab access for hands-on practice

24x7 support to answer your queries

Earn certificate in Big Data with Apache Spark

25+ hours of live online instructor-led training

Enroll Now
Sandeep Giri

Founder at CloudxLab

Past - Amazon, InMobi, tBits Global, D.E.Shaw

For the last 15 years, Sandeep has been building products and churning large amounts of data for various product companies. He has all-round experience in software development and big data analysis.

Apart from digging into data and technologies, Sandeep enjoys conducting interviews and explaining difficult concepts in simple ways.

Course Creators
Abhinav Singh

Co-Founder at CloudxLab, Past - Byjus
Course Developer
Abhishek Agarwal

Sales & Marketing Analyst - CloudxLab
Program Manager
Jatin Shah

LinkedIn, Yahoo, Yale CS Ph.D.
Course Advisor



1. Basics of SQL. You should know the basics of SQL and databases. If you know about filters in SQL, you should be able to follow the course.

2. Basics of programming. If you understand loops in any programming language, and you can create a directory and view the contents of a file from the command line, you will be able to follow the concepts of this course even if you have not touched programming for the last 10 years. In addition, we provide video classes on the basics of Python and Scala.

The course takes 2-3 months at 6-8 hours of effort per week.
We understand that you might need the course material for a longer duration to make the most of your subscription. You get lifetime access to the course material (for as long as the company is operational), so you can refer to it at any time.
In online instructor-led training, Sandeep Giri and his team of experts will train you alongside a group of fellow learners for 25+ hours over online conferencing software such as Zoom. Classes are held every Saturday and Sunday (7:30am - 10:30am PST), starting December 3, 2017.
We offer mentoring sessions with industry leaders and professionals so that you can get 1:1 help with any questions you may have, whether they are technical, job-related, or anything else.
It is a paid service, available exclusively to learners enrolled in the course. We will share subscription details after the course is launched.
At the end of the course, you will work on a real-time project. You will receive a problem statement along with a dataset to work on in CloudxLab. Once you complete the project and it has been reviewed by an expert, you will be awarded a certificate that you can share on LinkedIn.
Enrollment in the self-paced course includes 90 days of free access to CloudxLab. Enrollment in the instructor-led course includes 90-150 days of free access to CloudxLab, depending on the date of enrollment.
Yes. Java is generally required for understanding MapReduce. MapReduce is a programming paradigm in which you write your logic in the form of mapper and reducer functions. We provide a self-paced course on Java for free. As soon as you sign up, it will be available in your account section.
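To make the mapper and reducer idea concrete, here is a minimal single-process sketch of a word count. It is written in Python rather than Java purely for brevity, and it does not use the Hadoop API. The function names (`mapper`, `shuffle`, `reducer`, `word_count`) are hypothetical and chosen only for illustration; on a real cluster the framework performs the shuffle step for you.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word, counts):
    """Reduce step: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; not taken from the course material.
    print(word_count(["spark and scala", "spark streaming and kafka"]))
```

The point of the split is that each `mapper` call needs only one line and each `reducer` call needs only one key, which is what lets a framework spread the work across many machines.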
For the self-paced course, we provide a 100% fee refund if the request is raised within 7 days of the enrollment date. After that, no refund is provided.

For the instructor-led course, we provide a 100% refund if no more than 1 live session has been conducted, and a 50% refund if 2-4 live sessions have been conducted. If 5 or more live sessions have been conducted, no refund is provided.

Have more questions? Please contact us at reachus@cloudxlab.com