About the Course

As humans, we are immersed in data in our everyday lives. According to IBM, the amount of data in the world doubles every two years. The value that data holds becomes clear only when we start to identify patterns and trends in it. Conventional computing approaches do not work when data becomes huge.

There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.

In this course, you will learn Spark to drive better business decisions and solve real-world problems.

Learn From Industry Experts

Get Access to 30+ Hours of Training.

Projects & Lab

Apply the skills you learn on a distributed cluster to solve real-world problems.

Certificate

Highlight your new skills on your resume or LinkedIn.

Best-in-class Support

Timely doubt resolution through the discussion forum, with the help of an international community of peers.

Certifications

Compatible with Hortonworks Certified Developer (HDPCD): Spark

Refer your friends and get 30 days of free lab access.

14 /month

Real-time cluster access
Earn Industry-relevant Certificates
No access to third-party courses and instructor-led trainings
Access to Job Portal

19 /month

Unlimited Access to all CloudxLab self-paced courses
Real-time cluster access
Earn Industry-relevant Certificates
No access to third-party courses and instructor-led trainings
Access to Job Portal

24 /month

Real-time cluster access
Earn Industry-relevant Certificates
No access to third-party courses and instructor-led trainings
Access to Job Portal

29 /month

Unlimited Access to all CloudxLab self-paced courses
Real-time cluster access
Earn Industry-relevant Certificates
No access to third-party courses and instructor-led trainings
Access to Job Portal

34 /month

Real-time cluster access
Earn Industry-relevant Certificates
No access to third-party courses and instructor-led trainings
Access to Job Portal

49 /month

Unlimited Access to all CloudxLab self-paced courses
Real-time cluster access
Earn Industry-relevant Certificates
No access to third-party courses and instructor-led trainings
Access to Job Portal

Learning Path

This course is a part of the Specialization Course in Big Data with Hadoop and Spark

1. Introduction

1.1 Apache Spark ecosystem walkthrough

1.2 Spark Introduction - Why Spark?

1.3 Quiz

2. Scala Basics

2.1 Scala - Quick Introduction - Access Scala on CloudxLab

2.2 Scala - Quick Introduction - Variables and Methods

2.3 Getting Started: Interactive, Compilation, SBT

2.4 Types, Variables & Values

2.5 Functions

2.6 Collections

2.7 Classes

2.8 Parameters

2.9 More Features

2.10 Quiz and Assessment

3. Spark Basics

3.1 Apache Spark ecosystem walkthrough

3.2 Spark Introduction - Why Spark?

3.3 Using the Spark Shell on CloudxLab

3.4 Example 1 - Performing Word Count

3.5 Understanding Spark Cluster Modes on YARN

3.6 RDDs (Resilient Distributed Datasets)

3.7 General RDD Operations: Transformations & Actions

3.8 RDD lineage

3.9 RDD Persistence Overview

3.10 Distributed Persistence

4. Writing and Deploying Spark Applications

4.1 Creating the SparkContext

4.2 Building a Spark Application (Scala, Java, Python)

4.3 The Spark Application Web UI

4.4 Configuring Spark Properties

4.5 Running Spark on Cluster

4.6 RDD Partitions

4.7 Executing Parallel Operations

4.8 Stages and Tasks

5. Common Patterns in Spark Data Processing

5.1 Common Spark Use Cases

5.2 Example 1 - Data Cleaning (Movielens)

5.3 Example 2 - Understanding Spark Streaming

5.4 Understanding Kafka

5.5 Example 3 - Spark Streaming from Kafka

5.6 Iterative Algorithms in Spark

5.7 Project: Real-time analytics of orders in an e-commerce company

6. Data Formats & Management

6.1 InputFormat and InputSplit

6.2 JSON

6.3 XML

6.4 AVRO

6.5 How to store many small files - SequenceFile?

6.6 Parquet

6.7 Protocol Buffers

6.8 Comparing Compressions

6.9 Understanding Row Oriented and Column Oriented Formats - RCFile?

7. DataFrames and Spark SQL

7.1 Spark SQL - Introduction

7.2 Spark SQL - Dataframe Introduction

7.3 Transforming and Querying DataFrames

7.4 Saving DataFrames

7.5 DataFrames and RDDs

7.6 Comparing Spark SQL, Impala, and Hive-on-Spark

8. Machine Learning with Spark

8.1 Machine Learning Introduction

8.2 Applications Of Machine Learning

8.3 MLlib Example: k-means

8.4 SparkR Example

Projects

1. Generate movie recommendations using Spark MLlib

2. Analyze the logs of the NASA Kennedy Space Center WWW server using Spark to extract useful business and DevOps metrics

3. Write an end-to-end Spark application, from writing code on your local machine to deploying it to the cluster

4. Build a real-time analytics dashboard for an e-commerce company using Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO and Highcharts

Certificate

Earn your certificate

Our Specialization is exhaustive, and the certificate we award is proof that you have taken a big leap in the Big Data domain.

Differentiate yourself

The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.

Share your achievement

Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.

Course Creators

Sandeep Giri

Founder at CloudxLab
Past: Amazon, InMobi, D.E.Shaw

Course Developer

Abhinav Singh

Co-Founder at CloudxLab
Past: Byjus

Course Developer

Jatin Shah

Ex-LinkedIn, Yahoo, Yale CS Ph.D.
IIT-B

Course Advisor

Reviews

(4.9 out of 5)

Peter Sabry

I started learning 3 months ago and have gained a lot of information and practical experience. I completed the “Big Data with Spark” course and the learning journey really exceeded my expectations.

The course structure and topics were great, well organized and comprehensive, even the basics of Linux were covered in a very simple way. There were always exercises and hands-on that build better understanding, also the lab environment and provided online tools were great help and let you practice everything without having to install anything on your PC except the web browser.

In addition, it was really a joy attending the live sessions each weekend. Our instructor, Sandeep Giri, besides his great experience and knowledge, was generous, helpful and patient in answering all attendees’ questions, going for more examples and hands-on or even searching the documentation and trying new things. I gained much from other attendees’ questions and the way Sandeep responded to them.

This was a great experience having this course and I’m going for more courses in Big Data and Machine Learning with CloudxLab and I recommend it for all my friends and colleagues who look for better learning.

Kamal Upadhyay

This course is suitable for everyone. Being a product manager, I had not done hands-on coding for quite some time. Python was completely new to me. However, Sandeep Giri gave us a crash course in Python and then introduced us to Machine Learning. Also, CloudxLab’s environment was very useful to just log in and start practising coding and playing with the things learnt. A good mix of theory and practical exercises, and specifically the sequence of starting straight away with a project and then going deeper, was a very good way of teaching. I would recommend this course to all.

Daya Paari

A must-have for practicing and perfecting Hadoop. To set it up on a PC, you need a very high-end configuration, and the setup will be a pseudo-node setup. For better understanding, I recommend CloudxLab.

Satyajit Das

The machine learning courses at CloudxLab, especially the Artificial Intelligence for the manager course, are excellent. I have attended some of the courses and was able to follow them because Sandeep Giri sir taught the AI course from scratch and related it to our day-to-day life…

He even takes free sessions to help students and provides career guidance.

His courses are worthy and even just by watching YouTube video anyone can easily crack the AI interview.

Manolo Ramírez

They are great. They take care of all the Big Data technologies (Hadoop, Spark, Hive, etc.) so you do not have to worry about installing and running them correctly on your PC. Plus, they have fantastic customer support. Even when I have had problems debugging my own programs, they have answered me with the correct solution in a few hours, and all of this for a more than reasonable price. I personally recommend it to everyone :)

FAQ

What do you mean by self-paced learning?

In self-paced learning, you will get:

Lifetime access to the self-paced course, including videos, assessments, quizzes, and projects
Pre-recorded videos of instructor-led sessions
24x7 support through the discussion forum

What are the prerequisites and requirements for this course?

This course is for engineers, product managers and anyone who wants to learn. We will cover the foundations of linear algebra, calculus and statistical inference wherever required so that you can learn the concepts effectively. No prerequisites or prior programming knowledge are required.

Do I need to install any software before starting this course?

No, we will provide you with access to our online lab and BootML so that you do not have to install anything on your local machine.

Who will be the course instructors?

The instructors for this course are industry experts having years of experience in mentoring students across the world.

What is the validity of course material?

We understand that you might need the course material for a longer duration to make the most of your subscription. You will get lifetime access to the course material so that you can refer to it anytime.

How can I see the Course Preview?

You can watch the Course Preview at https://youtu.be/dXCx4anEcgU

What are the hardware and software requirements?

The course requires a good internet connection and a browser to watch videos and do hands-on work in the lab. We've configured all the tools in the lab so that you can focus on learning and practising in a real-world cluster.

What is the refund policy for courses taken from CloudxLab?

For self-paced courses, we provide a 100% fee refund if the request is raised within 7 days of the enrollment date. Please contact us at reachus@cloudxlab.com to request a refund within the stipulated time. Thereafter, no refund is provided.

Can I renew my lab subscription?

Yes, you can renew your subscription anytime. Please choose your desired plan for the lab and make payment to renew your subscription.

Can I get a certificate for the projects completed?

We have created a set of Guided Projects on our platform. You may complete these guided projects and earn the certificate for free.

Big Data with Spark Training Online Course (With Lab Access)

3,506 Ratings 9,025 learners

30+ hours training

90 days of Lab

Timely Doubt Resolution

Projects

Compatible with Hortonworks, Cloudera Certifications
