Learn Spark, Spark RDD, Spark Streaming, Kafka, SparkR, Spark SQL, MLlib, and GraphX From Industry Experts
As humans, we are immersed in data in our everyday lives. According to IBM, the amount of data on the planet doubles every two years. The value that data holds can only be understood when we start to identify patterns and trends in it, and conventional computing approaches stop working when the data grows this large.
There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in the field.
In this course, you will learn Spark to drive better business decisions and solve real-world problems.
What is Big Data?
Why Now?
Big Data Use Cases
Various Solutions
Spark Ecosystem Walkthrough
Quiz
Understanding the CloudxLab
CloudxLab Hands-On
Spark Hands-on
Quiz and Assessment
Basics of Linux - Quick Hands-On
Understanding Regular Expressions
Quiz and Assessment
Setting up VM (optional)
As part of this session we will do a recap of the sessions on the Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN).
This is needed because most Spark applications use data from HDFS, and in most deployments Spark applications run on YARN clusters.
Introduction to Scala
Accessing Scala using CloudxLab
Getting Started: Interactive, Compilation, SBT
Types, Variables & Values
Functions
Collections
Classes
Parameters
More Features
Quiz and Assessment
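To make the Scala topics above concrete, here is a minimal, self-contained sketch (the names and values are purely illustrative, not part of the course material):

    object ScalaBasics {
      // An immutable value and a mutable variable
      val courseName: String = "Big Data with Spark"
      var enrolled: Int = 0

      // A simple function with a default parameter
      def greet(name: String, suffix: String = "!"): String = s"Hello, $name$suffix"

      // A case class and a collection transformed with filter/map
      case class Movie(title: String, rating: Double)

      def main(args: Array[String]): Unit = {
        val movies = List(Movie("Inception", 8.8), Movie("Cars", 7.1))
        val good = movies.filter(_.rating > 8.0).map(_.title)
        println(greet("CloudxLab"))
        println(good.mkString(", "))
      }
    }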
What is Apache Spark?
Why Spark?
Using the Spark shell and various ways of running Spark on CloudxLab
Example 1 - Performing Word Count
Understanding Spark Cluster Modes on YARN
RDDs (Resilient Distributed Datasets)
General RDD Operations: Transformations & Actions
RDD lineage
RDD Persistence Overview
Distributed Persistence
Learn operations on key-value RDDs
Solving various problems using RDDs
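As a minimal sketch of the word-count example and the RDD transformations and actions listed above, runnable in the spark-shell where sc is the pre-created SparkContext (the HDFS path is only a placeholder):

    // spark-shell provides `sc`; the HDFS path below is only an example
    val lines = sc.textFile("hdfs:///data/sample.txt")
    val counts = lines
      .flatMap(line => line.split("\\s+"))   // transformation: split into words
      .map(word => (word, 1))                // transformation: key-value pairs
      .reduceByKey(_ + _)                    // transformation: sum counts per word
    counts.take(10).foreach(println)         // action: materialise a few results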
Creating the SparkContext
Building a Spark Application (Scala, Java, Python)
The Spark Application Web UI
Configuring Spark Properties
Running Spark on a Cluster
RDD Partitions
Executing Parallel Operations
Stages and Tasks
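A minimal skeleton of a standalone Spark application in Scala, along the lines of the topics above; the object name and the use of args for input and output paths are assumptions for illustration:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountApp {
      def main(args: Array[String]): Unit = {
        // Configure and create the SparkContext; the master is normally
        // supplied by spark-submit (e.g. --master yarn) rather than hard-coded
        val conf = new SparkConf().setAppName("WordCountApp")
        val sc = new SparkContext(conf)

        val counts = sc.textFile(args(0))          // input path passed as an argument
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.saveAsTextFile(args(1))             // output path passed as an argument
        sc.stop()
      }
    }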
Project: Churning the logs of NASA Kennedy Space Center WWW server
Using Accumulators & Creating Custom Accumulators
Using Broadcast variables
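A small sketch of accumulators and broadcast variables, assuming the spark-shell; the stop-word set and the notion of a "bad record" are made up for illustration:

    // Count empty records with an accumulator while filtering with a broadcast set
    val badRecords = sc.longAccumulator("badRecords")
    val stopWords = sc.broadcast(Set("the", "a", "an"))

    val cleaned = sc.textFile("hdfs:///data/sample.txt").flatMap { line =>
      if (line.trim.isEmpty) { badRecords.add(1); Seq.empty[String] }
      else line.split("\\s+").toSeq.filterNot(stopWords.value.contains)
    }
    cleaned.count()                      // run an action so the accumulator gets updated
    println(s"Bad records: ${badRecords.value}")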
We will learn key performance considerations:
Understanding Caching & Persistence
We will learn data partitioning and re-partitioning techniques.
A project applying the above optimization techniques.
We will learn how to create a custom partitioner.
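A hedged sketch of caching, re-partitioning and a custom partitioner, assuming the spark-shell; the keying scheme and the partitioner logic are illustrative only:

    import org.apache.spark.{HashPartitioner, Partitioner}
    import org.apache.spark.storage.StorageLevel

    val pairs = sc.textFile("hdfs:///data/sample.txt")
      .map(line => (line.take(1), line))            // key by first character (illustrative)

    // Cache an RDD that will be reused by several actions
    val cached = pairs.persist(StorageLevel.MEMORY_AND_DISK)

    // Repartition with a built-in partitioner ...
    val byHash = cached.partitionBy(new HashPartitioner(8))

    // ... or with a custom partitioner that sends digit keys to partition 0
    class DigitFirstPartitioner(partitions: Int) extends Partitioner {
      def numPartitions: Int = partitions
      def getPartition(key: Any): Int = {
        val k = key.toString
        if (k.nonEmpty && k.head.isDigit) 0
        else 1 + (k.hashCode & Integer.MAX_VALUE) % (partitions - 1)
      }
    }
    val byCustom = cached.partitionBy(new DigitFirstPartitioner(4))
    println(byCustom.getNumPartitions)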
Understand the Spark runtime architecture and its various components, such as the Driver, Executors and the Cluster Manager.
Learn what happens under the hood when we launch a Spark application.
We will learn the two modes of Spark: local and cluster.
How to launch a program on YARN, an AWS cluster, etc.
How to set up Spark in standalone mode.
Understand and demonstrate how to run the driver in various modes.
Learn how to package the dependencies of your code.
Understand how to use spark-submit and its various command-line options.
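As a rough illustration of the deployment topics above, the snippet below sets a few common Spark properties in code; the comments note equivalent spark-submit flags, and the specific values are examples rather than recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    // The same settings are more commonly supplied on the command line, e.g.
    //   spark-submit --master yarn --deploy-mode cluster \
    //     --num-executors 4 --executor-memory 2g --jars deps.jar app.jar
    val conf = new SparkConf()
      .setAppName("DeployModesDemo")
      .setMaster("local[*]")                    // local mode for a quick test;
                                                // on a cluster pass --master yarn to spark-submit instead
      .set("spark.executor.memory", "2g")
      .set("spark.executor.instances", "4")

    val sc = new SparkContext(conf)
    println(s"Running with master: ${sc.master}")
    sc.stop()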
Common Spark Use Cases
Example 1 - Data Cleaning (MovieLens)
Example 2 - Understanding Spark Streaming
Understanding Kafka
Example 3 - Spark Streaming from Kafka
Iterative Algorithms in Spark
Project: Real-time analytics of orders in an e-commerce company
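A minimal sketch of consuming a Kafka topic with Spark Streaming, assuming the spark-streaming-kafka-0-10 connector is available; the broker address, topic name and group id are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    val conf = new SparkConf().setAppName("OrdersStream").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))          // 5-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",                // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "orders-demo",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("orders"), kafkaParams)
    )

    // Count incoming order events per micro-batch
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()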
Spark SQL and the SQL Context
Creating DataFrames
Transforming and Querying DataFrames
Saving DataFrames
Solving problems with DataFrames and RDDs
Comparing Spark SQL, Impala, and Hive-on-Spark
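A minimal DataFrame sketch using the Spark 2.x SparkSession entry point; the JSON path and column names are assumptions for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("DataFramesDemo").getOrCreate()

    // Create a DataFrame from a JSON file (path is an example)
    val orders = spark.read.json("hdfs:///data/orders.json")

    // Transform and query with the DataFrame API ...
    orders.filter(orders("amount") > 100).groupBy("country").count().show()

    // ... or with SQL on a temporary view
    orders.createOrReplaceTempView("orders")
    val totals = spark.sql(
      "SELECT country, SUM(amount) AS total FROM orders GROUP BY country")
    totals.show()

    // Save the result as Parquet
    totals.write.mode("overwrite").parquet("hdfs:///output/order_totals")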
Understanding and loading various input formats: JSON, XML, Avro, SequenceFile, Parquet, Protocol Buffers.
Comparing Compressions
Understanding Row-Oriented and Column-Oriented Formats - RCFile
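A small, hedged illustration of loading and saving a couple of the formats above with compression, assuming the spark-shell where spark and sc are pre-created (paths and codecs are examples):

    // Read JSON and write Parquet with Snappy compression
    val events = spark.read.json("hdfs:///data/events.json")
    events.write.option("compression", "snappy").parquet("hdfs:///output/events_parquet")

    // Write a plain-text RDD compressed with Gzip
    sc.textFile("hdfs:///data/events.json")
      .saveAsTextFile("hdfs:///output/events_gz",
        classOf[org.apache.hadoop.io.compress.GzipCodec])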
Understanding Machine Learning
MLlib Example: Recommendations on MovieLens data
Understanding various Packages of MLlib
SparkR Example.
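A minimal sketch of movie recommendations with Spark ML's ALS; the ratings path and column names follow the usual MovieLens layout but are assumptions here:

    import org.apache.spark.ml.recommendation.ALS
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("MovieRecs").getOrCreate()

    // Expecting CSV ratings like: userId,movieId,rating,timestamp (MovieLens style)
    val ratings = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("hdfs:///data/ml-latest-small/ratings.csv")

    val Array(training, test) = ratings.randomSplit(Array(0.8, 0.2))

    val als = new ALS()
      .setMaxIter(10)
      .setRegParam(0.1)
      .setUserCol("userId")
      .setItemCol("movieId")
      .setRatingCol("rating")

    val model = als.fit(training)
    model.setColdStartStrategy("drop")           // avoid NaN predictions for unseen users
    model.transform(test).show(5)
    model.recommendForAllUsers(10).show(5, truncate = false)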
Basics of Graph Processing: Covers what graph processing means in real life, with examples, and which other frameworks provide graph computation.
GraphX Overview: What is GraphX? Understanding the functionality and algorithms provided by GraphX, how GraphX works, and how it compares with other similar products.
Implementing PageRank using GraphX: We will learn the basics of PageRank, the algorithm that made Google, and then learn how to implement it using GraphX.
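A minimal GraphX PageRank sketch, assuming the spark-shell; the edge-list file of "srcId dstId" pairs (for example, Twitter follower pairs) is a placeholder:

    import org.apache.spark.graphx.GraphLoader

    // Load a graph from an edge list of "srcId dstId" pairs
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/followers.txt")

    // Run PageRank until convergence within the given tolerance
    val ranks = graph.pageRank(0.0001).vertices

    // Show the ten most "important" vertices
    ranks.sortBy(_._2, ascending = false).take(10).foreach(println)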
1. Generate movie recommendations using Spark MLlib
2. Derive the importance of various handles at Twitter using Spark GraphX
3. Churn the logs of NASA Kennedy Space Center WWW server using Spark to find out useful business and DevOps metrics
4. Write end-to-end Spark application starting from writing code on your local machine to deploying to the cluster
5. Build real-time analytics dashboard for an e-commerce company using Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO and Highcharts
Our Specialization is exhaustive, and the certificate awarded by us is proof that you have taken a big leap in the Big Data domain.
The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.
Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.
You can watch the course preview at https://youtu.be/dXCx4anEcgU.
Have more questions? Please contact us at reachus@cloudxlab.com
I started learning 3 months ago and I really gained a lot of knowledge and practical experience. I completed the “Big Data with Spark” course and the learning journey really exceeded my expectations.
The course structure and topics were great, well organized and comprehensive; even the basics of Linux were covered in a very simple way. There were always exercises and hands-on assignments that built a better understanding, and the lab environment and the provided online tools were a great help, letting you practice everything without having to install anything on your PC except the web browser.
In addition, the live sessions were a real joy to attend each weekend. Our instructor, “Sandeep Giri”, besides his great experience and knowledge, was generous, helpful and patient in answering all attendees’ questions, often going into more examples and hands-on work, or even searching the documentation and trying new things. I gained a lot from the other attendees’ questions and the way Sandeep responded to them.
This was a great experience, I’m going for more courses in Big Data and Machine Learning with CloudxLab, and I recommend it to all my friends and colleagues who are looking for better learning.
A must-have for practicing and perfecting Hadoop. To set it up on your own PC you need a very high-end configuration, and even then it will only be a pseudo-node setup. For better understanding I recommend CloudxLab.
This course is suitable for everyone. Me being a product manager, I had not done hands-on coding for quite some time, and Python was completely new to me. However, Sandeep Giri gave us a crash course in Python and then introduced us to Machine Learning. Also, the CloudxLab environment was very useful to just log in and start practising coding and playing with the things learnt. A good mix of theory and practical exercises, and specifically the sequence of starting straight away with a project and then going deeper, was a very good way of teaching. I would recommend this course to all.
They are great. They take care of all the Big Data technologies (Hadoop, Spark, Hive, etc.) so you do not have to worry about installing and running them correctly on your PC. Plus, they have fantastic customer support. Even when I have had problems debugging my own programs, they have answered me with the correct solution in a few hours, and all of this for a more than reasonable price. I personally recommend it to everyone :)
The machine learning courses, especially the Artificial Intelligence for Managers course, are excellent at CloudxLab. I have attended some of the courses and was able to understand them, as Sandeep Giri sir taught the AI course from scratch and related it to our day-to-day life…
He even takes free sessions to help students and provides career guidance.
His courses are well worth it, and even just by watching his YouTube videos anyone can easily crack an AI interview.