Learn Spark, Spark RDD, Spark Streaming, Kafka, SparkR, SparkSQL, MLlib, and GraphX From Industry Experts
As humans, we are immersed in data in our everyday lives. According to IBM, the amount of data on this planet doubles every two years. The value that data holds only becomes clear once we start to identify patterns and trends in it. Conventional computing approaches break down when data becomes huge.
There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.
In this course, you will learn Spark to drive better business decisions and solve real-world problems.
What is Big Data?
Big Data Use Cases
Spark Ecosystem Walkthrough
Understanding CloudxLab
Quiz and Assessment
Basics of Linux - Quick Hands-On
Understanding Regular Expressions
Quiz and Assessment
Setting up VM (optional)
InputFormat and InputSplit
How to store many small files - SequenceFile?
Understanding Row Oriented and Column Oriented Formats - RCFile?
Introduction to Scala
Accessing Scala using CloudxLab
Getting Started: Interactive, Compilation, SBT
Types, Variables & Values
Quiz and Assessment
What is Apache Spark?
Using the Spark Shell on CloudxLab
Example 1 - Performing Word Count
Understanding Spark Cluster Modes on YARN
RDDs (Resilient Distributed Datasets)
General RDD Operations: Transformations & Actions
RDD Persistence Overview
Creating the SparkContext
Building a Spark Application (Scala, Java, Python)
The Spark Application Web UI
Configuring Spark Properties
Running Spark on Cluster
Executing Parallel Operations
Stages and Tasks
Project: Churning the logs of NASA Kennedy Space Center WWW server
Common Spark Use Cases
Example 1 - Data Cleaning (Movielens)
Example 2 - Understanding Spark Streaming
Example 3 - Spark Streaming from Kafka
Iterative Algorithms in Spark
Project: Real-time analytics of orders in an e-commerce company
Spark SQL and the SQL Context
Transforming and Querying DataFrames
DataFrames and RDDs
Comparing Spark SQL, Impala, and Hive-on-Spark
GraphX: Graph Processing and Analysis
Understanding Machine Learning
MLlib Example: k-means
1. Generate movie recommendations using Spark MLlib
2. Derive the importance of various handles at Twitter using Spark GraphX
3. Churn the logs of NASA Kennedy Space Center WWW server using Spark to find out useful business and devops metrics
4. Write an end-to-end Spark application, from writing code on your local machine to deploying it to the cluster
5. Build a real-time analytics dashboard for an e-commerce company using Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO and Highcharts
Our Specialization is exhaustive, and the certificate awarded by us is proof that you have taken a big leap in the Big Data domain.
The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.
Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.
In self-paced learning, you will get:
In instructor-led training, you will get:
Basics of SQL. You should know the basics of SQL and databases. If you understand filters (WHERE clauses) in SQL, you should be able to follow the course.
Basic programming know-how. If you understand loops in any programming language, and you can create a directory and view the contents of a file from the command line, you will be able to grasp the concepts of this course, even if you have not touched programming in the last 10 years! In addition, we provide video classes on the basics of Python and Scala.
The instructors for this course are industry experts having years of experience in mentoring students across the world.
The course takes 2-3 months at 6-8 hours of effort per week.
We understand that you might need the course material for a longer duration to make the most of your subscription. You will get lifetime access (for as long as the company is operational) to the course material, so you can refer to it anytime.
In online instructor-led training, Sandeep Giri and his team of experts will train you alongside a group of other course learners for 25+ hours over online conferencing software such as Zoom. Classes are held every Saturday and Sunday.
We offer mentoring sessions with industry leaders and professionals, so you can get one-on-one help with any questions you may have, whether they are technical, job-related or anything else.
The first session is free; subsequent sessions are a paid service available to learners enrolled in the course.
At the end of the course, you will work on a real-time project. You will receive a problem statement along with a dataset to work on in CloudxLab. Once you complete the project and it has been reviewed by an expert, you will be awarded a certificate which you can share on LinkedIn.
Enrollment in the self-paced course includes 90 days of free access to CloudxLab. Enrollment in the instructor-led course includes 90 days of free access to CloudxLab, depending on the date of enrollment.
Yes. Java is generally required for understanding MapReduce. MapReduce is a programming paradigm in which you write your logic in the form of mapper and reducer functions. We provide a self-paced course on Java for free; as soon as you sign up, it will be available in your account section.
The course requires a good internet connection (1 Mbps or more) and a browser to watch the videos and do the hands-on exercises in the lab. We've configured all the tools in the lab so that you can focus on learning and practicing on a real-world cluster.
At CloudxLab, we have always believed that quality education must be affordable for everyone, so that we can help learners achieve their career goals and build innovative products.
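Production MapReduce jobs are usually written in Java against the Hadoop API, but the mapper/reducer idea itself can be sketched in a few lines. The following is a minimal, single-process illustration in plain Python, not the Hadoop API: the function names `mapper`, `reducer` and `map_reduce` are hypothetical, and the in-memory grouping step stands in for the shuffle that a real framework performs across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Hypothetical reducer: combine all the counts emitted for a single word."""
    return word, sum(counts)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    # Map phase: run the mapper over every input record.
    # Shuffle phase (simulated in memory): group emitted values by key,
    # which a real framework would do between the map and reduce phases.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    # Reduce phase: call the reducer once per distinct key.
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["Spark and Hadoop", "spark streaming"]
    print(map_reduce(sample))  # {'spark': 2, 'and': 1, 'hadoop': 1, 'streaming': 1}
```

The point of the split is that `mapper` only ever sees one record and `reducer` only ever sees one key, so a framework can run many copies of each in parallel on different machines.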
Please follow this post for more details on the financial aid.
Have more questions? Please contact us at firstname.lastname@example.org