Course on Big Data with Hadoop & Spark

Learn From Industry Experts With 1:1 Mentoring & Online Instructor-led Training


245 Ratings | 948 Learners

Why learn Big Data?

We are immersed in data in our everyday lives. According to IBM, the amount of data in the world doubles every two years. The value of that data can be realized only when we start to identify the patterns and trends in it. Conventional computing approaches stop working when data becomes huge.

There is massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.

Big Data Market Salary

Enrollment

Learning with CloudxLab means top-notch training by industry experts and best-in-class learning content.
100% money-back guarantee (please check FAQ at bottom of this page for details on refund policy)

Self-paced Learning

Learn at your pace

Course Design

High-quality videos, slides, hands-on examples, quizzes, automated assessments, case studies, real-world projects

Course Material

Lifetime access to cutting-edge self-paced learning content

Lab

90 Days of CloudxLab access for hands-on practice

Support

24x7 email support to answer your queries

Certificate

Earn certificate in Big Data with Hadoop & Apache Spark




299   190

Online Instructor-led Training  

Starts on 22 October | 50 hours | Sat, Sun | 7:30 AM - 10:30 AM PST

Course Design

High-quality videos, slides, hands-on examples, quizzes, automated assessments, case studies, real-world projects

Course Material

Lifetime access to cutting-edge self-paced learning content

Lab

90-150 Days of CloudxLab access for hands-on practice (from enrollment date until 22 January, 2018)

Support

24x7 email support to answer your queries

Certificate

Earn certificate in Big Data with Hadoop & Apache Spark

+ Live sessions

50+ hours of live online instructor-led training. Classes will be conducted every Saturday & Sunday between 7:30 AM - 10:30 AM PST (8 PM - 11 PM IST)
699   549

Online Instructor-led Training  

Starts on 17 September | 50 hours | Sat, Sun | 7:30 AM - 10:30 AM PST

Course Design

High-quality videos, slides, hands-on examples, quizzes, automated assessments, case studies, real-world projects

Course Material

Lifetime access to cutting-edge self-paced learning content

Lab

90-150 Days of CloudxLab access for hands-on practice

Support

24x7 email support to answer your queries

Certificate

Earn certificate in Big Data with Hadoop & Apache Spark

+ Live sessions

50+ hours of live online instructor-led training. Classes will be conducted every Saturday & Sunday between 7:30 AM - 10:30 AM PST (8 PM - 11 PM IST)
699

How will you benefit?

Skill Enhancement

Develop skills and competencies to excel and stand out in the Big Data domain

Career Growth

Get better roles and better packages

What do I get?

  • Lifetime access to course material

    Lifetime access to high-quality, self-paced learning content designed by industry experts

  • 90-150 days of CloudxLab access

    Practice on a real-time distributed environment
    Self-paced course: 90 days free access
    Instructor-led course: 90-150 days free access

  • Online Instructor-led Training

    Subscribe to instructor-led training with a group of our course learners.

  • Best-in-class support

    24x7 email support to answer your queries. Get answers within one business day

  • Training by professionals

    Learn from professionals with years of experience in processing Big Data and building enterprise products

  • Verified certificate

    Receive a verified certificate and share it on LinkedIn

  • 1:1 Mentoring

    Subscribe to 1:1 mentoring sessions and get guidance from industry leaders and professionals.

  • LinkedIn recommendation & endorsements

    We will provide a LinkedIn recommendation based on your performance. We will also endorse you for skills such as Hadoop and Big Data

Testimonials

Prerequisites and Requirements

  • Basics of SQL. You should know the basics of SQL and databases. If you know how filters work in SQL, you should be able to follow the course.

  • Basics of programming. If you understand loops in any programming language, and you can create a directory and view the contents of a file from the command line, you will be able to grasp the concepts of this course even if you have not touched programming for the last 10 years. In addition, we will provide video classes on the basics of Python and Scala.

Course Syllabus


Timeline  

2-3 Months

Skill Level  

Intermediate

Introduction

  • What is Big Data?
  • Why Now?
  • Big Data Use Cases
  • Various Solutions
  • Overview of Hadoop Ecosystem
  • Spark Ecosystem Walkthrough
  • Quiz

Foundation & Environment

  • Understanding the CloudxLab
  • CloudxLab Hands-On
  • Hadoop & Spark Hands-on
  • Quiz and Assessment
  • Basics of Linux - Quick Hands-On
  • Understanding Regular Expressions
  • Quiz and Assessment
  • Setting up VM (optional)

ZooKeeper

  • Why Do we need it?
  • Understanding Data Model
  • Hands-On
  • Quiz & Assessment
  • How does election happen - Paxos Algorithm?
  • Use cases
  • When not to use
  • Quiz & Assessment

HDFS

  • Why HDFS or Why not existing file systems?
  • Understanding the architecture
  • Quiz
  • Advanced HDFS Concepts (HA, Federation)
  • Quiz
  • Hands-on with HDFS (Upload, Download, SetRep)
  • Quiz & Assessment
  • Data Locality (Rack Awareness)

Data Formats & Management

  • InputFormat and InputSplit
  • JSON
  • XML
  • Avro
  • How to store many small files - SequenceFile?
  • Parquet
  • Protocol Buffers
  • Comparing Compressions
  • Understanding Row Oriented and Column Oriented Formats - RCFile?

YARN

  • Computing - Why not existing tools?
  • MapReduce 1.0
  • Resource Management: YARN Architecture
  • Advanced Concepts - Speculative Execution
  • Quiz

MapReduce Basics

  • Why MapReduce?
  • Understanding MapReduce Framework
  • Quiz
  • Example 0 - Word Frequency Problem - Without MR
  • Example 1 - Only Mapper - Image Resizing
  • Example 2 - Word Frequency Problem
  • Example 3 - Temperature Problem
  • Example 4 - Multiple Reducer
  • Example 5 - Java MapReduce Walkthrough
  • Quiz

MapReduce Advanced

  • Example 6 - Secondary Sorting (Word Recommendation)
  • Example 7 - Partitioner
  • Concept - Associative & Commutative
  • Quiz
  • Example 8 - Combiner
  • Example 9 - Hadoop Streaming
  • Example 10 - Adv. Problem Solving - Anagrams
  • Example 11 - Adv. Problem Solving - Same DNA
  • Example 12 - Adv. Problem Solving - Similar DNA
  • Example 13 - Joins - Voting
  • Limitations of MapReduce
  • Quiz

Analyzing Data with Pig

  • Why Pig?
  • Basic Structure of Pig Latin
  • Getting Started
  • Example - NYSE Stock Exchange
  • Concept - Lazy Evaluation

Processing Data with Hive

  • Why Hive?
  • Hive Architecture Overview
  • Getting Started
  • Loading Data in Hive (Tables)
  • Example: Movielens Data Processing
  • Advanced Concepts: Views
  • Connecting Tableau and HiveServer 2
  • Connecting Microsoft Excel and HiveServer 2
  • Project: Sentiment Analysis of Twitter Data
  • Advanced - Partition Tables
  • Understanding HCatalog & Impala
  • Quiz

NoSQL and HBase

  • Case Study: The days before NoSQL
  • What is NoSQL?
  • CAP Theorem
  • HBase Architecture - Region Servers etc
  • HBase Data Model - Column-Family Orientation
  • Getting Started - Create table, Adding Data
  • Adv Example - Google Links Storage
  • Concept - Bloom Filter
  • Comparison of NoSQL Databases
  • Quiz

Importing Data with Sqoop and Flume, Oozie

  • Sqoop Overview
  • Import From MySQL to HDFS, Hive, HBase
  • Exporting to MySQL from HDFS
  • Concept - Unbounded Dataset Processing or Stream Processing
  • Flume Overview: Agents - Source, Sink, Channel
  • Example 1 - Data from Local network service into HDFS
  • Example 2 - Extracting Twitter Data
  • Quiz
  • Example 3 - Creating workflow with Oozie

Scala Basics

  • Introduction to Scala
  • Accessing Scala using CloudxLab
  • Getting Started: Interactive, Compilation, SBT
  • Types, Variables & Values
  • Functions
  • Collections
  • Classes
  • Parameters
  • More Features
  • Quiz and Assessment

Spark Basics

  • What is Apache Spark?
  • Why Spark?
  • Using the Spark Shell on CloudxLab
  • Example 1 - Performing Word Count
  • Understanding Spark Cluster Modes on YARN
  • RDDs (Resilient Distributed Datasets)
  • General RDD Operations: Transformations & Actions
  • RDD Lineage
  • RDD Persistence Overview
  • Distributed Persistence

Writing and Deploying Spark Applications

  • Creating the SparkContext
  • Building a Spark Application (Scala, Java, Python)
  • The Spark Application Web UI
  • Configuring Spark Properties
  • Running Spark on Cluster
  • RDD Partitions
  • Executing Parallel Operations
  • Stages and Tasks
  • Project: Churning the logs of NASA Kennedy Space Center WWW server

Common Patterns in Spark Data Processing

  • Common Spark Use Cases
  • Example 1 - Data Cleaning (Movielens)
  • Example 2 - Understanding Spark Streaming
  • Understanding Kafka
  • Example 3 - Spark Streaming from Kafka
  • Iterative Algorithms in Spark
  • Project: Real-time analytics of orders in an e-commerce company

DataFrames and Spark SQL

  • Spark SQL and the SQL Context
  • Creating DataFrames
  • Transforming and Querying DataFrames
  • Saving DataFrames
  • DataFrames and RDDs
  • Comparing Spark SQL, Impala, and Hive-on-Spark

Machine Learning with Spark

  • GraphX: Graph Processing and Analysis
  • Understanding Machine Learning
  • MLlib Example: k-means
  • SparkR Example

FAQ

Common questions and answers

  • How much time will it take to complete the course?

    It will take 2-3 months with 6-8 hours of effort per week.

  • What is the validity of course material?

    We understand that you might need the course material for a longer duration to make the most of your subscription. You will get lifetime access (as long as the company is operational) to the course material, so you can refer to it at any time.

  • How does online instructor-led training work?

    In online instructor-led training, Sandeep Giri and his team of experts will train you, along with a group of our course learners, for 50 hours using online conferencing software such as GoToMeeting. Classes will be held every Saturday and Sunday from 7:30 AM to 10:30 AM PST (8 PM to 11 PM IST), starting October 22, 2017.

  • How does 1:1 mentoring work?

    We offer mentoring sessions to our learners with industry leaders and professionals so you can get 1 on 1 help with any questions you may have, whether your questions are technical, job-related or anything else.

    It is a paid service, available exclusively to learners enrolled in the course. We will share subscription details after the course is launched.

  • What is the certification process?

    At the end of the course, you will work on a real-time project. You will receive a problem statement along with a data set to work on in CloudxLab. Once you have completed the project and it has been reviewed by an expert, you will be awarded a certificate which you can share on LinkedIn.

  • How will the practicals or hands-on sessions be conducted?

    Enrollment in the self-paced course includes 90 days of free access to CloudxLab. Enrollment in the instructor-led course includes 90-150 days of free access to CloudxLab, depending on the date of enrollment.

  • I am not from a Java Background. Can I take this course?

    Yes. Java is generally required for understanding MapReduce, which is a programming paradigm for writing your logic in the form of Mapper and Reducer functions. We provide a self-paced course on Java for free. As soon as you sign up, it will be available in your account section.

  • What is the refund policy for courses taken from CloudxLab?

    For the self-paced course, we provide a 100% refund of fees if the request is raised within 7 days of the enrollment date. Thereafter, no refund is provided.
    For the instructor-led course, we provide a 100% refund if no more than 1 live session has been conducted, and a 50% refund if 2-4 live sessions have been conducted. If 5 or more live sessions have been conducted, no refund is provided.

  • I have some more questions. Can I talk to someone?

    Absolutely! Please contact us at reachus@cloudxlab.com.
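
To make the Mapper and Reducer idea from the Java question above more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and is not taken from the course material: the names mapper, reducer and run_word_count are hypothetical, and the grouping step only imitates what a MapReduce framework does between the two phases.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Combine all the counts that were emitted for a single word."""
    return word, sum(counts)


def run_word_count(lines):
    """Run mapper and reducer locally over a list of text lines."""
    # Shuffle step (assumption: done here by hand; a real framework
    # groups the mapper output by key across machines).
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    # Reduce step: one reducer call per distinct word.
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_word_count(["big data is big", "Data is data"]))
```

The point of the split is that mapper looks at one line at a time and reducer looks at one word at a time, which is what lets a framework spread the work across many machines.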

Program Leads

Sandeep Giri, Course Instructor
Abhinav Singh, Course Developer
Benjamin Bertincourt, Course Advisor
Jatin Shah, Course Advisor