
E&ICT Academy, IIT Roorkee

An initiative of the Ministry of Electronics and Information Technology (MeitY), Govt. of India

Learn Big Data Engineering with Hadoop and Spark & Get Certificate from E&ICT, IIT Roorkee

Learn HDFS, ZooKeeper, Hive, HBase, NoSQL, Oozie, Flume, Sqoop, Spark, Spark RDD, Spark Streaming, Kafka, SparkR, SparkSQL, MLlib, and GraphX from industry experts

5,927 ratings | 21,000+ learners

  60+ hours of online self-paced training

  90 days of lab access

  24x7 support

  8 projects

  Compatible with Hortonworks and Cloudera certifications

About E&ICT, IIT Roorkee

The Electronics & ICT Academy program is sponsored by the Ministry of Electronics and Information Technology, Govt. of India.

The E&ICT Academy, IIT Roorkee, conducts short courses and faculty development programs (FDPs) in emerging areas to enrich and upgrade the subject knowledge and technical skills of faculty, working professionals, and government employees.

The trained beneficiaries are expected to create a cascading effect, transforming the competencies and standards of their parent institutes and organizations.


About the Specialization

The E&ICT Academy, IIT Roorkee, supported by the Ministry of Electronics and Information Technology (MeitY) and with CloudxLab as its industry partner, is conducting a training program in Big Data with Hadoop and Spark.

In this specialization, you will learn Hadoop and Spark to drive better business decisions and solve real-world problems.

The big data space is growing massively and job opportunities are skyrocketing, making this the perfect time to launch your career in the field.

E&ICT courses lay special emphasis on hands-on learning with participation from industry experts. These programs also enable participants and institutes to build industry connections, upgrade lab facilities, and create opportunities for collaboration.

E&ICT courses are recognized by the All India Council for Technical Education (AICTE) on par with QIP courses for recognition and credits.

As of this writing, the E&ICT Academy, IIT Roorkee, has conducted 74 courses and trained over 4,000 beneficiaries.



Eligibility
  1. Anyone, anywhere in the world, who wants to learn Big Data Engineering, or
  2. Faculty members and students of colleges in India. You must submit your college ID card along with your Aadhaar number as proof of identification. Course access is granted only after the required documents have been submitted.


Program Highlights

2 courses

1. Big Data with Hadoop
2. Big Data with Spark

Cloud Lab

Apply the skills you learn on a distributed cluster to solve real-world problems.

Projects

Work on 8 big data projects to gain hands-on experience.

Best-in-class Support

24×7 support, plus a discussion forum to answer your queries throughout your learning journey.

Certificate

Highlight your new skills on your resume or LinkedIn. Certificate issued by E&ICT, IIT Roorkee.

Certifications

Compatible with: CCP Data Engineer, CCA Spark and Hadoop Developer, HDP Certified Developer, HDP Certified Developer: Spark

Enrollment

Self-Paced Learning

For Everyone (Course + Lab)
Start Immediately
90 days lab

For Indian Students & Faculty (Course + Lab)
Start Immediately
90 days lab

Learning Path

Course 1

Big Data with Hadoop

This is the first course in the specialization. We start with an introduction to Big Data and then dive into Big Data ecosystem tools and technologies such as ZooKeeper, HDFS, YARN, MapReduce, Pig, Hive, HBase, NoSQL, Sqoop, Flume, and Oozie.

Each topic consists of high-quality videos, slides, hands-on assessments, quizzes, and case studies to make learning effective and lasting. The course also includes access to a real-world production lab so that you learn by doing.

1.1 Big Data Introduction
1.2 Distributed systems
1.3 Big Data Use Cases
1.4 Various Solutions
1.5 Overview of Hadoop Ecosystem
1.6 Spark Ecosystem Walkthrough
1.7 Quiz
2.1 Understanding the CloudxLab
2.2 Getting Started - Hands on
2.3 Hadoop & Spark Hands-on
2.4 Quiz and Assessment
2.5 Basics of Linux - Quick Hands-On
2.6 Understanding Regular Expressions
2.7 Quiz and Assessment
2.8 Setting up VM (optional)
3.1 ZooKeeper - Race Condition
3.2 ZooKeeper - Deadlock
3.3 Hands-On
3.4 Quiz & Assessment
3.5 How does election happen - Paxos Algorithm?
3.6 Use cases
3.7 When not to use
3.8 Quiz & Assessment
4.1 Why HDFS or Why not existing file systems?
4.2 HDFS - NameNode & DataNodes
4.3 Quiz
4.4 Advanced HDFS Concepts (HA, Federation)
4.5 Quiz
4.6 Hands-on with HDFS (Upload, Download, SetRep)
4.7 Quiz & Assessment
4.8 Data Locality (Rack Awareness)
5.1 YARN - Why not existing tools?
5.2 YARN - Evolution from MapReduce 1.0
5.3 Resource Management: YARN Architecture
5.4 Advanced Concepts - Speculative Execution
5.5 Quiz
6.1 MapReduce - Understanding Sorting
6.2 MapReduce - Overview
6.3 Quiz
6.4 Example 0 - Word Frequency Problem - Without MR
6.5 Example 1 - Only Mapper - Image Resizing
6.6 Example 2 - Word Frequency Problem
6.7 Example 3 - Temperature Problem
6.8 Example 4 - Multiple Reducer
6.9 Example 5 - Java MapReduce Walkthrough
6.10 Quiz
7.1 Writing MapReduce Code Using Java
7.2 Building MapReduce project using Apache Ant
7.3 Concept - Associative & Commutative
7.4 Quiz
7.5 Example 8 - Combiner
7.6 Example 9 - Hadoop Streaming
7.7 Example 10 - Adv. Problem Solving - Anagrams
7.8 Example 11 - Adv. Problem Solving - Same DNA
7.9 Example 12 - Adv. Problem Solving - Similar DNA
7.10 Example 13 - Joins - Voting
7.11 Limitations of MapReduce
7.12 Quiz
8.1 Pig - Introduction
8.2 Pig - Modes
8.3 Getting Started
8.4 Example - NYSE Stock Exchange
8.5 Concept - Lazy Evaluation
9.1 Hive - Introduction
9.2 Hive - Data Types
9.3 Getting Started
9.4 Loading Data in Hive (Tables)
9.5 Example: Movielens Data Processing
9.6 Advanced Concepts: Views
9.7 Connecting Tableau and HiveServer 2
9.8 Connecting Microsoft Excel and HiveServer 2
9.9 Project: Sentiment Analysis of Twitter Data
9.10 Advanced - Partition Tables
9.11 Understanding HCatalog & Impala
9.12 Quiz
10.1 NoSQL - Scaling Out / Up
10.2 NoSQL - ACID Properties and RDBMS Story
10.3 CAP Theorem
10.4 HBase Architecture - Region Servers, etc.
10.5 HBase Data Model - Column-Family Orientation
10.6 Getting Started - Create table, Adding Data
10.7 Adv Example - Google Links Storage
10.8 Concept - Bloom Filter
10.9 Comparison of NoSQL Databases
10.10 Quiz
11.1 Sqoop - Introduction
11.2 Sqoop Import - MySQL to HDFS
11.3 Exporting to MySQL from HDFS
11.4 Concept - Unbounded Dataset Processing or Stream Processing
11.5 Flume Overview: Agents - Source, Sink, Channel
11.6 Example 1 - Data from Local network service into HDFS
11.7 Example 2 - Extracting Twitter Data
11.8 Quiz
11.9 Example 3 - Creating workflow with Oozie

Course 2

Big Data with Spark

This is the second course in the specialization. We start with an introduction to Big Data and Spark and then dive into Scala and Spark concepts such as RDDs, transformations, actions, persistence, and deploying Spark applications. We then cover Spark Streaming, Kafka, and various data formats such as JSON, XML, Avro, Parquet, and Protocol Buffers. We conclude the course with very important topics such as DataFrames, Spark SQL, SparkR, MLlib, and GraphX.

Each topic consists of high-quality videos, slides, hands-on assessments, quizzes, and case studies to make learning effective and lasting. The course also includes access to a real-world production lab so that you learn by doing.

1.1 Apache Spark ecosystem walkthrough
1.2 Spark Introduction - Why Spark?
1.3 Quiz
2.1 Scala - Quick Introduction - Access Scala on CloudxLab
2.2 Scala - Quick Introduction - Variables and Methods
2.3 Getting Started: Interactive, Compilation, SBT
2.4 Types, Variables & Values
2.5 Functions
2.6 Collections
2.7 Classes
2.8 Parameters
2.9 More Features
2.10 Quiz and Assessment
3.1 Apache Spark ecosystem walkthrough
3.2 Spark Introduction - Why Spark?
3.3 Using the Spark Shell on CloudxLab
3.4 Example 1 - Performing Word Count
3.5 Understanding Spark Cluster Modes on YARN
3.6 RDDs (Resilient Distributed Datasets)
3.7 General RDD Operations: Transformations & Actions
3.8 RDD lineage
3.9 RDD Persistence Overview
3.10 Distributed Persistence
4.1 Creating the SparkContext
4.2 Building a Spark Application (Scala, Java, Python)
4.3 The Spark Application Web UI
4.4 Configuring Spark Properties
4.5 Running Spark on Cluster
4.6 RDD Partitions
4.7 Executing Parallel Operations
4.8 Stages and Tasks
5.1 Common Spark Use Cases
5.2 Example 1 - Data Cleaning (Movielens)
5.3 Example 2 - Understanding Spark Streaming
5.4 Understanding Kafka
5.5 Example 3 - Spark Streaming from Kafka
5.6 Iterative Algorithms in Spark
5.7 Project: Real-time analytics of orders in an e-commerce company
6.1 InputFormat and InputSplit
6.2 JSON
6.3 XML
6.4 Avro
6.5 How to store many small files - SequenceFile?
6.6 Parquet
6.7 Protocol Buffers
6.8 Comparing Compressions
6.9 Understanding Row Oriented and Column Oriented Formats - RCFile?
7.1 Spark SQL - Introduction
7.2 Spark SQL - Dataframe Introduction
7.3 Transforming and Querying DataFrames
7.4 Saving DataFrames
7.5 DataFrames and RDDs
7.6 Comparing Spark SQL, Impala, and Hive-on-Spark
8.1 Machine Learning Introduction
8.2 Applications Of Machine Learning
8.3 MLlib Example: k-means
8.4 SparkR Example

Projects

1. Sentiment analysis of the movie "Iron Man 3" using Hive, and visualization of the sentiment data with BI tools such as Tableau


2. Process the NSE (National Stock Exchange) data using Hive for various insights


3. Analyze MovieLens data using Hive


4. Generate movie recommendations using Spark MLlib


5. Derive the importance of various handles on Twitter using Spark GraphX


6. Process the logs of the NASA Kennedy Space Center WWW server using Spark to derive useful business and DevOps metrics


7. Write an end-to-end Spark application, from writing code on your local machine to deploying it to the cluster


8. Build a real-time analytics dashboard for an e-commerce company using Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO, and Highcharts

Certificate

Earn your certificate

Our specialization is comprehensive, and the certificate awarded on completion is proof that you have taken a big leap in the Big Data domain.


Differentiate yourself

The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.


Share your achievement

Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.

Course Creators

Sandeep Giri

Founder at CloudxLab
Past: Amazon, InMobi, D.E.Shaw
Course Developer

Sanjeev Manhas

Associate Professor,
IIT Roorkee
Course Advisor

R. Balasubramanian

Professor,
IIT Roorkee
Course Developer

Partha Pratim Roy

Assistant Professor,
IIT Roorkee
Course Developer

Abhinav Singh

Co-Founder at CloudxLab
Past: BYJU'S
Course Developer

Jatin Shah

Ex-LinkedIn, Yahoo, Yale CS Ph.D.
IIT-B
Course Advisor

Reviews

(4.9 out of 5)
...

I started learning 3 months ago and have really gained a lot of information and practical experience. I completed the “Big Data with Spark” course, and the learning journey really exceeded my expectations.

The course structure and topics were great: well organized and comprehensive. Even the basics of Linux were covered in a very simple way. There were always exercises and hands-on practice to build a better understanding, and the lab environment and online tools were a great help, letting you practice everything without installing anything on your PC except a web browser.

In addition, it was a real joy attending the live sessions each weekend. Our instructor, Sandeep Giri, besides his great experience and knowledge, was generous, helpful, and patient in answering all attendees' questions, going into more examples and hands-on work, or even searching the documentation and trying new things. I gained a lot from other attendees' questions and the way Sandeep responded to them.

This was a great experience. I'm going for more courses in Big Data and Machine Learning with CloudxLab, and I recommend it to all my friends and colleagues who are looking for better learning.

...

A must-have for practicing and perfecting Hadoop. To set it up on your own PC, you need a very high-end configuration, and the result will only be a pseudo-distributed (single-node) setup. For better understanding, I recommend CloudxLab.

...

This course is suitable for everyone. As a product manager, I had not done hands-on coding for quite some time, and Python was completely new to me. However, Sandeep Giri gave us a crash course in Python and then introduced us to Machine Learning. CloudxLab's environment was also very useful: just log in and start practicing coding and playing with what you have learned. It offers a good mix of theory and practical exercises, and in particular, starting straight away with a project and then going deeper was a very good way of teaching. I would recommend this course to all.

...

They are great. They take care of all the Big Data technologies (Hadoop, Spark, Hive, etc.), so you do not have to worry about installing them and running them correctly on your PC. Plus, they have fantastic customer support. Even when I have had problems debugging my own programs, they have answered me with the correct solution within a few hours, and all of this for a more than reasonable price. I personally recommend it to everyone :)

...

CloudxLab's machine learning courses, especially the course on Artificial Intelligence for managers, are excellent. I have attended some of the courses and was able to understand them, as Sandeep Giri sir taught the AI course from scratch and related it to our day-to-day life.

He even takes free sessions to help students and provides career guidance.

His courses are worthwhile; even just by watching his YouTube videos, anyone can easily crack an AI interview.

FAQ

To earn the certificate, you need to complete at least 60% of the topics in the course and submit the required projects within 90 days. There are 8 projects in total, of which 2 are mandatory: Sentiment Analysis (Hive) and Log Parsing (Spark).

Please email your required projects to reachus@cloudxlab.com. After you submit them, our course experts will review them and forward your details to E&ICT, IIT Roorkee. E&ICT will issue your certificate, which may take 3-7 days.

With self-paced learning, you get:

  • Lifetime access to the self-paced course including videos, assessments, quizzes, and projects
  • Recordings of the previous batch of instructor-led training
  • 24x7 support using the discussion forum

  1. Basics of SQL. You should know the basics of SQL and databases. If you know about filters in SQL, you should be able to follow the course.

  2. Basic programming know-how. If you understand loops in any programming language, and you can create a directory and view a file's contents from the command line, you will be able to grasp the concepts of this course even if you have not touched programming in the last 10 years! In addition, we provide video classes on the basics of Python and Scala.

The instructors for this course are industry experts with years of experience mentoring students across the world.

The course takes 2-3 months at 6-8 hours of effort per week.

We understand that you might need the course material for a longer duration to make the most of your subscription. You get lifetime access (for as long as the company is operational) to the course material, so you can refer to it anytime.

The E&ICT Academy project is sponsored by the Ministry of Electronics and Information Technology, Govt. of India. E&ICT courses lay special emphasis on hands-on learning with industry participation in the emerging areas of the E&ICT domain. Their programs enable participants and institutes to build industry connections, upgrade lab facilities, and create opportunities for collaboration. For this course, CloudxLab has joined hands with E&ICT, IIT Roorkee, to fulfill the vision of the Ministry of Electronics and Information Technology (MeitY) to upskill faculty across the country in high-end technologies. Hence the course is offered at a low price, so that every eligible person can pursue it and benefit from it, for their own career growth and for the country's growth as well.

Please log in at CloudxLab.com with your Gmail ID and access your course under "My Courses".

Yes. Java is generally required to understand MapReduce, a programming paradigm in which you write your logic as mapper and reducer functions. We provide a free self-paced course on Java, which becomes available in your account section as soon as you sign up.
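To make the mapper/reducer idea concrete, here is a minimal in-memory sketch of the classic word-frequency example, written in Python since the course also covers Python basics. It only simulates the map, shuffle, and reduce phases in a single process; it is not Hadoop's Java API, and the function names `mapper`, `shuffle`, `reducer`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    # Map phase: emit a (word, 1) pair for every whitespace-separated word.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle phase (simulated): group all emitted values by key,
    # which the framework does for you between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word: str, counts: list[int]) -> tuple[str, int]:
    # Reduce phase: sum the counts for one key. Addition is associative
    # and commutative, so this same logic could also act as a combiner.
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the phases; keys are sorted before reducing, mimicking the
    # sort step that happens before the reduce phase in Hadoop.
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
```

For example, `word_count(["the cat sat", "The dog sat"])` returns `{'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}`. In a real Hadoop job, the framework performs the shuffle-and-sort step itself and runs many mapper and reducer instances in parallel across the cluster.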

The course requires a good internet connection (1 Mbps or more) and a browser to watch the videos and do the hands-on exercises in the lab. We've configured all the tools in the lab so that you can focus on learning and practicing on a real-world cluster.

Most companies today use Big Data systems such as Hive and Spark for analytics. After the course, you will have a good knowledge of mining data using Hive and Spark SQL. You will also learn how to run your R code in SparkR.

For the self-paced course, we provide a 100% fee refund if the request is raised within 7 days of the enrollment date. After that, no refund is provided.

For the instructor-led course, we provide a 100% refund if no more than 1 live session has been conducted, and a 50% refund if 2-4 live sessions have been conducted. If 5 or more live sessions have been conducted, no refund is provided.

No, lab access is included in the course price.

Yes, you can renew your subscription anytime. Please choose your desired lab plan and make the payment to renew your subscription.

In the Big Data with Hadoop and Spark specialization, you will learn Hadoop and Spark to drive better business decisions and solve real-world problems. It will add value to your research, but you will have to dive deeper into your specific research topic on your own.

Related Courses
Machine Learning Specialization - EICT, IITR
Big Data with Spark
Python For Machine Learning - EICT, IITR
Machine Learning Specialization
Contact Details

Electronics and ICT Academy

Electronics & ICT Academy
Electronics & Communication Department
Indian Institute of Technology
Roorkee 247667, Uttarakhand INDIA

CloudxLab

#215, The Arcade, Brigade Metropolis,
Mahadevpura, Bangalore
India - 560048
Phone: 080 - 4920 2224

Have more questions? Please contact us at reachus@cloudxlab.com