12 Weeks Live Training
10+ Guided Projects
100 Days Cloud Lab Access
Placement Assistance
CloudxLab Certificate

Course Overview

Certification Course in Foundations of Data Science by CloudxLab

In an era where data drives our daily lives, the importance of harnessing its potential cannot be overstated. According to IBM, the data on our planet doubles every two years. The true value of this data emerges only when we uncover the patterns and trends hidden within it.

Our "Foundations of Data Science" course is your gateway to understanding and harnessing this data-driven world. This comprehensive introduction to data analysis with Python equips you with the tools to explore, analyze, and visualize data. By mastering Python libraries such as Pandas, NumPy, and Matplotlib, you'll learn to clean, manipulate, and transform data into actionable insights.

Whether you're a beginner or looking to solidify your data science knowledge, this course provides the foundational tools and techniques to succeed in the data-driven world.

(4.75K) 75K+ Learners
8+ Projects
100 Days Cloud Lab Access
Estimated 11.5M new Data Science jobs (US)
Avg. salary of over $84,000 in Data Science roles
High demand in Tech, Finance, E-Commerce, Healthcare
Highly transferable mainstream skills

Program Highlights

12+ Weeks of Live Instructor-Led Learning
8+ Projects
Placement Assistance
100 Days of Lab Access
10+ Tools Covered
Doubt-Clearing Sessions Every Weekday
Email & Discussion Forum Support
Lifetime Access to Course Material
Certificate From CloudxLab
Hands-on Practice in Cloud Labs

Next Batch Starting on 31st March 2024

What is the certificate like?

  • About CloudxLab

    CloudxLab is a team of developers, researchers, and educators who build innovative products and create enriching learning experiences for users. CloudxLab upskills engineers in deep tech to make them employable and future-ready.

    Our courses and certifications are well recognized in the industry. With our expert instructors, excellent course content, real-time projects from the industry, and a versatile gamified learning environment, we aim to give you the best in technology education and propel your career.

Hands-on Learning

  • Gamified Learning Platform
    Making learning fun and sustainable

  • Auto-assessment Tests
    Learn by writing code and executing it in the lab

  • No Installation Required
    The lab comes with pre-installed software and is accessible everywhere

  • Accessibility
    Access the lab anywhere, anytime with an internet connection

Mentors / Faculty

Sandeep Giri

Founder at CloudxLab

Past: Amazon, InMobi, D.E.Shaw

Praveen Pavithran

Co-Founder at Yatis

Past: YourCabs, Cypress Semiconductor

Sandeep Akode

Senior Software Engineer at CloudxLab

IIT Kanpur

Sachin Giri

Software Engineer at CloudxLab

Shubh Tripathi

ML Engineer at CloudxLab

Gajender Singh

Instructor at CloudxLab

Curriculum

Python for Data Science

1. Programming Tools and Foundational Concepts
1.1 Introduction to Linux
1.2 Introduction to Python
1.3 Hands-on using Jupyter on CloudxLab
1.4 Overview of Linear Algebra
1.5 Introduction to NumPy and Pandas

Course on Big Data with Hadoop

1. Introduction
1.1 Big Data Introduction
1.2 Distributed systems
1.3 Big Data Use Cases
1.4 Various Solutions
1.5 Overview of Hadoop Ecosystem
1.6 Spark Ecosystem Walkthrough
1.7 Quiz
2. Foundation & Environment
2.1 Understanding the CloudxLab
2.2 Getting Started - Hands-on
2.3 Hadoop & Spark Hands-on
2.4 Quiz and Assessment
2.5 Basics of Linux - Quick Hands-on
2.6 Understanding Regular Expressions
2.7 Quiz and Assessment
2.8 Setting up VM (optional)
3. ZooKeeper
3.1 ZooKeeper - Race Condition
3.2 ZooKeeper - Deadlock
3.3 Hands-on
3.4 Quiz & Assessment
3.5 How Does Election Happen? - Paxos Algorithm
3.6 Use cases
3.7 When not to use
3.8 Quiz & Assessment
4. HDFS
4.1 Why HDFS, or Why Not Existing File Systems?
4.2 HDFS - NameNode & DataNodes
4.3 Quiz
4.4 Advanced HDFS Concepts (HA, Federation)
4.5 Quiz
4.6 Hands-on with HDFS (Upload, Download, SetRep)
4.7 Quiz & Assessment
4.8 Data Locality (Rack Awareness)
5. YARN
5.1 YARN - Why Not Existing Tools?
5.2 YARN - Evolution from MapReduce 1.0
5.3 Resource Management: YARN Architecture
5.4 Advanced Concepts - Speculative Execution
5.5 Quiz
6. MapReduce Basics
6.1 MapReduce - Understanding Sorting
6.2 MapReduce - Overview
6.3 Quiz
6.4 Example 0 - Word Frequency Problem - Without MR
6.5 Example 1 - Only Mapper - Image Resizing
6.6 Example 2 - Word Frequency Problem
6.7 Example 3 - Temperature Problem
6.8 Example 4 - Multiple Reducers
6.9 Example 5 - Java MapReduce Walkthrough
6.10 Quiz
7. MapReduce Advanced
7.1 Writing MapReduce Code Using Java
7.2 Building a MapReduce Project Using Apache Ant
7.3 Concept - Associative & Commutative
7.4 Quiz
7.5 Example 8 - Combiner
7.6 Example 9 - Hadoop Streaming
7.7 Example 10 - Adv. Problem Solving - Anagrams
7.8 Example 11 - Adv. Problem Solving - Same DNA
7.9 Example 12 - Adv. Problem Solving - Similar DNA
7.10 Example 13 - Joins - Voting
7.11 Limitations of MapReduce
7.12 Quiz
8. Analyzing Data with Pig
8.1 Pig - Introduction
8.2 Pig - Modes
8.3 Getting Started
8.4 Example - NYSE Stock Exchange
8.5 Concept - Lazy Evaluation
9. Processing Data with Hive
9.1 Hive - Introduction
9.2 Data Types
9.3 Getting Started
9.4 Loading Data in Hive (Tables)
9.5 Example: MovieLens Data Processing
9.6 Advanced Concepts: Views
9.7 Connecting Tableau and HiveServer2
9.8 Connecting Microsoft Excel and HiveServer2
9.9 Project: Sentiment Analysis of Twitter Data
9.10 Advanced - Partition Tables
9.11 Understanding HCatalog & Impala
9.12 Quiz
10. NoSQL and HBase
10.1 NoSQL - Scaling Out / Up
10.2 NoSQL - ACID Properties and RDBMS Story
10.3 CAP Theorem
10.4 HBase Architecture - Region Servers etc.
10.5 HBase Data Model - Column-Family Orientation
10.6 Getting Started - Create table, Adding Data
10.7 Adv. Example - Google Links Storage
10.8 Concept - Bloom Filter
10.9 Comparison of NoSQL Databases
10.10 Quiz
11. Importing Data with Sqoop and Flume, and Workflows with Oozie
11.1 Sqoop - Introduction
11.2 Sqoop Import - MySQL to HDFS
11.3 Exporting to MySQL from HDFS
11.4 Concept - Unbounded Dataset Processing or Stream Processing
11.5 Flume Overview: Agents - Source, Sink, Channel
11.6 Example 1 - Data from Local Network Service into HDFS
11.7 Example 2 - Extracting Twitter Data
11.8 Quiz
11.9 Example 3 - Creating a Workflow with Oozie

Course on Big Data with Spark

1. Introduction
1.1 Apache Spark ecosystem walkthrough
1.2 Spark Introduction - Why Spark?
1.3 Quiz
2. Scala Basics
2.1 Scala - Quick Introduction - Access Scala on CloudxLab
2.2 Scala - Quick Introduction - Variables and Methods
2.3 Getting Started: Interactive, Compilation, SBT
2.4 Types, Variables & Values
2.5 Functions
2.6 Collections
2.7 Classes
2.8 Parameters
2.9 More Features
2.10 Quiz and Assessment
3. Spark Basics
3.1 Apache Spark ecosystem walkthrough
3.2 Spark Introduction - Why Spark?
3.3 Using the Spark Shell on CloudxLab
3.4 Example 1 - Performing Word Count
3.5 Understanding Spark Cluster Modes on YARN
3.6 RDDs (Resilient Distributed Datasets)
3.7 General RDD Operations: Transformations & Actions
3.8 RDD lineage
3.9 RDD Persistence Overview
3.10 Distributed Persistence
4. Writing and Deploying Spark Applications
4.1 Creating the SparkContext
4.2 Building a Spark Application (Scala, Java, Python)
4.3 The Spark Application Web UI
4.4 Configuring Spark Properties
4.5 Running Spark on Cluster
4.6 RDD Partitions
4.7 Executing Parallel Operations
4.8 Stages and Tasks
5. Common Patterns in Spark Data Processing
5.1 Common Spark Use Cases
5.2 Example 1 - Data Cleaning (MovieLens)
5.3 Example 2 - Understanding Spark Streaming
5.4 Understanding Kafka
5.5 Example 3 - Spark Streaming from Kafka
5.6 Iterative Algorithms in Spark
5.7 Project: Real-time analytics of orders in an e-commerce company
6. Data Formats & Management
6.1 InputFormat and InputSplit
6.2 JSON
6.3 XML
6.4 AVRO
6.5 How to store many small files - SequenceFile?
6.6 Parquet
6.7 Protocol Buffers
6.8 Comparing Compressions
6.9 Understanding Row-Oriented and Column-Oriented Formats - RCFile?
7. DataFrames and Spark SQL
7.1 Spark SQL - Introduction
7.2 Spark SQL - DataFrame Introduction
7.3 Transforming and Querying DataFrames
7.4 Saving DataFrames
7.5 DataFrames and RDDs
7.6 Comparing Spark SQL, Impala, and Hive-on-Spark
8. Machine Learning with Spark
8.1 Machine Learning Introduction
8.2 Applications of Machine Learning
8.3 MLlib Example: k-means
8.4 SparkR Example

12+ Weeks of Online Live Classes
100 Days of Lab Access
8+ Projects
75K+ Learners

Projects

Enroll Now

Starting at 249/month

Program Fee: 999

Batch 2 (9 Sept 2023): Admission Closed

Batch 3 (31st March 2024): Enroll Now

Placement Assistance by CloudxLab

Placement Eligibility Test

We have 300+ recruitment partners who will interview you based on your performance in the Placement Eligibility Test (PET).

Dedicated Job Portal

Opportunities from companies that approach us for our learners' profiles are posted on our job portal to give your profile visibility.

Career Guidance Webinars

Career guidance webinars by seasoned industry experts.
