  • Topic
    6 Concepts | 3 Questions | 4 Assessments | 4,659 Learners

    MapReduce is both a framework and a computing paradigm. With MapReduce, we can break a complex computation down into smaller tasks that run across a distributed cluster.

    As part of this chapter, we are going to learn how to build MapReduce programs using Java; a minimal word-count sketch follows below.

    Please make sure you follow along hands-on with the course instead of just sitting back and watching.

    Happy Learning!

    Instructor: Sandeep Giri
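
    As a taste of what the chapter covers, here is a minimal sketch of the classic word-count job written against the standard Hadoop Java MapReduce API; the input and output HDFS paths are assumed to arrive as command-line arguments.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: emit (word, 1) for every word in this task's input split.
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: sum the counts emitted for each word across all mappers.
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-aggregate on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }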
  • Topic
    8 Concepts | 4,454 Learners

    Learn from industry experts how to load and save data with Spark, work with compression, and handle various file formats.

    Instructor: Sandeep Giri
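
    For example, reading and writing different formats comes down to a few calls on Spark's DataFrame reader and writer. A minimal Java sketch, assuming hypothetical HDFS paths:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class LoadSaveExample {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("LoadSave").getOrCreate();

        // Read JSON; Spark infers the schema from the records. (hypothetical path)
        Dataset<Row> df = spark.read().json("hdfs:///data/people.json");

        // Save as Parquet with snappy compression (columnar and splittable).
        df.write().mode(SaveMode.Overwrite)
          .option("compression", "snappy")
          .parquet("hdfs:///data/people.parquet");

        // Save as gzip-compressed CSV with a header row.
        df.write().mode(SaveMode.Overwrite)
          .option("header", "true")
          .option("compression", "gzip")
          .csv("hdfs:///data/people.csv");

        spark.stop();
      }
    }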
  • Topic
    1 Concept | 4 Questions | 4,396 Learners

    Whenever you request a page from a web server, the server records that request in a file called a log.

    Web server logs are a gold mine for insights into user behaviour, and data scientists usually look at them first to understand how users behave. But since logs are humongous in size, it takes a distributed framework like Hadoop or Spark to process them.

    As part of this project, you will learn to parse the text data stored in the logs of a web server using Apache Spark; the core idea is sketched below.
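
    A minimal sketch of the approach, assuming a combined-format access log at a hypothetical HDFS path: each line is matched against a regular expression, the requested URL is extracted, and per-URL hit counts are aggregated with reduceByKey.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class LogParser {
      // Combined log format: ip - - [timestamp] "METHOD url PROTO" status bytes ...
      private static final Pattern LOG_PATTERN = Pattern.compile(
          "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) [^\"]*\" (\\d{3}) (\\S+).*");

      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("LogParser"));
        JavaRDD<String> lines = sc.textFile("hdfs:///logs/access.log"); // hypothetical path

        // Extract the requested URL from every well-formed line, then count hits per URL.
        JavaPairRDD<String, Integer> hits = lines
            .flatMapToPair(line -> {
              Matcher m = LOG_PATTERN.matcher(line);
              List<Tuple2<String, Integer>> out = new ArrayList<>();
              if (m.matches()) {
                out.add(new Tuple2<>(m.group(4), 1)); // group 4 = requested URL
              }
              return out.iterator();
            })
            .reduceByKey(Integer::sum);

        // Print the ten most requested URLs.
        hits.mapToPair(Tuple2::swap)
            .sortByKey(false)
            .take(10)
            .forEach(t -> System.out.println(t._2() + "\t" + t._1()));

        sc.stop();
      }
    }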

  • Topic
    7 Concepts | 3,167 Learners

    Learn how to apply MLlib, Spark's machine learning library, to build machine learning models.

    Instructor: Sandeep Giri
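
    As a minimal sketch of what an MLlib workflow looks like in Java (the dataset path is a placeholder), here is a logistic regression trained on a LIBSVM-format file and evaluated on a held-out split:

    import org.apache.spark.ml.classification.LogisticRegression;
    import org.apache.spark.ml.classification.LogisticRegressionModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class MLlibExample {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("MLlibExample").getOrCreate();

        // LIBSVM files load as a "label" column plus a "features" vector column.
        Dataset<Row> data = spark.read().format("libsvm")
            .load("hdfs:///data/sample_libsvm_data.txt"); // hypothetical path
        Dataset<Row>[] splits = data.randomSplit(new double[]{0.8, 0.2}, 42L);

        // Train a logistic regression classifier on the 80% split.
        LogisticRegression lr = new LogisticRegression()
            .setMaxIter(10)
            .setRegParam(0.01);
        LogisticRegressionModel model = lr.fit(splits[0]);

        // Predict on the held-out 20% and show a few rows.
        model.transform(splits[1])
            .select("label", "prediction", "probability")
            .show(5);

        spark.stop();
      }
    }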
  • Topic
    2 Concepts | 3,071 Learners

    Learn graph processing with Spark from industry experts.

    Instructor: Sandeep Giri
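
    Spark's GraphX API is Scala-first, but the core ideas carry over to Java. As a minimal sketch of one basic graph primitive, here is computing vertex degrees from a hypothetical CSV edge list with the DataFrame API:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DegreeCount {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("DegreeCount").getOrCreate();

        // Edge list with one (src, dst) pair per row. (hypothetical path)
        Dataset<Row> edges = spark.read()
            .option("header", "true")
            .csv("hdfs:///data/edges.csv")
            .toDF("src", "dst");

        // Out-degree: how many edges leave each vertex.
        edges.groupBy("src").count().withColumnRenamed("count", "outDegree").show(10);

        // In-degree: how many edges arrive at each vertex.
        edges.groupBy("dst").count().withColumnRenamed("count", "inDegree").show(10);

        spark.stop();
      }
    }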
  • In this project, we will learn how to build a real-time analytics dashboard using Apache Spark Streaming, Kafka, Node.js, Socket.IO, and Highcharts; the Spark side of the pipeline is sketched below.

    Instructor: Abhinav Singh
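
    A minimal sketch of the Spark end of such a pipeline, using the Structured Streaming Kafka source (the course itself may use the older DStream API). The broker address and topic name are assumptions, and the console sink stands in for the push to the Node.js/Socket.IO layer:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class StreamingDashboard {
      public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("StreamingDashboard").getOrCreate();

        // Subscribe to a Kafka topic (needs the spark-sql-kafka connector on the classpath).
        // Broker address and topic name below are placeholders.
        Dataset<Row> events = spark.readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "orders")
            .load();

        // Running count of events per Kafka message key.
        Dataset<Row> counts = events
            .selectExpr("CAST(key AS STRING) AS key")
            .groupBy("key")
            .count();

        // In the real dashboard this result would be pushed to Node.js/Socket.IO for
        // Highcharts to render (e.g. via another Kafka topic); console stands in here.
        StreamingQuery query = counts.writeStream()
            .outputMode("complete")
            .format("console")
            .start();
        query.awaitTermination();
      }
    }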
  • There are many Big Data solution stacks.

    The first and most powerful stack is Apache Hadoop and Spark together. While Hadoop provides storage for structured and unstructured data, Spark provides the computational capability on top of Hadoop.

    The second option is to use Cassandra or MongoDB. The third is to use a cloud platform such as Google Compute Engine or Microsoft Azure; in that case, you would have to upload your data to Google or Microsoft, which may not always be acceptable to your organization.

    In this post, we will understand the basics of:

    • Apache Hadoop
    • Components of the Hadoop ecosystem
    • Overview of …