In this blog post, we will learn how to build a real-time analytics dashboard using Apache Spark Streaming, Kafka, Node.js, Socket.IO and Highcharts.
An e-commerce portal (http://www.aaaa.com) wants to build a real-time analytics dashboard that visualizes the number of orders shipped every minute, so that it can improve the performance of its logistics.
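To make the target metric concrete, here is a minimal plain-Python sketch of "orders shipped per minute". It only illustrates the aggregation, not the streaming pipeline itself. The event layout (timestamp, order id, status), the sample records, and the function name `shipped_per_minute` are assumptions made for illustration, not the portal's actual schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical order events: (timestamp, order_id, status).
# The field layout and status values are assumed for illustration only.
events = [
    ("2016-05-01 10:00:05", "o1", "shipped"),
    ("2016-05-01 10:00:40", "o2", "pending"),
    ("2016-05-01 10:00:59", "o3", "shipped"),
    ("2016-05-01 10:01:10", "o4", "shipped"),
]


def shipped_per_minute(events):
    """Count events whose status is 'shipped', grouped by calendar minute."""
    counts = Counter()
    for ts, _order_id, status in events:
        if status != "shipped":
            continue
        # Parse the timestamp and truncate it to the minute to form the bucket key.
        minute = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(second=0)
        counts[minute.strftime("%Y-%m-%d %H:%M")] += 1
    return dict(counts)


print(shipped_per_minute(events))
# {'2016-05-01 10:00': 2, '2016-05-01 10:01': 1}
```

In a streaming setup the same grouping would be applied continuously to incoming events rather than to a fixed list, but the per-minute bucketing idea stays the same.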
Before working on the solution, let’s take a quick look at all the tools we will be using:
Apache Spark – A fast and general engine for large-scale data processing. It can run programs up to 100 times faster than Hadoop MapReduce in memory, or up to 10 times faster on disk, depending on the workload. Learn more about Apache Spark here
Python – A widely used high-level, general-purpose, interpreted, dynamic programming language. Learn more about Python here
Kafka – A high-throughput, distributed, publish-subscribe messaging system. Learn more about Kafka here
CloudxLab – Provides a real cloud-based environment for practicing and learning various tools. You can start practicing right away by just signing up online.
How To Build A Data Pipeline?
Below is the high-level architecture of the data pipeline
Our real-time analytics dashboard will look like this