Spark Streaming


Apache Spark - Streaming - Use Cases





[Spark Streaming - Use Case - Ecommerce]

Before going deep into Spark Streaming, let's understand the scenarios in which Spark Streaming can be useful.

Let's say an e-commerce company wants to build a real-time analytics dashboard to optimize its inventory and operations. This dashboard shows how many products are getting purchased, shipped, and delivered every minute.

How do we build a real-time dashboard with Spark Streaming? Let's discuss how Spark Streaming helps in building the pipeline for a real-time analytics dashboard.

As soon as a product's status changes, the order management system pushes the product ID and product status to Kafka. We'll discuss Kafka later in the course. Spark Streaming reads data from Kafka. Each row of the input stream contains a product ID and its current status. As you can see, the current status of order ID 1782 is "purchased" and the current status of order ID 1723 is "shipped". Spark Streaming creates one-minute batches from this input stream. Then, the Spark engine processes each one-minute batch and generates a stream of output for each batch. As you can see, the final output of each one-minute batch contains the number of products purchased, shipped, and delivered in that batch.
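Here is a minimal sketch of such a pipeline, assuming Spark 2.x with the spark-streaming-kafka-0-8 integration package, a Kafka topic named "order-status" on localhost:9092, and messages of the form "<order_id>,<status>" (for example "1782,purchased"). The topic name, broker address, and message format are illustrative assumptions, not part of the course setup.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="OrderStatusDashboard")
ssc = StreamingContext(sc, batchDuration=60)   # one-minute batches

# Each Kafka record arrives as a (key, value) pair; the value holds "id,status".
kafka_stream = KafkaUtils.createDirectStream(
    ssc, ["order-status"], {"metadata.broker.list": "localhost:9092"})

# Count how many orders were purchased / shipped / delivered in each batch.
status_counts = (kafka_stream
                 .map(lambda kv: kv[1].split(",")[1])   # keep only the status
                 .countByValue())

status_counts.pprint()   # a real dashboard would push these counts to a UI or store

ssc.start()
ssc.awaitTermination()
```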

A Spark Streaming program runs forever until it is stopped manually or it encounters an error. You can think of a Spark Streaming program as a daemon. A daemon is a process which runs forever in the background. The Spark Streaming program keeps reading the stream of input data, creates batches of input data as per the specified batch interval, and generates a stream of output for each batch.
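The skeleton below isolates this lifecycle: create a StreamingContext with a batch interval, start it, and block until it is stopped. The socket source on port 9999 and the one-minute interval are assumptions used only for illustration.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="StreamingLifecycle")
ssc = StreamingContext(sc, batchDuration=60)      # batch interval: one minute

lines = ssc.socketTextStream("localhost", 9999)   # any input stream would do
lines.count().pprint()                            # one output per batch

ssc.start()             # start receiving data and processing batches
ssc.awaitTermination()  # block forever, until stopped manually or on error
# ssc.stop(stopSparkContext=True, stopGraceFully=True)  # how a manual shutdown looks
```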

You will work on a project to build a real-time analytics dashboard later in the course.

[Spark Streaming - Use Case - Real-time Sentiment Analysis]

Sentiment analysis is one of the hot topics of recent times. It's really important for a company to know whether users are satisfied with its products and services. How do we find out the sentiment of users every fifteen minutes by analyzing data from various sources such as Facebook, Twitter, users' feedback, comments, and reviews?

Let's discuss how Spark Streaming helps in doing real-time sentiment analysis.

Spark Streaming can receive data from many sources at the same time. As displayed, Spark Streaming receives data from Facebook, Twitter, and users' reviews submitted on the website.

Spark Streaming creates 15-minute batches from this input data. Then, the Spark engine processes each 15-minute batch and analyzes the sentiment of users. For analyzing sentiment, we can use libraries such as Stanford CoreNLP.
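A rough sketch of this pipeline is shown below. It assumes each source is exposed as a text stream of messages; the socket ports and the score_sentiment() helper are illustrative stand-ins (in practice the scoring step could delegate to a library such as Stanford CoreNLP running on the JVM).

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

def score_sentiment(text):
    # Hypothetical placeholder for a real sentiment library:
    # label a message "negative" if it contains any flagged word.
    negative_words = {"bad", "slow", "broken", "refund"}
    return "negative" if negative_words & set(text.lower().split()) else "positive"

sc = SparkContext(appName="RealTimeSentiment")
ssc = StreamingContext(sc, batchDuration=15 * 60)   # 15-minute batches

# Spark Streaming can consume several sources at once; union them into one stream.
facebook = ssc.socketTextStream("localhost", 9001)
twitter = ssc.socketTextStream("localhost", 9002)
reviews = ssc.socketTextStream("localhost", 9003)
messages = facebook.union(twitter).union(reviews)

# For each 15-minute batch, count positive vs. negative messages.
messages.map(score_sentiment).countByValue().pprint()

ssc.start()
ssc.awaitTermination()
```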

[Spark Streaming - Use Case - Real-time Fraud Detection]

Let's discuss one more use case of Spark Streaming. How do we build a real-time fraud detection system for a bank to find the fraudulent transactions? If we can build such a system, banks can take appropriate actions as soon as a transaction gets labeled as fraudulent. To build such a system, we use machine learning to train a fraud detection model. We can use Spark MLlib to train the model to detect fraud.
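Below is a hedged sketch of the offline training step with Spark MLlib. It assumes historical transactions are available as a CSV of "label,feature1,feature2,..." on HDFS; the file paths, the feature layout, and the choice of logistic regression are all assumptions made for illustration.

```python
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import LogisticRegressionWithLBFGS

sc = SparkContext(appName="TrainFraudModel")

def to_labeled_point(line):
    # First column: 1.0 for fraud, 0.0 for genuine; remaining columns: features.
    values = [float(x) for x in line.split(",")]
    return LabeledPoint(values[0], values[1:])

training_data = sc.textFile("hdfs:///data/transactions_labeled.csv").map(to_labeled_point)

model = LogisticRegressionWithLBFGS.train(training_data)
model.save(sc, "hdfs:///models/fraud-detection")   # reused later by the streaming job
```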

Let's discuss how Spark Streaming helps in building a real-time fraud detection system.

As you can see, Spark Streaming receives streams of bank transactions as input. Spark Streaming creates one-minute batches from this input data. The Spark engine processes each one-minute batch and flags the fraudulent transactions using the already trained fraud detection model.
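The streaming side could look like the sketch below. It assumes the model trained earlier was saved to "hdfs:///models/fraud-detection" and that each incoming record is a comma-separated line such as "transaction_id,amount,hour,..."; the socket source and record format are assumptions for illustration.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.classification import LogisticRegressionModel

sc = SparkContext(appName="RealTimeFraudDetection")
ssc = StreamingContext(sc, batchDuration=60)   # one-minute batches

# Load the already trained fraud detection model.
model = LogisticRegressionModel.load(sc, "hdfs:///models/fraud-detection")

transactions = ssc.socketTextStream("localhost", 9999)

def flag_if_fraud(line):
    fields = line.split(",")
    txn_id, features = fields[0], [float(x) for x in fields[1:]]
    return (txn_id, "FRAUD" if model.predict(features) == 1 else "OK")

transactions.map(flag_if_fraud).pprint()   # in practice, alerts would be written out

ssc.start()
ssc.awaitTermination()
```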

[Spark Streaming - Use Cases - More Examples]

Uber uses Spark Streaming for real-time telemetry analytics by collecting data from its mobile users.

Pinterest uses Spark Streaming to provide immediate insight into how users are engaging with pins across the globe in real time.

Netflix uses Spark Streaming to provide movie recommendations to its users.

As you can see from the above scenarios, Spark Streaming continuously receives an input data stream and uses the Spark engine to process it and generate an output data stream.

