Spark Streaming


24 Comments

Hi,

I am getting this error when I try to clone the bigdata repository. Could you please help me with this issue?


Hi,

There is no streaming-related code inside the folder; I checked after running the pull command.


Hi Suma,

Both of the repositories above contain the code for Spark Streaming. You can see the path of the code file by opening the repo in the browser itself and then navigating to the respective path.


Hi,

I am trying Kafka + Spark Streaming but am getting the error below.

This is my script:

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StructType, StructField, IntegerType, StringType, FloatType, DateType, TimestampType}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.{ProcessingTime, Trigger}
import java.util.Properties

// Create a local SparkSession
val spark = SparkSession.builder
  .master("local[*]")
  .appName("spark_kafka_streaming")
  .getOrCreate()

// Read the Kafka topic as a streaming DataFrame
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "cxln2.c.thelab-240901.internal:6667")
  .option("subscribe", "ranveer_second_topic")
  .load()

df.printSchema()
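Note: printSchema() only prints the schema; the streaming query does not actually start until a sink is attached with writeStream and start() is called. A minimal continuation, using the console sink purely for illustration:

// Hypothetical continuation: cast the Kafka payload to strings and stream it to the console
val query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()

query.awaitTermination()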

My build.sbt file:

name := "producer_consumer"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.8" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.8" % Test

 

I have also tried with below command but got the same error

spark-submit --class "KafkaStreaming" target/scala-2.11/producer_consumer_2.11-1.0.jar

Any update here?

Why am I getting the above error?



Hi, 

Make sure the jar file contains the required class KafkaStreaming.
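If the class is present, a likely cause is the dependency scoping in your build.sbt: spark-sql-kafka-0-10 is scoped to Test, so the Kafka source is missing at runtime and load() fails with an error like "Failed to find data source: kafka". Since sbt package does not bundle dependencies into the jar, the simplest fix is to pull the connector at submit time; a sketch, assuming Scala 2.11 and Spark 2.4.8 as in your build.sbt:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.8 --class "KafkaStreaming" target/scala-2.11/producer_consumer_2.11-1.0.jar

Alternatively, drop the Test scope and build a fat jar (e.g. with sbt-assembly):

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.8" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.8"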


Hi,

It has already found the class. The error is coming from the line below:

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "cxln2.c.thelab-240901.internal:6667")
  .option("subscribe", "ranveer_second_topic")
  .load()

This is from the Scala file I used to create the jar file.


How do I install Kafka in Scala?


name := "producer_consumer"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.3.1"


I am referring to that only. This is my build.sbt file:

name := "producer_consumer"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.3.1"

I am getting the error below:

[ranveersachin1433053@cxln5 new_test]$ vi build.sbt
[ranveersachin1433053@cxln5 new_test]$ sbt console
[info] Updated file /home/ranveersachin1433053/new_test/project/build.properties: set sbt.version to 1.2.8
[info] Loading project definition from /home/ranveersachin1433053/new_test/project
[info] Updating ProjectRef(uri("file:/home/ranveersachin1433053/new_test/project/"), "new_test-build")...
Waiting for lock on /home/ranveersachin1433053/.ivy2/.sbt.ivy.lock to be available...

For a simple Kafka installation I have had to follow up a number of times. Why can't you preinstall it? I want the statement below to work in Scala:

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}


Hi, 

As the error suggests, the file is locked by some other process. The ".sbt.ivy.lock" file is a lock file used to ensure that only one process at a time can modify the Ivy cache.

You can wait a few moments and then try again. If the problem persists, try terminating any other processes that may be using the file, or manually delete the lock file to release it. However, be careful when deleting lock files, as doing so while another process is still writing can corrupt the Ivy cache.
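For example (the lock path is the one shown in the log above):

ps aux | grep sbt                                    # look for another sbt/Ivy process holding the lock
rm /home/ranveersachin1433053/.ivy2/.sbt.ivy.lock    # release the lock manually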


It is not working; I have tried again after a one-day gap. I have pasted my build.sbt content below. Let me know whether you have successfully installed Kafka or not.

name := "producer_consumer"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.3.1"


Why do we have to install Kafka? Why can't you preinstall Kafka like Spark?



Hi Sachin,

Kafka is pre-installed in our lab. You can check it at https://cloudxlab.com/assessment/displayslide/6682/kafka. But it is a standalone installation that is not associated with your Scala project.

That's why, while building a Scala project, you need to add Kafka's dependency.
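For example, once the dependency resolves, a minimal producer sketch (the broker address and topic name are taken from earlier in this thread; adjust them to your setup):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

// Minimal producer configuration; broker address as used earlier in this thread
val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "cxln2.c.thelab-240901.internal:6667")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)

// Send one record to the topic, then release the producer's resources
producer.send(new ProducerRecord[String, String]("ranveer_second_topic", "key", "hello from Scala"))
producer.close()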

I deleted the lock files and cleaned up your project a little. After that, it was working fine.

If the lock issue still persists, feel free to contact us.



How is ZooKeeper helping to locate the IP? What exactly is ZooKeeper's role here?
