Apache Spark Basics

You are currently auditing this course.
38 / 89

Apache Spark - Wordcount with spark-shell (scala spark shell)

In this exercise, we are going to learning how to perform wordcount using spark.

Step 1: Start the spark shell using following command and wait for prompt to appear

spark-shell

Step 2: Create RDD from a file in HDFS, type the following on spark-shell and press enter:

var linesRDD = sc.textFile("/data/mr/wordcount/input/big.txt")

Step 3: Convert each record into word

var wordsRDD = linesRDD.flatMap(_.split(" "))

Step 3: Convert each word into key-value pair

var wordsKvRdd = wordsRDD.map((_, 1))

Step 3: Group By key and perform aggregation on each key:

var wordCounts = wordsKvRdd.reduceByKey(_ + _ )

Step 3: Save the results into HDFS:

wordCounts.saveAsTextFile("my_spark_shell_wc_output")

No hints are availble for this assesment

Answer is not availble for this assesment

Loading comments...