In this exercise, we are going to learn how to perform word count using Spark.
Step 1: Start the Spark shell using the following command and wait for the prompt to appear:
spark-shell
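Once the prompt appears, the shell has already created a SparkContext for you, available as sc. As a quick sanity check (a minimal sketch; the exact version string depends on your installation):

sc.version    // prints the Spark version running in this shell
sc            // prints the SparkContext reference, confirming it is ready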
Step 2: Create an RDD from a file in HDFS. Type the following into the Spark shell and press Enter:
var linesRDD = sc.textFile("/data/mr/wordcount/input/big.txt")
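Nothing is read from HDFS yet, because textFile is lazy; the file is only scanned when an action runs. To confirm the path is correct, you can trigger a small action (assuming the file /data/mr/wordcount/input/big.txt exists on your cluster, as in this exercise):

linesRDD.take(3).foreach(println)   // prints the first three lines of the file
linesRDD.count()                    // total number of lines; scans the whole file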
Step 3: Convert each record into words:
var wordsRDD = linesRDD.flatMap(_.split(" "))
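flatMap applies the split to every line and flattens the resulting arrays into a single RDD of individual words. A minimal sketch of the same idea on an in-memory collection, so you can see the flattening without touching HDFS:

// the line "to be or not to be" becomes six separate word records
sc.parallelize(Seq("to be or not to be")).flatMap(_.split(" ")).collect()
// res: Array(to, be, or, not, to, be)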
Step 4: Convert each word into a key-value pair:
var wordsKvRdd = wordsRDD.map((_, 1))
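Each word is now paired with the count 1, which is the shape reduceByKey expects in the next step. You can peek at a few pairs (the exact words shown depend on your input file):

wordsKvRdd.take(5).foreach(println)   // prints tuples such as (word,1)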
Step 5: Group by key and perform aggregation on each key:
var wordCounts = wordsKvRdd.reduceByKey(_ + _ )
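reduceByKey brings all the pairs for the same word together and sums their values with _ + _. To eyeball the result before saving, you can print the most frequent words (this triggers a full computation, so it may take a moment on a large file):

wordCounts.sortBy(_._2, ascending = false).take(10).foreach(println)   // top 10 words by count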
Step 6: Save the results to HDFS:
wordCounts.saveAsTextFile("my_spark_shell_wc_output")
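saveAsTextFile writes one part-* file per partition into the my_spark_shell_wc_output directory under your HDFS home directory, and it fails if that directory already exists, so delete it or choose a different name before re-running. To verify the output from the same shell, you can read it back:

sc.textFile("my_spark_shell_wc_output").take(5).foreach(println)   // prints a few (word,count) records from the saved output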