In this exercise, we are going to learn how to perform a word count using Spark.
Step 1: Start the Spark shell using the following command and wait for the prompt to appear:
spark-shell
Step 2: Create an RDD from a file in HDFS. Type the following in spark-shell and press Enter:
var linesRDD = sc.textFile("/data/mr/wordcount/input/big.txt")
Step 3: Split each record (line) into words:
var wordsRDD = linesRDD.flatMap(_.split(" "))
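Splitting on a single space leaves empty strings wherever the text contains consecutive spaces, and it does not split on tabs. The difference can be seen on a plain Scala string, pasted into the same spark-shell session. This is only a small sketch: the sample string and the names `sample`, `bySpace` and `byWhitespace` are made up for illustration.

```scala
// Hypothetical sample line containing a double space and a tab.
val sample = "to  be\tor not"

// Splitting on one space keeps an empty token and leaves the tab inside "be\tor".
val bySpace = sample.split(" ")

// Splitting on runs of whitespace and dropping empty tokens gives clean words.
val byWhitespace = sample.split("\\s+").filter(_.nonEmpty)

println(bySpace.mkString("[", "|", "]"))
println(byWhitespace.mkString("[", "|", "]"))
```

If cleaner tokens are wanted, the same idea should carry over to the RDD as `linesRDD.flatMap(_.split("\\s+")).filter(_.nonEmpty)`. Note that this changes the resulting counts compared with the command above.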
Step 4: Convert each word into a key-value pair:
var wordsKvRdd = wordsRDD.map((_, 1))
Step 5: Group by key and aggregate the counts for each key:
var wordCounts = wordsKvRdd.reduceByKey(_ + _ )
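To see what this aggregation computes, the same grouping and summing can be sketched on an ordinary Scala collection, with no cluster involved. This illustrates the result only, not how Spark executes `reduceByKey` (Spark combines values within each partition before shuffling). The sample pairs and the names `pairs` and `localCounts` are hypothetical.

```scala
// Hypothetical (word, 1) pairs, as the previous step would produce for "to be or not to be".
val pairs = Seq(("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1))

// Group the pairs by word, then sum the 1s in each group with the same `_ + _` function.
val localCounts: Map[String, Int] =
  pairs.groupBy(_._1).map { case (word, group) => (word, group.map(_._2).reduce(_ + _)) }

// Print the counts in alphabetical order of the word.
localCounts.toSeq.sortBy(_._1).foreach(println)
```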
Step 6: Save the results to HDFS:
wordCounts.saveAsTextFile("my_spark_shell_wc_output")
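`saveAsTextFile` writes a directory of part files (one per partition) rather than a single file, and it fails if the output directory already exists. One possible way to check the results without leaving the shell is sketched below. It assumes the session from the steps above is still open, so that `sc` and `wordCounts` are defined; the names `top10` and `saved` are introduced here for illustration.

```scala
// Assumes the spark-shell session from the steps above, where `sc` and `wordCounts` exist.

// Bring the ten most frequent words to the driver and print them.
val top10 = wordCounts.sortBy(_._2, ascending = false).take(10)
top10.foreach(println)

// Read a few lines of the saved output back to confirm that it was written.
val saved = sc.textFile("my_spark_shell_wc_output").take(5)
saved.foreach(println)
```

Each saved line is the text form of one pair, for example `(word,3)`.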