In this exercise, we will learn how to perform a word count using Apache Spark.
Step 1: Start the Spark shell with the following command and wait for the scala> prompt to appear:
spark-shell
Step 2: Create an RDD from a file in HDFS. Type the following in the spark-shell and press Enter:
val linesRDD = sc.textFile("/data/mr/wordcount/input/big.txt")
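Before moving on, you can optionally sanity-check the RDD with a couple of standard actions (the exact values depend on the contents of big.txt, so no output is shown here):

```scala
linesRDD.count()   // total number of lines read from the file
linesRDD.take(3)   // a sample of the first three lines
```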
Step 3: Split each line into words:
val wordsRDD = linesRDD.flatMap(_.split(" "))
Step 4: Convert each word into a key-value pair:
val wordsKvRdd = wordsRDD.map((_, 1))
Step 5: Group by key and aggregate the counts for each key:
val wordCounts = wordsKvRdd.reduceByKey(_ + _)
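To see what flatMap, map, and reduceByKey are doing, here is a small sketch of the same pipeline on plain Scala collections, with no Spark required. The input lines are made up for illustration; groupBy stands in for reduceByKey on a local collection:

```scala
// Made-up input lines standing in for the file's contents
val lines = List("to be or not", "to be")

// Split each line into words
val words = lines.flatMap(_.split(" "))   // List(to, be, or, not, to, be)

// Pair each word with a count of 1
val pairs = words.map((_, 1))             // List((to,1), (be,1), ...)

// Group by word and sum the counts (the local analogue of reduceByKey)
val counts = pairs.groupBy(_._1).map { case (word, ps) => (word, ps.size) }
// counts: Map(or -> 1, not -> 1, to -> 2, be -> 2)
```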
Step 6: Save the results to HDFS:
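The original step does not show the save command. A typical way to write the results back to HDFS is saveAsTextFile; the output path below is an example, and note that the directory must not already exist or Spark will raise an error:

```scala
// Example output path (adjust to your environment); Spark creates this
// directory and writes one part-NNNNN file per partition into it.
wordCounts.saveAsTextFile("/data/mr/wordcount/output")
```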