Apache Spark - Counting Word Frequencies


INSTRUCTIONS

The Scala code below counts word frequencies:

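// Load the input file as an RDD with one element per line
val linesRdd = sc.textFile("/data/mr/wordcount/input/big.txt")
// Split each line on spaces and flatten the result into an RDD of words
val words = linesRdd.flatMap(x => x.split(" "))
// Pair each word with an initial count of 1
val wordsKv = words.map(x => (x, 1))
// Sum the counts per word; equivalent to reduceByKey(myfunc) with
// def myfunc(x: Int, y: Int): Int = x + y
val output = wordsKv.reduceByKey(_ + _)
// Fetch 10 of the (word, count) pairs to the driver
output.take(10)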
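
To make the flatMap, map, and reduceByKey steps easier to follow, here is a minimal sketch of the same pipeline on plain Scala collections, with no Spark involved. The object name `WordCountSketch`, the helper `countWords`, and the sample sentences are hypothetical and only for illustration; `groupBy` followed by a per-group sum stands in for `reduceByKey(_ + _)`.

```scala
object WordCountSketch {
  // Count word frequencies in a small in-memory list of lines.
  def countWords(lines: Seq[String]): Map[String, Int] = {
    // flatMap: split every line on spaces and flatten into one sequence of words
    val words = lines.flatMap(_.split(" "))
    // map: pair each word with an initial count of 1
    val wordsKv = words.map(w => (w, 1))
    // Stand-in for reduceByKey(_ + _): group pairs by word, then sum the counts
    wordsKv.groupBy(_._1).map { case (word, pairs) => (word, pairs.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, not taken from big.txt
    val sample = Seq("to be or not to be", "to do")
    // Prints (be,2), (do,1), (not,1), (or,1), (to,3), one pair per line
    countWords(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Unlike this local version, Spark's `reduceByKey` combines counts within each partition before moving data between nodes, which is why it is used for large inputs.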

We can also save the output to HDFS:

output.saveAsTextFile("my_result")

Note: In this video, we used Hue to access the results in HDFS. Hue has since been deprecated. Please use the web console to access the files instead.
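
The saved results can also be inspected from Spark itself. `saveAsTextFile` writes a directory of part files in which each line is the string form of one pair, for example `(the,3)`. The sketch below shows one possible way to turn such lines back into pairs; the object `SavedOutputSketch` and the helper `parseLine` are hypothetical names introduced here, and the Spark shell usage in the trailing comments assumes the `sc` and `my_result` from above.

```scala
import scala.util.Try

object SavedOutputSketch {
  // Parse one saved line such as "(the,3)" back into a (word, count) pair.
  // Returns None for lines that do not have that shape.
  def parseLine(line: String): Option[(String, Int)] = {
    val trimmed = line.trim
    // Use the last comma, since the word itself may contain commas
    val comma = trimmed.lastIndexOf(',')
    if (trimmed.startsWith("(") && trimmed.endsWith(")") && comma > 0) {
      val word = trimmed.substring(1, comma)
      val count = trimmed.substring(comma + 1, trimmed.length - 1)
      Try(count.toInt).toOption.map(n => (word, n))
    } else None
  }
}

// In the Spark shell (assumes `sc` and the "my_result" directory from above):
// val saved = sc.textFile("my_result").flatMap(l => SavedOutputSketch.parseLine(l))
// saved.take(10)
```

Because the original code splits only on single spaces, the output can contain an empty-string word, which shows up as a line like `(,5)`; the parser above keeps it as a pair with an empty word.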
