Apache Spark Basics

Apache Spark - Counting Word Frequencies

INSTRUCTIONS
  • Given below is the Scala code for counting word frequencies

    // Load the input file as an RDD of lines
    val linesRdd = sc.textFile("/data/mr/wordcount/input/big.txt")
    // Split each line on spaces to get an RDD of individual words
    val words = linesRdd.flatMap(x => x.split(" "))
    // Pair each word with a count of 1
    val wordsKv = words.map(x => (x, 1))
    // A named function equivalent to the _ + _ shorthand below
    // def myfunc(x: Int, y: Int): Int = x + y
    // Sum the counts for each word
    val output = wordsKv.reduceByKey(_ + _)
    // Look at the first 10 (word, count) pairs
    output.take(10)
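
    For reference, the commented-out myfunc line is a named equivalent of the _ + _ shorthand. The snippet below is a small sketch (it assumes the wordsKv RDD defined above; output2 is just an illustrative name) showing the same reduce written with the named function, and sorting the counts to see the most frequent words first:

    // Same reduce step, written with a named function instead of _ + _
    def myfunc(x: Int, y: Int): Int = x + y
    val output2 = wordsKv.reduceByKey(myfunc)
    // Sort by count in descending order and look at the top 10 words
    output2.sortBy(_._2, ascending = false).take(10)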
    

    We can also save the output to HDFS:

    output.saveAsTextFile("my_result")
    

Note - In this video, we used Hue to access the results in HDFS. Hue has since been deprecated. Please use the commands below in the web console to access the files.

  • Login to the web console
  • List the files in the output directory

    hadoop fs -ls  my_result
    
  • Check the content of the first part

    hadoop fs -cat my_result/part-00000 | more
    
  • Check the content of the second part

    hadoop fs -cat my_result/part-00001 | more
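
  • Optionally, merge all part files into a single local file and preview it (this uses the standard hadoop fs -getmerge command; merged_result.txt is just an illustrative local file name)

    hadoop fs -getmerge my_result merged_result.txt
    head -20 merged_result.txt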
    
