The Scala code below counts word frequencies:
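// Read the input file from HDFS as an RDD of lines
val linesRdd = sc.textFile("/data/mr/wordcount/input/big.txt")
// Split each line on spaces and flatten the result into an RDD of words
val words = linesRdd.flatMap(x => x.split(" "))
// Pair each word with an initial count of 1
val wordsKv = words.map(x => (x, 1))
// Sum the counts per word; equivalent to reduceByKey(myfunc) with:
// def myfunc(x: Int, y: Int): Int = x + y
val output = wordsKv.reduceByKey(_ + _)
// Return the first 10 (word, count) pairs to the driver
output.take(10)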
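To see what each step does without a cluster, the same pipeline can be sketched with plain Scala collections. This is only a local analogy, not Spark code: WordCountSketch, countWords and sampleLines are hypothetical names introduced for illustration, the sample sentences are made up rather than taken from big.txt, and groupBy followed by a sum stands in for reduceByKey(_ + _).

```scala
// Local sketch using plain Scala collections (no Spark required).
// WordCountSketch, countWords and sampleLines are illustrative names only.
object WordCountSketch {
  def countWords(lines: Seq[String]): Map[String, Int] = {
    // One element per word, like linesRdd.flatMap(x => x.split(" "))
    val words = lines.flatMap(line => line.split(" "))
    // Pair each word with a count of 1, like words.map(x => (x, 1))
    val wordsKv = words.map(word => (word, 1))
    // Group the pairs by word and add up the 1s for each key,
    // which is the per-key result reduceByKey(_ + _) produces
    wordsKv.groupBy(_._1).map { case (word, pairs) => (word, pairs.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data standing in for the lines of the input file
    val sampleLines = Seq("to be or not to be", "to see or not to see")
    countWords(sampleLines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Note that split(" ") yields empty strings when a line contains consecutive spaces, so those would be counted as a "word" here, just as in the Spark version.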
We can also save the output to HDFS:
output.saveAsTextFile("my_result")
Note: In the video, we used Hue to access the results in HDFS. Hue has since been deprecated. Please use the commands below in the web console to access the files.
Check the files:
hadoop fs -ls my_result
Check the content of the first part:
hadoop fs -cat my_result/part-00000 | more
Check the content of the second part:
hadoop fs -cat my_result/part-00001 | more
10 Comments
Hi, I am unable to see Hue near Ambari, Jupyter and Web console. Could you please help me with that?
Hi Tanvi,
We have disabled Hue. You can refer to https://discuss.cloudxlab.com/t/should-we-be-using-hue/5821/2?u=shubh_tripathi for more details.
It's not creating part-00000 files for me; the result is different:
[aimlankit2262@cxln4 ~]$ hadoop fs -ls my_result
Found 1 items
drwxr-xr-x - aimlankit2262 aimlankit2262 0 2022-10-03 03:39 my_result/_temporary
[aimlankit2262@cxln4 ~]$
Hi Ankit,
Can you please share the screenshot of the code you used for counting word frequencies?
Here, inside my directory, it's showing a temporary file; it should show some part files.
But where have you created the partition?
I am getting this error. Can anybody please help me with this?
<console>:1: error: Decimal integer literals may not have a leading zero. (Octal syntax is obsolete.)
hadoop fs -cat my_result/part-00000 | more
Hi, It's working fine from my end. Can you please check it again? If you are still facing the problem, share the code of counting word frequencies here.
Please add the Scala course before the Apache Spark course. This course is not organized correctly, and I am not able to follow it.