MapReduce Programming

MapReduce Programming - Using Python, count the frequency of characters in a file stored in HDFS


Write MapReduce code to count the frequency of each character in a file stored in HDFS.


The file is located at


Sample Output

The output file will contain each character and its frequency in the input file, one character per line:

a     48839
b     84930
c     84939
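
The exercise can be approached with a mapper that emits one record per character and a reducer that sums those records. The following is a minimal sketch of that idea, not the code from the CloudxLab repository. The function names `map_characters` and `reduce_counts` are hypothetical, and skipping whitespace is an assumption made here so that a tab or newline never becomes a key. In an actual Hadoop streaming job, each step would read lines from standard input and print its records to standard output; here the shuffle phase is imitated locally with `sorted`.

```python
from itertools import groupby


def map_characters(lines):
    """Mapper step: emit 'char<TAB>1' for every non-whitespace character."""
    for line in lines:
        for ch in line:
            # Assumption: whitespace is skipped so that a tab or newline
            # never becomes a key and breaks the tab-separated format.
            if not ch.isspace():
                yield f"{ch}\t1"


def reduce_counts(sorted_lines):
    """Reducer step: sum the counts of consecutive lines that share a key.

    The input must already be sorted by key, which is what Hadoop
    provides to a streaming reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for ch, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{ch}\t{total}"


# Local simulation: sorted() stands in for the shuffle-and-sort phase.
sample = ["abc\n", "a b\n"]
mapped = sorted(map_characters(sample))
result = list(reduce_counts(mapped))
print(result)  # ['a\t2', 'b\t2', 'c\t1']
```

Grouping consecutive keys with `itertools.groupby` keeps the reducer's memory use constant, since it never holds more than one key's running total at a time.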


  1. Check out the and in GitHub

  2. If you haven't cloned the CloudxLab GitHub repository yet, clone it into your home folder from the web console using the command below

    git clone ~/cloudxlab
  3. Otherwise, update your local copy

    cd ~/cloudxlab
    git pull origin master
  4. Go to the character_frequency directory

    cd ~/cloudxlab/hdpexamples/python-streaming/character_frequency
  5. Run the MapReduce code using Hadoop streaming. Make sure the output is saved in the mapreduce-programming/character_frequency directory inside your home directory in HDFS. Run the command below

    hadoop jar /usr/hdp/ -input /data/mr/wordcount/big.txt -output mapreduce-programming/character_frequency -mapper -file -reducer -file
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar -input /data/mr/wordcount/big.txt -output mapreduce-programming/character_frequency -mapper -file -reducer -file
  6. Check the frequency of characters by typing the command below.

    hadoop fs -cat mapreduce-programming/character_frequency/* | tail
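
To inspect the result beyond the last few lines, the job output can also be parsed in Python. The sketch below assumes each output line has the form `character<TAB>count` (tab is the default key/value separator for Hadoop text output) and uses the three values from the sample output above as stand-in data. The helper names `parse_counts` and `top_characters` are hypothetical.

```python
def parse_counts(lines):
    """Turn 'char<TAB>count' lines into a dict of character -> count."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Split on the last tab so the count is always the final field.
        ch, _, count = line.rpartition("\t")
        counts[ch] = int(count)
    return counts


def top_characters(counts, n=3):
    """Return the n most frequent characters, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Stand-in data taken from the sample output shown earlier.
sample_output = ["a\t48839\n", "b\t84930\n", "c\t84939\n"]
counts = parse_counts(sample_output)
print(top_characters(counts, 2))  # [('c', 84939), ('b', 84930)]
```

In practice the lines could come from piping the `hadoop fs -cat` command into a script that passes `sys.stdin` to `parse_counts`.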