MapReduce Programming

MapReduce Programming - Find anagrams in a text file


Write MapReduce code to find anagrams in a text file stored in HDFS. An anagram is a different arrangement of the letters in a word. For this exercise, an anagram does not need to be a meaningful word.
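One common way to test for anagrams, offered here as an assumption rather than as the method used by the repository code, is to compare words by their sorted letters. The helper names `signature` and `are_anagrams` below are hypothetical and exist only for illustration.

```python
def signature(word):
    """Return the letters of `word` in sorted order; anagrams share this value."""
    return "".join(sorted(word))


def are_anagrams(first, second):
    """Two distinct words are anagrams when their sorted letters match."""
    return first != second and signature(first) == signature(second)


print(signature("elbow"))            # below
print(are_anagrams("bore", "robe"))  # True
print(are_anagrams("bore", "bear"))  # False
```

Using the sorted letters as a key is convenient in MapReduce because every word in the same anagram group maps to the same key.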


The file is located at /data/mr/wordcount/big.txt in HDFS (the -input path used in the Hadoop streaming command below)


Sample Output

The output file will contain the anagrams found in the text file. Each line shows the number of words in a group, followed by the words themselves

3   ['bowel,', 'elbow,', 'below,']
3   ['bore', 'boer', 'robe']
3   ['bears', 'baser', 'saber']
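Lines in this shape could be produced by a reducer that receives `key<TAB>word` records sorted by key. The following is a minimal sketch, not the repository's reducer. It assumes Python 3, a tab separator, that repeated words are counted once, and that groups containing a single word are not reported. The function name `reduce_pairs` is hypothetical.

```python
from itertools import groupby


def reduce_pairs(sorted_lines):
    """Turn sorted 'key<TAB>word' lines into 'count<TAB>[words]' lines."""
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if "\t" in line  # skip malformed records
    )
    for _key, group in groupby(pairs, key=lambda pair: pair[0]):
        # Keep each distinct word once, in the order it arrives (assumption).
        words = list(dict.fromkeys(word for _, word in group))
        if len(words) > 1:  # assumption: a word with no anagram is not reported
            yield f"{len(words)}\t{words}"


# sorted() stands in for the shuffle-and-sort phase that Hadoop performs.
shuffled = sorted(["beor\tbore", "beor\tboer", "beor\trobe", "act\tcat"])
for line in reduce_pairs(shuffled):
    print(line)  # 3<TAB>['boer', 'bore', 'robe']
```

`groupby` only merges adjacent records, so this approach depends on the input arriving sorted by key, which Hadoop streaming provides between the map and reduce phases.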


  1. Check out the mapper and reducer code in GitHub

  2. If you haven't cloned the CloudxLab GitHub repository, clone it into your home folder from the web console using the command below

    git clone ~/cloudxlab

  3. Otherwise, update the local copy

    cd ~/cloudxlab
    git pull origin master

  4. Go to the find_anagrams directory

    cd ~/cloudxlab/hdpexamples/python-streaming/find_anagrams

  5. Run the MapReduce code using Hadoop streaming. Make sure to save the output in the mapreduce-programming/find_anagrams directory inside your home directory in HDFS. Run the command below

    hadoop jar /usr/hdp/ -input /data/mr/wordcount/big.txt -output mapreduce-programming/find_anagrams -mapper -file -reducer -file

  6. Check the output by typing the command below. It sorts the groups by word count in descending order and shows the top 20

    hadoop fs -cat mapreduce-programming/find_anagrams/* | sort -nr | head -n 20
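
The mapper passed to the streaming command is not shown on this page. As a rough sketch of what such a mapper might look like, assuming Python 3, whitespace tokenization, lowercasing, and the sorted letters of each word as the key, consider the following. The names `map_line` and `run` are hypothetical and are not taken from the repository.

```python
import sys


def map_line(line):
    """Yield (key, word) pairs for one line of text."""
    for word in line.split():
        word = word.lower()  # assumption: case is ignored
        # Punctuation is left attached, so "elbow," and "below," still share a key.
        yield "".join(sorted(word)), word


def run(lines, out=sys.stdout):
    """Write one 'key<TAB>word' record per word; Hadoop streaming sorts these by key."""
    for line in lines:
        for key, word in map_line(line):
            out.write(f"{key}\t{word}\n")


# As a streaming script this would be run(sys.stdin); a list stands in for it here.
run(["Elbow below", "robe bore"])
```

Before submitting a job, a mapper and reducer written this way can usually be checked locally by piping a small text file through the mapper, then `sort`, then the reducer, which imitates the shuffle step on a single machine.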