Problem
Write MapReduce code to find anagrams in a text file stored in HDFS. An anagram is a rearrangement of the letters of a word. For this exercise, the rearranged word does not need to be meaningful.
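One common way to detect anagrams is to reduce every word to a canonical key, so that all rearrangements of the same letters share that key. The sketch below assumes the key is the word's lowercased letters in sorted order. The function name anagram_key is hypothetical and is not taken from the repository's code.

```python
def anagram_key(word):
    """Return a canonical key: the word's letters, lowercased and sorted."""
    return "".join(sorted(word.lower()))

# Rearrangements of the same letters collapse to the same key.
for word in ["elbow", "bowel", "below", "bore", "robe"]:
    print(word, "->", anagram_key(word))
```

Words with equal keys belong to the same anagram group, which is what makes the key a natural choice for the MapReduce shuffle key.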
Dataset
The file is located at
/data/mr/wordcount/big.txt
Sample Output
Each line of the output file lists a group of anagrams found in the text file, preceded by the number of words in the group
3 ['bowel,', 'elbow,', 'below,']
3 ['bore', 'boer', 'robe']
3 ['bears', 'baser', 'saber']
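The first sample line keeps a trailing comma on every word, which suggests (as an inference from the sample, not something the exercise states) that words are split on whitespace only. The sketch below shows that behaviour, along with one optional way to strip punctuation if cleaner groups are wanted.

```python
import string

line = "bowel, elbow, below,"

# Splitting on whitespace alone keeps the trailing commas on each token.
raw_tokens = line.split()
print(raw_tokens)    # ['bowel,', 'elbow,', 'below,']

# Stripping leading/trailing punctuation is one way to normalize tokens.
clean_tokens = [token.strip(string.punctuation) for token in raw_tokens]
print(clean_tokens)  # ['bowel', 'elbow', 'below']
```

Without the stripping step, 'bowel,' and 'bowel' would produce different sorted-letter keys and land in different groups.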
Steps
Check out mapper.py and reducer.py on GitHub
If you haven't cloned the CloudxLab GitHub repository, clone it into your home folder in the web console using the command below
git clone https://github.com/singhabhinav/cloudxlab.git ~/cloudxlab
Otherwise, update the local copy
cd ~/cloudxlab
git pull origin master
Go to the find_anagrams directory
cd ~/cloudxlab/hdpexamples/python-streaming/find_anagrams
Run the MapReduce code using Hadoop streaming. Make sure to save the output in the mapreduce-programming/find_anagrams directory inside your home directory in HDFS. Run the command below
hadoop jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar -input /data/mr/wordcount/big.txt -output mapreduce-programming/find_anagrams -mapper mapper.py -file mapper.py -reducer reducer.py -file reducer.py
View the 20 largest anagram groups by typing the command below.
hadoop fs -cat mapreduce-programming/find_anagrams/* | sort -nr | head -n 20
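The streaming job above can be approximated locally to see how the pieces fit together. The following is a minimal sketch, not the repository's actual mapper.py and reducer.py, which may differ. It assumes the mapper emits tab-separated key and word records, that the framework sorts those records by key before the reduce phase, and that only groups with at least two distinct words are reported. The names mapper, reducer, and run_local are hypothetical.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'key<TAB>word' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            word = word.lower()
            key = "".join(sorted(word))
            yield f"{key}\t{word}"

def reducer(sorted_records):
    """Group records by key; emit 'count<TAB>words' for groups of 2+ distinct words."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for _key, group in groupby(pairs, key=lambda pair: pair[0]):
        words = sorted({word for _, word in group})
        if len(words) > 1:
            yield f"{len(words)}\t{words}"

def run_local(lines):
    # Hadoop sorts mapper output by key before the reduce phase;
    # sorted() stands in for that shuffle-and-sort step here.
    return list(reducer(sorted(mapper(lines))))

sample = ["bore the boer", "a robe for bears", "baser saber"]
for out in run_local(sample):
    print(out)
```

In a real streaming job the two functions would live in separate scripts that read from standard input and write to standard output, with Hadoop performing the sort between them.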