MapReduce Programming - Find Users Having the Same or Mirror DNA

Problem

Write MapReduce code that finds users who have the same DNA, or DNA that is a mirror image of another user's, using the file stored in HDFS.

Dataset

The file is located at

/data/mr/dna/dna.txt

Sample Output

The output file will contain the groups of users who have the same or mirror DNA, one group per line:

['User5', 'User3']
['User4', 'User2', 'User1']
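The core idea can be checked in plain Python before involving Hadoop. Map each DNA string and its reverse to a single canonical key (here, whichever of the two sorts first), then group users by that key. The sketch below makes three assumptions that this page does not confirm: "mirror" means the DNA string read backwards, each record is a (user, DNA) pair, and only groups with two or more users are reported, as in the sample output. The records are made up and are not the contents of /data/mr/dna/dna.txt.

```python
from collections import defaultdict


def canonical(dna):
    """Return one key shared by a DNA string and its reverse.

    Assumption: a "mirror" DNA is the same string read backwards.
    """
    rev = dna[::-1]
    return dna if dna <= rev else rev


def group_users(records):
    """Group (user, dna) pairs whose DNA is identical or mirrored.

    Only groups with more than one user are returned (assumed from the sample output).
    """
    groups = defaultdict(list)
    for user, dna in records:
        groups[canonical(dna)].append(user)
    return [users for users in groups.values() if len(users) > 1]


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        ("User1", "ACGT"),
        ("User2", "TGCA"),  # mirror of ACGT
        ("User3", "AATT"),
        ("User4", "ACGT"),  # same as User1
        ("User5", "TTAA"),  # mirror of AATT
        ("User6", "GATC"),  # no match
    ]
    for users in group_users(records):
        print(users)
```

With these hypothetical records the sketch reports two groups: User1, User2 and User4, then User3 and User5. User6 has no match and is left out.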

Steps

Check out mapper.py and reducer.py on GitHub.
If you haven't cloned the CloudxLab GitHub repository yet, clone it into your home folder from the web console using the command below:
```
git clone https://github.com/singhabhinav/cloudxlab.git ~/cloudxlab
```
Otherwise, update your local copy:
```
cd ~/cloudxlab
git pull origin master
```
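The repository's mapper.py is not reproduced on this page. As a hedged illustration only, a Hadoop streaming mapper for this problem could emit one "<canonical DNA>\t<user>" line per input record, so that identical and mirrored DNA share a key. The sketch assumes each line of dna.txt holds a user ID and a DNA string separated by whitespace, and that "mirror" means the string reversed. Neither assumption is confirmed by the source.

```python
#!/usr/bin/env python
"""Hypothetical mapper.py sketch for Hadoop streaming.

Assumes each input line is "<user> <dna>" separated by whitespace.
Emits "<canonical_dna>\t<user>" so identical and mirrored DNA share a key.
"""
import sys


def canonical(dna):
    # A DNA string and its reverse both map to the smaller of the two
    # (assumption: "mirror" means the string read backwards).
    return min(dna, dna[::-1])


def map_lines(lines):
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        user, dna = fields
        yield "%s\t%s" % (canonical(dna), user)


if __name__ == "__main__":
    for out in map_lines(sys.stdin):
        print(out)
```

Hadoop streaming treats the text before the first tab as the key, which is why the canonical DNA string comes first.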

Go to the mirror_dna directory:

```
cd ~/cloudxlab/hdpexamples/python-streaming/mirror_dna/
```

Run the MapReduce code using Hadoop streaming. Make sure to save the output in the mapreduce-programming/mirror_dna directory inside your home directory in HDFS. Run the command below:

```
hadoop jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar -input /data/mr/dna/dna.txt -output mapreduce-programming/mirror_dna -mapper mapper.py -file mapper.py -reducer reducer.py -file reducer.py
```
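Between the -mapper and -reducer stages, Hadoop sorts the mapper output by key (the text before the first tab). As a result, the reducer receives all lines for one canonical DNA key consecutively and can group them in a single pass. Below is a hedged sketch of a matching reducer.py, not the repository's version. It assumes the mapper emits "<key>\t<user>" lines and that only groups of two or more users are printed, formatted as Python lists like the sample output.

```python
#!/usr/bin/env python
"""Hypothetical reducer.py sketch for Hadoop streaming.

Input arrives sorted by key as "<canonical_dna>\t<user>" lines, so users
sharing a key are adjacent. Each group with more than one user is printed
as a Python list, e.g. ['User1', 'User2'].
"""
import sys
from itertools import groupby


def parse(line):
    # Split off the key at the first tab; the rest is the user ID.
    key, user = line.rstrip("\n").split("\t", 1)
    return key, user


def reduce_lines(lines):
    pairs = (parse(line) for line in lines if line.strip())
    # groupby works here only because Hadoop has already sorted by key.
    for _key, group in groupby(pairs, key=lambda kv: kv[0]):
        users = [user for _, user in group]
        if len(users) > 1:  # assumption: unmatched users are not reported
            yield str(users)


if __name__ == "__main__":
    for out in reduce_lines(sys.stdin):
        print(out)
```

Relying on the framework's sort keeps memory use to one group at a time. The trade-off is that the script is only correct on sorted input.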

Check the output by typing the command below:
```
hadoop fs -cat mapreduce-programming/mirror_dna/*
```
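If you want to inspect the result programmatically, each output line in the sample is a Python list literal, so it can be read back with ast.literal_eval. The sketch below assumes the output matches the sample format. The script name check_output.py in the usage comment is hypothetical. Lines may also carry a trailing tab added by Hadoop's text output, which strip() removes.

```python
"""Hypothetical check_output.py: parse the job output back into lists.

Assumes each output line looks like the sample, e.g. "['User5', 'User3']".
Example (hypothetical):
    hadoop fs -cat mapreduce-programming/mirror_dna/* | python check_output.py
"""
import ast
import sys


def parse_groups(lines):
    groups = []
    for line in lines:
        line = line.strip()  # drops the newline and any trailing tab
        if line:
            groups.append(ast.literal_eval(line))
    return groups


if __name__ == "__main__":
    groups = parse_groups(sys.stdin)
    print("groups: %d" % len(groups))
    print("users in a group: %d" % sum(len(g) for g in groups))
```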
