Write MapReduce code to find the users who have the same DNA, using a file stored in HDFS.
The file is located at /data/mr/dna/dna.txt in HDFS.
The output file will list the users who share the same DNA, for example:
ACG ['User5', 'User3']
ACGT ['User4', 'User1']
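To make the goal concrete, here is a minimal single-machine sketch of the same computation. It assumes, hypothetically, that each input line holds a user ID and a DNA string separated by whitespace or a comma, and that only DNA strings shared by two or more users should be reported. The function name and the sample lines are made up for illustration.

```python
from collections import defaultdict


def find_same_dna(lines):
    """Group users by DNA string and keep only DNA shared by 2+ users.

    Assumption: each line is "user dna" or "user,dna" (hypothetical format).
    """
    users_by_dna = defaultdict(list)
    for line in lines:
        parts = line.replace(",", " ").split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        user, dna = parts
        users_by_dna[dna].append(user)
    # Assumption: a DNA string held by a single user is not reported.
    return {dna: users for dna, users in users_by_dna.items() if len(users) > 1}


# Made-up sample input.
sample = ["User1 ACGT", "User2 TTT", "User3 ACG", "User4 ACGT", "User5 ACG"]
print(find_same_dna(sample))
# → {'ACGT': ['User1', 'User4'], 'ACG': ['User3', 'User5']}
```

The MapReduce version splits this same grouping into a map step (emit DNA as the key) and a reduce step (collect the users for each key).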
Check out mapper.py and reducer.py in the CloudxLab GitHub repository.
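The repository's mapper.py is not reproduced on this page. As a rough sketch only, a Hadoop streaming mapper for this task can emit the DNA string as the key and the user as the value, so that Hadoop brings all users with the same DNA together. The function name map_lines and the input format (user and DNA separated by whitespace or a comma) are assumptions.

```python
def map_lines(lines):
    """Hypothetical mapper logic: turn "user dna" records into "dna<TAB>user"."""
    for line in lines:
        # Assumed input format: user ID and DNA separated by whitespace or a comma.
        parts = line.replace(",", " ").split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        user, dna = parts
        # Hadoop streaming treats the text before the first tab as the key.
        yield f"{dna}\t{user}"


# In a real mapper.py you would finish with:
#     import sys
#     for record in map_lines(sys.stdin):
#         print(record)
print(list(map_lines(["User1 ACGT", "User3,ACG"])))
# → ['ACGT\tUser1', 'ACG\tUser3']
```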
If you haven't cloned the CloudxLab GitHub repository yet, clone it into your home folder in the web console using the command below:
git clone https://github.com/singhabhinav/cloudxlab.git ~/cloudxlab
Otherwise, update your local copy:
cd ~/cloudxlab
git pull origin master
Go to the same_dna directory.
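Run the MapReduce code using Hadoop streaming, making sure the output is saved to the mapreduce-programming/same_dna directory inside your home directory in HDFS. Hadoop will not start the job if that output directory already exists, so remove it first if you are re-running the exercise. Run the command below: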
hadoop jar /usr/hdp/184.108.40.206-3485/hadoop-mapreduce/hadoop-streaming.jar \
  -input /data/mr/dna/dna.txt \
  -output mapreduce-programming/same_dna \
  -mapper mapper.py -file mapper.py \
  -reducer reducer.py -file reducer.py
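Between the map and reduce phases, Hadoop sorts the mapper output by key, so a reducer sees all lines for one DNA string next to each other. A hypothetical reducer can rely on that with itertools.groupby. Below, sorted() stands in for Hadoop's sort step so the logic can be tried locally. The names reduce_lines and parse_records and the two-or-more-users filter are assumptions, not the repository's reducer.py.

```python
from itertools import groupby


def parse_records(lines):
    """Split "dna<TAB>user" lines into (dna, user) pairs, skipping lines without a tab."""
    for line in lines:
        key, sep, value = line.rstrip("\n").partition("\t")
        if sep:
            yield key, value


def reduce_lines(lines):
    """Hypothetical reducer logic: lines must already be sorted by key."""
    for dna, group in groupby(parse_records(lines), key=lambda rec: rec[0]):
        users = [user for _, user in group]
        # Assumption: only report DNA strings shared by two or more users.
        if len(users) > 1:
            yield f"{dna}\t{users}"


# Made-up mapper output. sorted() stands in for Hadoop's sort-by-key step
# (it also sorts the values, which Hadoop does not guarantee).
mapper_output = ["ACGT\tUser1", "ACG\tUser3", "TTT\tUser2", "ACGT\tUser4", "ACG\tUser5"]
for line in reduce_lines(sorted(mapper_output)):
    print(line)
```

Because groupby only merges adjacent items, this reducer would give wrong results on unsorted input; it depends on the sort that Hadoop performs between the two phases.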
Check the output by typing the command below:
hadoop fs -cat mapreduce-programming/same_dna/*
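If you want to check the result programmatically rather than by eye, a small helper like the hypothetical one below turns each output line back into a DNA string and a Python list of users. It assumes the reducer prints the DNA, whitespace, and then the printed form of a Python list, as in the sample output at the top of this exercise.

```python
import ast


def parse_same_dna_output(text):
    """Parse lines like "ACG<TAB>['User5', 'User3']" into {dna: [users]}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split once on the first run of whitespace: DNA key, then the printed list.
        dna, users = line.split(None, 1)
        # literal_eval safely evaluates the printed Python list.
        result[dna] = ast.literal_eval(users)
    return result


print(parse_same_dna_output("ACG\t['User5', 'User3']\nACGT\t['User4', 'User1']\n"))
# → {'ACG': ['User5', 'User3'], 'ACGT': ['User4', 'User1']}
```

Hadoop does not guarantee the order of values within a key, so compare each user list as a set when checking it against an expected answer.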