MapReduce Programming


MapReduce Programming - Find users having same or Mirror DNA

Problem

Write MapReduce code to find users whose DNA is the same as, or a mirror image of, another user's DNA in a file stored in HDFS.

Dataset

The file is located at

/data/mr/dna/dna.txt

Sample Output

The output file lists each group of users who have the same or mirror DNA, one group per line:

['User5', 'User3']
['User4', 'User2', 'User1']
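
The grouping idea behind this output can be sketched in plain Python. This is only an illustrative sketch, not the mapper.py and reducer.py from the repository, and it rests on several assumptions: each input line is taken to hold a user name and a DNA string separated by whitespace, "mirror" is taken to mean the reversed string, and users with no match are left out. The function names and the sample records are hypothetical.

```python
from itertools import groupby


def canonical_key(dna):
    """Return a key shared by a DNA string and its mirror (assumed: its reversal)."""
    dna = dna.strip()
    return min(dna, dna[::-1])


def map_lines(lines):
    """Mapper step: emit (canonical_key, user) for each 'user dna' line (assumed format)."""
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        user, dna = parts
        yield canonical_key(dna), user


def reduce_pairs(pairs):
    """Reducer step: collect users per key and keep groups with more than one user."""
    # sorted() stands in for the shuffle-and-sort phase that Hadoop performs.
    for _, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        users = [user for _, user in group]
        if len(users) > 1:
            yield users


if __name__ == "__main__":
    # Hypothetical records, not the contents of /data/mr/dna/dna.txt.
    sample = ["User1 ACGT", "User2 TGCA", "User3 GGCC", "User4 ACGT", "User5 CCGG"]
    for users in reduce_pairs(map_lines(sample)):
        print(users)
```

Using the smaller of a string and its reversal as the key means that identical and mirrored DNA strings reach the same reducer call, so a single pass over the sorted pairs is enough to form each group.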

Steps

  1. Check out mapper.py and reducer.py on GitHub.

  2. If you haven't cloned the CloudxLab GitHub repository yet, clone it into your home folder in the web console using the command below:

    git clone https://github.com/singhabhinav/cloudxlab.git ~/cloudxlab
    
  3. Otherwise, update your local copy:

    cd ~/cloudxlab
    git pull origin master
    
  4. Go to the mirror_dna directory:

    cd ~/cloudxlab/hdpexamples/python-streaming/mirror_dna/
    
  5. Run the MapReduce code using Hadoop streaming. Make sure to save the output in the mapreduce-programming/mirror_dna directory inside your home directory in HDFS. Run the command below:

    hadoop jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar -input /data/mr/dna/dna.txt -output mapreduce-programming/mirror_dna -mapper mapper.py -file mapper.py -reducer reducer.py -file reducer.py
    
  6. Check the output (the groups of users having the same or mirror DNA) by typing the command below:

    hadoop fs -cat mapreduce-programming/mirror_dna/*
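
If you want to work with the result programmatically, each output line can be read back as a Python list. The sketch below assumes the output lines look like the Sample Output (a Python list literal per line, possibly with trailing whitespace); `parse_groups` is a hypothetical helper, not part of the repository.

```python
import ast


def parse_groups(lines):
    """Turn lines such as "['User5', 'User3']" into Python lists of user names."""
    groups = []
    for line in lines:
        line = line.strip()  # drop the newline and any trailing tab
        if not line:
            continue
        # literal_eval only accepts Python literals, so it is safe for untrusted text.
        groups.append(ast.literal_eval(line))
    return groups


if __name__ == "__main__":
    # Lines copied from the Sample Output section.
    sample_output = ["['User5', 'User3']", "['User4', 'User2', 'User1']"]
    for group in parse_groups(sample_output):
        print(len(group), sorted(group))
```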
    
