Linux Basics for Big Data

Shell script for WordCount

A shell script is a file that contains commands, one per line. The shell runs them in order, from top to bottom.
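As a minimal sketch of this idea, the snippet below writes two commands into a file and runs the file with bash. The file name hello.sh and the temporary directory are hypothetical, chosen only for illustration.

```shell
# Create a scratch directory so the example does not touch existing files.
workdir=$(mktemp -d)

# Write a two-line script; hello.sh is a hypothetical name.
cat > "$workdir/hello.sh" <<'EOF'
echo "first command"
echo "second command"
EOF

# Hand the file to bash; the commands run one after the other.
output=$(bash "$workdir/hello.sh")
echo "$output"
```

Running the file with "bash <file>" works even before the file is made executable, which is handy for a quick check.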

Let's create a script that counts how often each word occurs and sorts the words by that count:

  1. Create a file using the nano text editor:

  2. The first line of a script should have "#!" followed by the path of the program that will execute the script. Since we are creating a shell script, we want it to be executed by bash. So the first line of the script should be "#!/bin/bash", assuming bash is installed at /bin/bash as it is on most Linux systems.

  3. Add the command in the editor:

    tr 'A-Z' 'a-z' | sed -E 's/[ \t]+/\n/g' | sed 's/[^0-9a-z]//g' | sort | uniq -c | sort -nr -S 50M

  4. Save the file and exit by pressing Ctrl+X, then "y", then Enter to confirm the file name.

  5. Now, make this file executable:

    chmod +x
  6. Check that it works:

    cat /cxldata/big.txt | ./ | more

    Recall that "./" refers to the current directory, and "| more" shows the output one page at a time instead of letting it all scroll past.

    Also note that the script's standard input is inherited by the commands it runs, so the text piped into the script is read by the first command in its pipeline, tr.
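The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the exact session from the course: the file name wordcount.sh is hypothetical, a heredoc stands in for typing into nano, the "#!/bin/bash" line assumes bash lives at /bin/bash, and a one-line sample replaces /cxldata/big.txt. The sed and sort options used are those of the GNU tools found on most Linux systems.

```shell
# Hypothetical script location; a scratch directory keeps the example tidy.
script="$(mktemp -d)/wordcount.sh"

# Steps 1-4: write the script (a heredoc replaces the nano session).
cat > "$script" <<'EOF'
#!/bin/bash
# Lowercase the text, put one word per line, drop punctuation,
# then count each word and list the most frequent first.
tr 'A-Z' 'a-z' | sed -E 's/[ \t]+/\n/g' | sed 's/[^0-9a-z]//g' | sort | uniq -c | sort -nr -S 50M
EOF

# Step 5: make the file executable.
chmod +x "$script"

# Step 6: pipe text into the script; tr inside it reads this input.
result=$(printf 'the cat The dog the\n' | "$script")
echo "$result"
```

In this sample the word "the" occurs three times once the text is lowercased, so it appears on the first line of the output with a count of 3.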