Linux Basics


Word Count Exercise

Step 1:

Check the data using the cat command. Since the file is big, pipe the output to more to view it one page at a time:

    cat /cxldata/big.txt | more

Step 2:

Replace each space with a newline so that every line of the output contains a single word:

    cat /cxldata/big.txt | sed 's/ /\n/g' | more

For example, replacing each space with a newline in "I am ok" gives:

I
am
ok

Recall that the substitution syntax of sed is sed 's/word/new_word/'. Here we replace the space character ( ) with the newline character (\n). The trailing g is a flag that makes sed replace every occurrence of a space on each line instead of only the first one. Note that \n in the replacement is understood by GNU sed, the default on Linux; other sed implementations may treat it differently.

Also note that this command has three programs connected by two pipes: the output of cat goes to sed, and the output of sed goes to more so that we can view the result one page at a time.
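
To see what the g flag changes, here is a small sketch on the sample sentence. It assumes GNU sed, where \n in the replacement means a newline; the variable names are only for illustration.

```shell
# Sample sentence from the example above (illustrative only)
text="I am ok"

# Without g: only the first space on the line is replaced
first_only=$(printf '%s\n' "$text" | sed 's/ /\n/')

# With g: every space on the line is replaced
all_spaces=$(printf '%s\n' "$text" | sed 's/ /\n/g')

printf '%s\n' "without g:" "$first_only" "with g:" "$all_spaces"
```

Without g the output keeps "am ok" together on the second line; with g each word lands on its own line.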

Step 3:

We can sort the words using the sort command:

    cat /cxldata/big.txt | sed 's/ /\n/g' | sort | more

Note that more is used here only to avoid screen-blindness (too much text scrolling past).
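
As a rough illustration of what sort does to a word list, the sketch below uses a few made-up words instead of big.txt. LC_ALL=C is an assumption added here so the ordering does not depend on the locale of your machine; the command in the exercise does not set it.

```shell
# Hypothetical sample words, one per line, as Step 2 would produce them
words=$(printf '%s\n' the cat saw the dog)

# sort orders the lines, which places identical words on adjacent lines
sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)

printf '%s\n' "$sorted"
```

After sorting, the two occurrences of "the" sit next to each other, which is what the next step relies on.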

Step 4:

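We can now count how many times each word occurs using the uniq command with the -c option. uniq only merges identical lines that are adjacent, which is why the words were sorted in Step 3:

    cat /cxldata/big.txt | sed 's/ /\n/g' | sort | uniq -c | more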
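
The following sketch, on made-up sample words, shows why the sort step has to come before uniq -c. The variable names are illustrative, and LC_ALL=C is an assumption used only to keep the ordering predictable.

```shell
# Hypothetical sample words, one per line
words=$(printf '%s\n' the cat the dog the)

# Without sort, the "the" lines are not adjacent, so each is counted separately
unsorted_counts=$(printf '%s\n' "$words" | uniq -c)

# With sort, identical words are adjacent and are counted together
sorted_counts=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c)

printf '%s\n' "unsorted:" "$unsorted_counts" "sorted:" "$sorted_counts"
```

The unsorted pipeline reports five lines, each with a count of 1, while the sorted pipeline reports one line per distinct word with its total count.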

Please save the result of the command to a file named word_count_results in your home directory. The command below writes to the current directory, so run it from your home directory:

    cat /cxldata/big.txt | sed 's/ /\n/g' | sort | uniq -c > word_count_results
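
Once the counts are saved, one possible way to inspect them is to sort the file numerically on the count and look at the first few lines. This is an optional sketch and not part of the exercise; it builds a small stand-in file with mktemp rather than touching your real word_count_results, and it assumes GNU sed as above.

```shell
# Illustrative stand-in for word_count_results, built from a made-up sentence
results=$(mktemp)
printf '%s\n' 'b a b c b a' | sed 's/ /\n/g' | sort | uniq -c > "$results"

# Sort numerically on the leading count, highest first, and keep the top 2 lines
top=$(sort -nr "$results" | head -n 2)

printf '%s\n' "$top"
rm -f "$results"
```

On the real file, the equivalent would be sort -nr word_count_results | head, run from the directory where the file was saved.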

