We can further improve the word frequency count by using more filters.
Improvement 1:
Translate the text to lower case using tr, so that "The" and "the" are counted as the same word:
tr 'A-Z' 'a-z'
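As a quick check of this step, the sketch below runs the same tr command on one sample line. The helper name to_lower and the sample text are made up for illustration.

```shell
# Hypothetical helper: lowercase everything that arrives on stdin.
to_lower() {
  tr 'A-Z' 'a-z'
}

printf 'The Quick BROWN fox\n' | to_lower
# prints: the quick brown fox
```

A common alternative spelling is tr '[:upper:]' '[:lower:]', which uses character classes instead of explicit ranges.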
Improvement 2:
Remove non-alphanumeric characters using sed with a regular expression:
sed 's/[^0-9a-z]//g'
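The order of the filters matters for this step. The pattern deletes everything that is not a digit or a lowercase letter, and that includes spaces and uppercase letters, so it is best applied after the text has been lowercased and put one word per line, which is the order the final command uses. A small sketch, with a made-up helper name (strip_non_alnum) and sample input:

```shell
# Hypothetical helper: delete every character that is not 0-9 or a-z.
strip_non_alnum() {
  sed 's/[^0-9a-z]//g'
}

# One word per line: the punctuation goes, the words survive.
printf 'hello,\nworld!\n' | strip_non_alnum
# prints:
# hello
# world

# A whole line at once: the space is deleted too, so the words fuse.
printf 'hello, world!\n' | strip_non_alnum
# prints: helloworld
```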
Improvement 3:
Replace each run of whitespace (one or more spaces or tabs) with a newline, so that every word ends up on its own line:
sed -E 's/[ \t]+/\n/g'
Please note that sed always works with regular expressions. The -E option switches it to extended regular expressions, which is what lets + mean "one or more" in this pattern.
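To see the split in action, the sketch below feeds one line containing a double space and a tab through the same sed command. It assumes GNU sed, which understands \t in the pattern and \n in the replacement. The helper name split_words is made up for illustration.

```shell
# Hypothetical helper: put each whitespace-separated word on its own line.
# Assumes GNU sed (\t in the pattern and \n in the replacement are understood).
split_words() {
  sed -E 's/[ \t]+/\n/g'
}

# The input has two spaces after "one" and a tab after "two".
printf 'one  two\tthree\n' | split_words
# prints:
# one
# two
# three
```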
Improvement 4:
Display the most frequent words at the top by sorting the results in reverse numeric order:
sort -nr
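The -n flag is what makes the counts compare as numbers rather than as text. A small sketch on made-up input shows the difference, followed by the same sort applied to the output of uniq -c:

```shell
# Without -n the lines are compared as text, so "10" lands after "2".
printf '9\n10\n2\n' | sort -r
# prints:
# 9
# 2
# 10

# With -n the lines are compared as numbers.
printf '9\n10\n2\n' | sort -nr
# prints:
# 10
# 9
# 2

# Counting a tiny word list, most frequent first.
# uniq -c pads each count with leading spaces; the exact width may vary.
printf 'b\na\nb\nc\nb\na\n' | sort | uniq -c | sort -nr
# prints:
#   3 b
#   2 a
#   1 c
```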
Improvement 5:
If the input file is big, the sort command might use too much memory. You can limit the size of the memory buffer that sort uses to, say, 100 MB:
sort -S 100M
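The -S option can be combined with the other sort flags. A minimal sketch, assuming GNU sort and using seq to generate sample numbers:

```shell
# -S sets the size of sort's in-memory buffer (GNU sort).
# Input that does not fit in the buffer is sorted in chunks via temporary files.
seq 1 1000 | sort -nr -S 100M | head -n 3
# prints:
# 1000
# 999
# 998
```

GNU sort also accepts -T DIR to choose the directory where those temporary files are written.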
After applying all of these improvements, save the results to the file word_count_results_nice (here the final sort is given a 50 MB buffer):
cat /cxldata/big.txt | tr 'A-Z' 'a-z' | sed -E 's/[ \t]+/\n/g' | sed 's/[^0-9a-z]//g' | sort | uniq -c | sort -nr -S 50M > word_count_results_nice
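To try the whole chain without the large input file, the sketch below wraps the same filters in a function and runs them on two sample lines. The function name word_freq and the sample text are made up for illustration, GNU sed and GNU sort are assumed, and the sed '/^$/d' step is an extra that is not part of the exercise command: it drops the empty lines left behind when a token such as "--" consists only of punctuation.

```shell
# Hypothetical helper: read text on stdin, print "count word" lines,
# most frequent word first.
word_freq() {
  tr 'A-Z' 'a-z' |
    sed -E 's/[ \t]+/\n/g' |
    sed 's/[^0-9a-z]//g' |
    sed '/^$/d' |   # extra step (not in the exercise): drop empty tokens
    sort |
    uniq -c |
    sort -nr -S 50M
}

printf 'The cat. The dog!\nthe END -- the cat\n' | word_freq | head -n 2
# prints (counts are padded with leading spaces):
#   4 the
#   2 cat
```

Without the extra sed '/^$/d' step, the empty tokens would be counted as a word of their own and appear in the results as a count with nothing after it.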