We can further improve the word frequency count by using more filters.
Improvement 1:
Translate to lower case using
tr 'A-Z' 'a-z'
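As a quick check of what this filter does, here is a minimal sketch on a made-up line of text (the sample sentence and the variable name lowered are illustrative assumptions, not part of the exercise):

```shell
# Illustrative sample text, not taken from big.txt.
lowered=$(printf 'The Quick BROWN fox\n' | tr 'A-Z' 'a-z')
echo "$lowered"    # prints: the quick brown fox
```

With this step in place, "The" and "the" are counted as the same word.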
Improvement 2:
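Remove non-alphanumeric characters using sed with a regular expression.
Apply this after the words have been split onto separate lines (Improvement 3), because the expression also deletes spaces and tabs: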
sed 's/[^0-9a-z]//g'
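The following sketch, on made-up input, shows the effect of this expression and why its position in the pipeline matters. The sample words and the variable names cleaned and fused are assumptions chosen for illustration:

```shell
# Illustrative sample text, not taken from big.txt.
# One word per line: only the punctuation is removed.
cleaned=$(printf 'hello,\nworld!\n' | sed 's/[^0-9a-z]//g')
echo "$cleaned"    # prints "hello" and "world" on separate lines

# Words still on one line: the space is removed too, fusing the words.
fused=$(printf 'hello, world!\n' | sed 's/[^0-9a-z]//g')
echo "$fused"      # prints: helloworld
```

Because a space is itself a non-alphanumeric character, running this filter before the words are on separate lines would glue them together.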
Improvement 3:
Replace each run of whitespace (multiple tabs and spaces) with a newline, so that every word is on its own line:
sed -E 's/[ \t]+/\n/g'
Please note that since we are using the + quantifier, which belongs to extended regular expressions, we need to specify -E.
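Here is a small sketch of the splitting step on made-up input. It assumes GNU sed, which understands \t inside the bracket expression and \n in the replacement; other sed implementations may not. The second command is an alternative using tr, not part of the original exercise, that may be more portable. The variable names are illustrative:

```shell
# Illustrative input: two spaces, then a tab, between the words.
# Assumes GNU sed (for \t and \n in the expression).
split=$(printf 'one  two\tthree\n' | sed -E 's/[ \t]+/\n/g')
echo "$split"

# Alternative sketch: translate spaces and tabs to newlines,
# and squeeze repeated newlines into one with -s.
split_tr=$(printf 'one  two\tthree\n' | tr -s ' \t' '\n\n')
echo "$split_tr"
```

Both commands should print one, two and three on separate lines.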
Improvement 4:
Display the most frequent words at the top by sorting the counts in reverse numeric order:
sort -nr
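To see how this fits with the counting step, here is a minimal sketch on a handful of made-up words. The input and the variable names are assumptions; the awk call is only there to strip the padding that uniq -c puts in front of each count:

```shell
# Illustrative input: b appears 3 times, a twice, c once.
counts=$(printf 'b\na\nb\nc\nb\na\n' | sort | uniq -c | sort -nr)
echo "$counts"

# Strip the leading padding so that each line reads "count word".
normalized=$(echo "$counts" | awk '{print $1, $2}')
echo "$normalized"
# prints:
# 3 b
# 2 a
# 1 c
```

The -n option makes sort compare the counts as numbers, and -r reverses the order so that the largest count comes first.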
Improvement 5:
If the input file is big, the sort command might use too much memory. So you can limit the size of
the main-memory buffer that sort uses, say to 100 MB:
sort -S 100M
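After all of these improvements, please save the results to a file named word_count_results_nice. Note that the command below caps the buffer of the final sort at 50 MB: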
cat /cxldata/big.txt | tr 'A-Z' 'a-z' | sed -E 's/[ \t]+/\n/g' | sed 's/[^0-9a-z]//g' | sort | uniq -c | sort -nr -S 50M > word_count_results_nice
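Before running the pipeline on the big file, it can help to try it on a tiny input whose counts are easy to verify by hand. The sketch below does that with a made-up sentence written to a temporary file; the sample text, the temporary files and the variable names are all assumptions for illustration, and it assumes GNU sed and GNU sort as above:

```shell
# Hypothetical sample file standing in for /cxldata/big.txt.
sample=$(mktemp)
results=$(mktemp)
printf 'The cat, the DOG\tand the cat.\n' > "$sample"

# Same filters as the full command, run on the sample file.
cat "$sample" | tr 'A-Z' 'a-z' | sed -E 's/[ \t]+/\n/g' | sed 's/[^0-9a-z]//g' \
  | sort | uniq -c | sort -nr -S 50M > "$results"

cat "$results"

# The most frequent word should be "the", with a count of 3.
top=$(head -n 1 "$results" | awk '{print $1, $2}')
echo "$top"    # prints: 3 the

rm -f "$sample" "$results"
```

In this sample, "the" appears three times and "cat" twice once case and punctuation are normalized, so those two lines should come first in the output.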