We can further improve the word frequency count by using more filters.
Convert the text to lower case using tr:
tr 'A-Z' 'a-z'
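As a quick check of the lower-casing step, here is a minimal sketch that runs the same tr command on a made-up sample line (the sample text and the variable name lowered are only for illustration):

```shell
# Lower-case a sample line; printf stands in for the real input file.
lowered=$(printf 'The Quick BROWN fox\n' | tr 'A-Z' 'a-z')
echo "$lowered"
```

With this step in place, "The" and "the" are counted as the same word. For text that is not plain ASCII, tr '[:upper:]' '[:lower:]' is a commonly used alternative form.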
Remove non-alphanumeric characters using sed with a regular expression:
sed 's/[^0-9a-z]//g'
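The effect of this filter can be sketched on a few made-up tokens, assuming the same expression that appears in the final pipeline, sed 's/[^0-9a-z]//g'. The sample words and the variable name cleaned are hypothetical:

```shell
# Delete every character that is not a digit or a lower-case letter.
# The input is already lower case here, as it would be after the tr step.
cleaned=$(printf 'e-mail\nstop!\nr2d2?\n' | sed 's/[^0-9a-z]//g')
echo "$cleaned"
```

Because the character class only lists a-z, this step should run after the text has been converted to lower case; otherwise upper-case letters would be deleted too.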
Replace each run of whitespace (one or more tabs and spaces) with a newline, so that every word ends up on its own line:
sed -E 's/[ \t]+/\n/g'
Please note that since we are using an extended regular expression (the + quantifier), we need to specify the -E option.
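A small sketch of the splitting step, assuming GNU sed (which understands \t inside the bracket expression and \n in the replacement). The sample line and the variable name words are made up for illustration:

```shell
# Two spaces and a tab separate the words in this sample line.
# Each run of spaces/tabs is replaced by a single newline.
words=$(printf 'one  two\tthree\n' | sed -E 's/[ \t]+/\n/g')
echo "$words"
```

Each word is now on its own line, which is the form that sort and uniq -c expect.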
Display the most frequent words at the top by sorting the counts in reverse numeric order:
sort -nr
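To see how the ranking works, here is a minimal sketch on a hand-made word list. It assumes the usual counting chain sort | uniq -c, followed by sort -nr (numeric, reversed) to put the largest counts first; the variable name ranked is only illustrative:

```shell
# Count each distinct word, then order the counts from largest to smallest.
ranked=$(printf 'b\na\nb\nc\nb\na\n' | sort | uniq -c | sort -nr)
echo "$ranked"
```

The first sort groups identical words together so that uniq -c can count them; the second sort orders the lines by that count.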
If the input file is big, the sort command might use too much memory. You can limit the size of the memory buffer that sort uses, for example to 100 MB:
sort -S 100M
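The following sketch shows where the option could go in a counting chain, assuming GNU sort. The tiny input and the variable name counts are made up; on real data, the first sort sees every word in the file, so that is the step where a memory limit matters most:

```shell
# Cap the buffer of the first sort, which handles the full word list.
# When the data does not fit in the buffer, GNU sort spills to temporary files.
counts=$(printf 'b\na\nb\n' | sort -S 100M | uniq -c | sort -nr)
echo "$counts"
```

The limit changes only how sort manages memory, not the resulting order.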
After all of these improvements, save the results to a file named word_count_results_nice:
cat /cxldata/big.txt | tr 'A-Z' 'a-z' | sed -E 's/[ \t]+/\n/g' | sed 's/[^0-9a-z]//g' | sort | uniq -c | sort -nr -S 50M > word_count_results_nice
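To try the whole chain on a small input before running it on the big file, the filters can be wrapped in a shell function. This is only a sketch: the function name word_count and the sample text are hypothetical, and GNU sed and sort are assumed.

```shell
# Hypothetical helper: the same filter chain as above, reading from stdin.
word_count() {
  tr 'A-Z' 'a-z' |
    sed -E 's/[ \t]+/\n/g' |
    sed 's/[^0-9a-z]//g' |
    sort | uniq -c | sort -nr -S 50M
}

# Run it on a made-up two-line sample and show the most frequent words.
sample=$(printf 'The cat. The dog!\nthe  END\n' | word_count)
echo "$sample" | head -n 3
```

Once the full command has written word_count_results_nice, head -n 10 word_count_results_nice shows the ten most frequent words in the same way.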