We can further improve the word frequency count by using more filters.
Convert the text to lower case using tr:
tr 'A-Z' 'a-z'
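Remove non-alphanumeric characters using sed with a regular expression. Apply this after the words have been split onto separate lines, because it also deletes spaces:
sed 's/[^0-9a-z]//g'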
Replace each run of whitespace (multiple tabs and spaces) with a newline, so that every word ends up on its own line:
sed -E 's/[ \t]+/\n/g'
Please note that since we are using extended regular expression syntax (the + quantifier), we need to specify the -E option to sed.
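The filters described so far can be tried on a single sample line before running them on a large file. The following is a minimal sketch, not the course's reference solution: the function name normalize_words is hypothetical, the character-stripping expression is the one used in the final pipeline of this section, and GNU sed is assumed (for the \t and \n escapes).

```shell
# normalize_words is a hypothetical helper name; assumes GNU sed.
normalize_words() {
  tr 'A-Z' 'a-z' |            # convert to lower case
    sed -E 's/[ \t]+/\n/g' |  # put each word on its own line
    sed 's/[^0-9a-z]//g'      # drop everything except digits and lower-case letters
}

# Sample input with punctuation, a double space and a tab.
printf 'Hello,  World!\tHello\n' | normalize_words
```

With this input, the output should be three lines: hello, world and hello.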
Display the most frequent words at the top by sorting the counts in reverse numeric order:
sort -nr
If the input file is big, the sort command might use too much memory. You can force
the sort command to use less memory, say 100 MB, with the -S option:
sort -S 100M
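The -S option only limits the size of the in-memory buffer; when the data does not fit, sort falls back to temporary files, so the result should be the same as without the limit. A small sketch to check this, assuming GNU sort; the variable sample_counts is a hypothetical stand-in for the output of uniq -c:

```shell
# Assumes GNU sort, where -S (--buffer-size) caps the main memory buffer.
# sample_counts is a hypothetical stand-in for "uniq -c" output.
sample_counts='2 cat
1 dog
3 the'

unlimited=$(printf '%s\n' "$sample_counts" | sort -nr)
capped=$(printf '%s\n' "$sample_counts" | sort -nr -S 100M)

# Both variables should hold the same lines, highest count first.
printf '%s\n' "$capped"
```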
After all of these improvements, please save the results in a file named word_count_results_nice:
cat /cxldata/big.txt | tr 'A-Z' 'a-z' | sed -E 's/[ \t]+/\n/g' | sed 's/[^0-9a-z]//g' | sort | uniq -c | sort -nr -S 50M > word_count_results_nice
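To sanity-check the whole pipeline before pointing it at the big file, it can be wrapped in a function and run on a tiny sample. This is a sketch under assumptions: word_count is a hypothetical wrapper name, the sample text is made up for illustration, and GNU sed and GNU sort are assumed. The input file is read with a redirect instead of cat, which behaves the same here.

```shell
# word_count is a hypothetical wrapper; assumes GNU sed and GNU sort.
word_count() {
  tr 'A-Z' 'a-z' < "$1" |
    sed -E 's/[ \t]+/\n/g' |
    sed 's/[^0-9a-z]//g' |
    sort | uniq -c | sort -nr -S 50M
}

# Made-up sample text written to a temporary file.
sample=$(mktemp)
printf 'The cat. The dog!\nthe  CAT\n' > "$sample"
word_count "$sample"
rm -f "$sample"
```

On this sample, "the" should be listed first with a count of 3, followed by "cat" with 2 and "dog" with 1. To process the real input, call word_count /cxldata/big.txt > word_count_results_nice.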