Project - How to Build a Sentiment Classifier using Python and IMDB Reviews

Truncating the Vocabulary

The vocabulary contains more than 50,000 words, so let us truncate it to keep only the 10,000 most common words.

  • Set vocab_size to 10000.

    vocab_size = 10000
  • Extract the 10,000 most frequently occurring words from vocabulary and store them in the truncated_vocabulary list (let us use a list comprehension to do so).

    # most_common() returns (word, count) pairs sorted by count, highest first; keep only the words
    truncated_vocabulary = [word for word, count in vocabulary.most_common()[:vocab_size]]
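
To see what the list comprehension produces, here is a minimal, self-contained sketch. It assumes that vocabulary is a collections.Counter of word frequencies (which the call to most_common() suggests). The tokenized reviews below are made up for illustration and stand in for the IMDB data, and vocab_size is set to 2 instead of 10000 so the result is easy to check by hand.

```python
from collections import Counter

# Hypothetical toy data: a few already-tokenized reviews standing in for the IMDB corpus.
reviews = [
    ["great", "movie", "great", "acting"],
    ["terrible", "movie", "great", "plot"],
    ["great", "fun", "movie"],
]

# Assumption: vocabulary is a Counter mapping each word to how often it occurs.
vocabulary = Counter()
for review in reviews:
    vocabulary.update(review)

vocab_size = 2  # 10000 in the project; kept tiny here so the output is easy to verify

# most_common() yields (word, count) pairs sorted by count, highest first.
# The slice keeps the top vocab_size pairs and the comprehension drops the counts.
truncated_vocabulary = [word for word, count in vocabulary.most_common()[:vocab_size]]

# Passing n directly to most_common() returns the same top-n pairs.
assert truncated_vocabulary == [word for word, _ in vocabulary.most_common(vocab_size)]

print(truncated_vocabulary)  # ['great', 'movie']
```

Calling vocabulary.most_common(vocab_size) gives the same top pairs as slicing the fully sorted list, but it does not need to sort every entry, which can help with a vocabulary of more than 50,000 words.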
