Project - How to Build a Sentiment Classifier using Python and IMDB Reviews

Constructing the Vocabulary

Next, we will construct the vocabulary. This requires going through the whole training set once, applying our preprocess() function, and using a Counter() to count the number of occurrences of each word.

Note:

Counter().update() : We can add counts to a Counter by passing an iterable of items to its update() method.
map(myfunc) on a TensorFlow dataset applies the function myfunc to every element of the dataset. Here the dataset is batched first, so preprocess receives one batch at a time.
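
As a minimal sketch of how update() accumulates counts (the word lists below are made up for illustration and are not taken from the IMDB data):

```python
from collections import Counter

# Hypothetical word lists standing in for two tokenized reviews.
first_review = ["great", "movie", "great"]
second_review = ["bad", "movie"]

word_counts = Counter()
word_counts.update(first_review)   # adds 1 per occurrence of each word
word_counts.update(second_review)  # counts accumulate across calls

print(word_counts["great"])   # 2
print(word_counts["movie"])   # 2
print(word_counts["unseen"])  # 0, missing keys count as zero
```

Passing a single string to update() counts its characters rather than the whole word, so words should be passed inside a list or another iterable.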

INSTRUCTIONS

Make sure to write each block of code below in a separate code cell.

Import Counter from collections.

from << your code comes here >> import << your code comes here >>

Create a Counter() object named vocabulary.
```
<< your code comes here >> = Counter()
```

For each review in every batch of the training data, update vocabulary with the words of that review, so that it ends up holding every word together with its count:

for X_batch, y_batch in datasets["train"].batch(2).map(preprocess):
    for review in X_batch:
        vocabulary.update(list(review.numpy()))
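
The nested loop can be tried out without TensorFlow on a toy stand-in. The sketch below assumes that preprocess() returns each review as a sequence of byte-string tokens and pads shorter reviews in a batch with a b"<pad>" token; that padding token, the toy_batches data, and the toy_vocabulary name are illustrative assumptions, not something this page specifies.

```python
from collections import Counter

# Hypothetical output of preprocess() for two batches of two reviews each.
# Each batch is (X_batch, y_batch); shorter reviews are padded (assumed token).
toy_batches = [
    ([[b"a", b"great", b"movie"], [b"boring", b"<pad>", b"<pad>"]], [1, 0]),
    ([[b"great", b"cast"], [b"a", b"mess"]], [1, 0]),
]

toy_vocabulary = Counter()
for X_batch, y_batch in toy_batches:
    for review in X_batch:
        # With real tensors this would be list(review.numpy()).
        toy_vocabulary.update(list(review))

print(toy_vocabulary[b"great"])  # 2
print(toy_vocabulary[b"<pad>"])  # 2
```

Under that assumption the keys are bytes objects such as b"great" rather than str, and any padding token is counted like an ordinary word.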

Let’s look at the 5 most common words:
```
vocabulary.most_common()[:5]
```
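
Counter.most_common() also accepts the number of entries to return, which gives the same result as slicing the full list. A small sketch with made-up counts:

```python
from collections import Counter

# Made-up counts, for illustration only.
sample_counts = Counter({"the": 5, "a": 3, "movie": 2, "plot": 1})

top_two_sliced = sample_counts.most_common()[:2]  # sort everything, then slice
top_two_direct = sample_counts.most_common(2)     # ask for two entries directly

print(top_two_direct)                    # [('the', 5), ('a', 3)]
print(top_two_sliced == top_two_direct)  # True
```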
Let us find the length of the vocabulary using the len function.
```
<< your code comes here >>(vocabulary)
```
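
Note that len() on a Counter returns the number of distinct keys, not the total number of words counted. A short illustration with made-up data:

```python
from collections import Counter

# Made-up word list, for illustration only.
sample_counts = Counter(["good", "good", "bad", "fine", "good"])

distinct_words = len(sample_counts)              # number of unique words
total_occurrences = sum(sample_counts.values())  # all words, with repeats

print(distinct_words)     # 3
print(total_occurrences)  # 5
```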

6 Comments

Smriti Yadav

2 years ago

Dear Team,

I am getting the "undefined vocabulary" error here. The code gives me the correct output, but the engine is throwing this error. Please advise.

Thanks

Shubh Tripathi

2 years ago

Hi Smriti,

Make sure to execute the cell where you have defined the variable 'vocabulary' before submitting the answer.

Dr. Manpreet Singh Sehgal

3 years ago

undefined vocabulary or it is not valid

Dr. Manpreet Singh Sehgal

3 years ago

This testing engine has a problem. I wrote the code as instructed, and it showed me an error. I used a hint and lost 10 points, yet it showed the same steps. I then lost a further 50 points to see the answer, which was the same code, and still could not get rid of the error. Please correct the bug in the testing engine.

Divya Shree .S

4 years ago

I copy-pasted the same lines of code. The following error is being displayed

from collections import Counter

vocabulary = Counter()
for X_batch, y_batch in datasets["train"].batch(2).map(preprocess):
    for review in X_batch:
        vocabulary.update(list(review.numpy()))

vocabulary.most_common()[:5]

len(vocabulary)

Dr. Manpreet Singh Sehgal

3 years ago

The error in your case occurs because the code is not written in separate cells.
