Creating the Final Train and Test sets

Now we will create the final training and test sets.

To create the final training set, train_set, we repeat the training data indefinitely and batch the reviews 32 at a time, convert each batch to short sequences of words with the preprocess() function, encode these words with a simple encode_words() function that uses the lookup table we just built, and finally prefetch the next batch.
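The preprocess() function itself is not shown in this section. As a rough, hypothetical illustration of what "short sequences of words" means, the pure-Python sketch below truncates each review, strips HTML line breaks and punctuation, splits it into words, and pads every sequence in the batch to the same length with a "<pad>" token. MAX_CHARS, PAD, and the cleaning rules are assumptions for illustration, not the project's actual settings.

```python
import re

# Hypothetical settings, not taken from the project.
MAX_CHARS = 300   # keep only the beginning of each review
PAD = "<pad>"     # token used to pad shorter reviews

def preprocess(X_batch, y_batch):
    """Turn a batch of raw review strings into equal-length word sequences."""
    sequences = []
    for review in X_batch:
        text = review[:MAX_CHARS]                 # truncate long reviews
        text = text.replace("<br />", " ")        # drop HTML line breaks
        text = re.sub(r"[^a-zA-Z']", " ", text)   # keep letters and apostrophes
        sequences.append(text.split())            # split on whitespace
    max_len = max(len(s) for s in sequences)
    # Pad every sequence to the length of the longest one in this batch.
    padded = [s + [PAD] * (max_len - len(s)) for s in sequences]
    return padded, y_batch

X, y = preprocess(["Great movie!<br />Loved it.", "Boring."], [1, 0])
print(X)  # [['Great', 'movie', 'Loved', 'it'], ['Boring', '<pad>', '<pad>', '<pad>']]
```

Padding per batch (rather than to a global length) keeps sequences as short as the batch allows.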

After training, we will test the model on only 1000 samples of the test data, because testing on the whole test set takes a lot of time. So we create the final test set in batches of 1000 samples, so that a single batch gives us the 1000 samples to test on.

To create the final test set, test_set, we batch the test reviews 1000 at a time, convert them to short sequences of words with the preprocess() function, and encode these words with the same encode_words() function and lookup table. Unlike the training set, the test set is neither repeated nor prefetched.
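Batching with batch(1000) does not discard any data: it splits the whole test split into groups of 1000, and take(1) (used below on train_set) yields just the first group. The plain-Python sketch below shows this, assuming the standard IMDB test split of 25,000 reviews; the batch() helper is hypothetical and not part of any library.

```python
from itertools import islice

def batch(iterable, size):
    """Hypothetical helper: yield consecutive lists of `size` items (the last may be shorter)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

test_reviews = range(25_000)               # stand-in for the test reviews (assumed split size)
test_batches = list(batch(test_reviews, 1000))
print(len(test_batches))                   # 25 batches in total
first = test_batches[0]                    # what take(1) would give
print(len(first))                          # 1000 samples
```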

Note:

dataset.repeat().batch(32) repeats the dataset indefinitely and groups its samples into batches of 32.
dataset.repeat().batch(32).map(preprocess) additionally applies the preprocess function to every batch.
dataset.map(encode_words).prefetch(1) applies the encode_words function to every element of the dataset (here, every batch) and prepares the next batch in parallel while the current one is being used.
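The pure-Python sketch below mimics the semantics of these operations without TensorFlow: itertools.cycle plays the role of repeat(), fixed-size chunks play the role of batch() (batch size 2 here), the built-in map stands in for map(), and a background thread feeding a one-slot queue approximates prefetch(1). It illustrates the behavior only; tf.data implements these operations very differently internally, and repeat_batch() and prefetch() are hypothetical names.

```python
import itertools
import queue
import threading

def repeat_batch(samples, batch_size):
    """Like dataset.repeat().batch(n): cycle over the data forever in fixed-size batches."""
    it = itertools.cycle(samples)
    while True:
        yield [next(it) for _ in range(batch_size)]

def prefetch(generator, buffer_size=1):
    """Roughly like .prefetch(1): a background thread prepares the next
    batch while the consumer is still working on the current one."""
    q = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in generator:
            q.put(item)   # blocks once the buffer is full

    threading.Thread(target=producer, daemon=True).start()
    while True:
        yield q.get()

# repeat + batch, then map (multiply by 10), then prefetch
pipeline = prefetch(map(lambda b: [x * 10 for x in b], repeat_batch([1, 2, 3, 4, 5], 2)))
for b in itertools.islice(pipeline, 4):
    print(b)
# [10, 20]
# [30, 40]
# [50, 10]
# [20, 30]
```

Because repetition happens before batching, a batch can span the end of one pass over the data and the start of the next, as [50, 10] shows.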

INSTRUCTIONS

Define the encode_words() function, which encodes the words in each batch using the lookup table named table.
```
def encode_words(X_batch, y_batch):
    # Replace each word with its integer ID from the lookup table; keep the labels as they are.
    return table.lookup(X_batch), y_batch
```
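The lookup table was built earlier in the project and is not shown here. Assuming it is a vocabulary table with out-of-vocabulary (OOV) buckets, such as tf.lookup.StaticVocabularyTable, its behavior can be sketched in pure Python: known words map to their index in the vocabulary, and unknown words are hashed into a small range of extra IDs placed after it. The vocabulary, NUM_OOV_BUCKETS, and the crc32 hash are illustrative assumptions; TensorFlow uses its own hash function, and crc32 is used here only because Python's built-in hash() for strings changes between runs.

```python
import zlib

# Hypothetical vocabulary and bucket count; the project's real table is built earlier.
vocab = ["<pad>", "the", "movie", "great", "boring"]
NUM_OOV_BUCKETS = 10
word_to_id = {word: i for i, word in enumerate(vocab)}

def lookup(word):
    """Known words get their vocabulary index; unknown words are hashed
    into one of NUM_OOV_BUCKETS extra IDs placed after the vocabulary."""
    if word in word_to_id:
        return word_to_id[word]
    bucket = zlib.crc32(word.encode("utf-8")) % NUM_OOV_BUCKETS
    return len(vocab) + bucket

def encode_words(X_batch, y_batch):
    """Pure-Python analogue of table.lookup(X_batch): map every word to an ID."""
    return [[lookup(w) for w in seq] for seq in X_batch], y_batch

X_ids, y = encode_words([["the", "movie", "<pad>"], ["great", "xyzzy", "<pad>"]], [1, 0])
print(X_ids[0])  # [1, 2, 0]
```

OOV buckets let the model still receive a (shared) ID for rare or misspelled words instead of failing on them.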
Repeat the training data datasets["train"] indefinitely, group it into batches of 32 samples, and apply the preprocess function to every batch.
```
train_set = datasets["train"].repeat().batch(32).map(<< your code comes here >>)
```
Apply the encode_words function to train_set and prefetch the next batch in parallel.
```
train_set = train_set.map(<< your code comes here >>).prefetch(1)
```
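Since repeat() makes train_set infinite, a training loop has to be told how many batches make up one epoch; in Keras this is the steps_per_epoch argument of model.fit. The arithmetic is sketched below, assuming the standard IMDB training split of 25,000 reviews and the batch size of 32 used above; the project's own training code may choose differently.

```python
import math

# Assumed size of the IMDB training split (25,000 reviews in the standard split).
train_size = 25_000
batch_size = 32

# train_set never ends, so an "epoch" must be defined as a number of batches.
steps_per_epoch = train_size // batch_size               # full batches only
steps_covering_all = math.ceil(train_size / batch_size)  # also counts the partial batch
print(steps_per_epoch, steps_covering_all)  # 781 782
```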

Similarly, batch the test data datasets["test"] 1000 samples at a time and apply the preprocess function to every batch.

```
test_set = datasets["test"].batch(1000).map(<< your code comes here >>)
```

Apply the encode_words function to test_set.

```
test_set = test_set.map(<< your code comes here >>)
```

Let us see what the first batch of the resulting train_set looks like:

```
for X_batch, y_batch in train_set.take(1):
    print(X_batch)
    print(y_batch)
```


Project - How to Build a Sentiment Classifier using Python and IMDB Reviews
