
Creating a lookup table

Computers process numbers, not words, so we need to convert the words in truncated_vocabulary into numbers.

To do this, we add a preprocessing step that replaces each word with its ID (i.e., its index in truncated_vocabulary). We will create a lookup table for this, using 1,000 out-of-vocabulary (oov) buckets.

We shall create the lookup table such that the most frequently occurring words have lower indices than less frequently occurring words.
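As an illustration of that ordering, here is a minimal sketch of how a frequency-ordered vocabulary could be produced. The names toy_reviews, toy_vocab_size, toy_truncated_vocabulary and word_to_id are hypothetical and the corpus is made up; this is not necessarily how truncated_vocabulary was built earlier in the project.

```python
from collections import Counter

# Hypothetical toy corpus standing in for the tokenized reviews (an assumption).
toy_reviews = [
    b"the movie was great".split(),
    b"the movie was bad".split(),
    b"the movie".split(),
    b"the".split(),
]

# Count how often each word occurs across all reviews.
word_counts = Counter()
for review in toy_reviews:
    word_counts.update(review)

# most_common() returns words sorted by descending count, so a word's
# position in this list is its frequency rank.
toy_vocab_size = 3
toy_truncated_vocabulary = [
    word for word, _ in word_counts.most_common(toy_vocab_size)
]

# The ID of a word is simply its index in the truncated vocabulary.
word_to_id = {word: index for index, word in enumerate(toy_truncated_vocabulary)}
print(word_to_id)
```

Because the list is sorted by count, the most frequent word gets ID 0 and rarer words get larger IDs.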

Note:

tf.lookup.KeyValueTensorInitializer : Table initializer given keys and values tensors.
tf.lookup.StaticVocabularyTable : String-to-ID table wrapper that assigns out-of-vocabulary keys to buckets.

A term that is not in the vocabulary is mapped to a bucket_id between vocab_size and vocab_size + num_oov_buckets - 1, calculated by: hash(<term>) % num_oov_buckets + vocab_size
table.lookup : Looks up keys in the table and outputs the corresponding values.
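To make the bucket formula concrete, here is a plain-Python sketch of the lookup behaviour described above. It is only an emulation: the function lookup_with_oov_buckets and the toy vocabulary are hypothetical, and zlib.crc32 is a stand-in hash, since TensorFlow uses its own internal string hash. The exact bucket a word lands in will therefore differ from what the real table returns.

```python
import zlib


def lookup_with_oov_buckets(tokens, vocabulary, num_oov_buckets):
    """Map each token to its vocabulary index, or to an OOV bucket ID."""
    word_to_id = {word: index for index, word in enumerate(vocabulary)}
    vocab_size = len(vocabulary)
    ids = []
    for token in tokens:
        if token in word_to_id:
            # Known word: its ID is its index in the vocabulary.
            ids.append(word_to_id[token])
        else:
            # Unknown word: hash(<term>) % num_oov_buckets + vocab_size.
            # crc32 is an assumption used here only for illustration.
            ids.append(zlib.crc32(token) % num_oov_buckets + vocab_size)
    return ids


# Hypothetical toy vocabulary, most frequent word first.
toy_vocabulary = [b"the", b"movie", b"was"]
toy_ids = lookup_with_oov_buckets(
    b"the movie was faaaaaantastic".split(), toy_vocabulary, num_oov_buckets=5
)
print(toy_ids)
```

The first three IDs are the vocabulary indices 0, 1 and 2, while the last one falls somewhere in the range 3 to 7, that is, vocab_size to vocab_size + num_oov_buckets - 1. Different unknown words can share a bucket, so more buckets mean fewer collisions.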

INSTRUCTIONS

Create a tensor words containing the words of truncated_vocabulary.
```
<< your code comes here >> = tf.constant(truncated_vocabulary)
```
Create word_ids, a tensor holding the index of each word in truncated_vocabulary.
```
word_ids = tf.range(len(truncated_vocabulary), dtype=tf.int64)
```
Create the table initializer vocab_init using tf.lookup.KeyValueTensorInitializer, given the keys (here, words) and the values (here, word_ids) tensors.
```
vocab_init = << your code comes here >>(words, word_ids)
```
Set num_oov_buckets = 1000 and create the lookup table, named table, using tf.lookup.StaticVocabularyTable. Note that we pass vocab_init and num_oov_buckets as input arguments.
```
num_oov_buckets = 1000
table = << your code comes here >>(vocab_init, num_oov_buckets)
```
Let's use the above table to look up the IDs of a few words:
```
table.lookup(tf.constant([b"This movie was faaaaaantastic".split()]))
```
Note: The words “this,” “movie,” and “was” were found in the table, so their IDs are lower than 10,000, while the word “faaaaaantastic” was not found, so it was mapped to one of the oov buckets, with an ID greater than or equal to 10,000.


Project - How to Build a Sentiment Classifier using Python and IMDB Reviews
