Computers can only process numbers, not words, so we need to convert the words in truncated_vocabulary into numbers.
So we now need to add a preprocessing step that replaces each word with its ID (i.e., its index in truncated_vocabulary). We will create a lookup table for this, using 1,000 out-of-vocabulary (oov) buckets.
We shall create the lookup table so that the most frequently occurring words have lower indices than less frequently occurring ones.
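To illustrate what "more frequent words get lower indices" means, here is a minimal plain-Python sketch. The toy corpus and the names tokens, toy_vocabulary, and word_to_id are hypothetical and only stand in for the real truncated_vocabulary built earlier in the exercise.

```python
from collections import Counter

# Hypothetical toy corpus; the exercise derives its vocabulary from the real reviews.
tokens = "the movie was great the movie was long the end".split()

# Count how often each word occurs.
counts = Counter(tokens)

# most_common() sorts by descending count, so frequent words come first.
toy_vocabulary = [word for word, _ in counts.most_common()]

# A word's ID is simply its position in the frequency-sorted list.
word_to_id = {word: index for index, word in enumerate(toy_vocabulary)}

print(word_to_id)
```

Because the list is sorted by frequency before the IDs are assigned, "the" (three occurrences) ends up with a lower ID than "great" (one occurrence).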
Note:
tf.lookup.KeyValueTensorInitializer: Table initializer given keys and values tensors.
tf.lookup.StaticVocabularyTable: String-to-ID table wrapper that assigns out-of-vocabulary keys to buckets. A term that is not in the vocabulary is mapped to a bucket_id between vocab_size and vocab_size + num_oov_buckets - 1, calculated by: hash(<term>) % num_oov_buckets + vocab_size
table.lookup: Looks up keys in the table and outputs the corresponding values.
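The bucket formula in the note can be sketched in plain Python. This is only an illustration of the arithmetic: zlib.crc32 is a stand-in hash chosen because it is deterministic, not the hash TensorFlow uses internally, and oov_bucket_id is a hypothetical helper name. The vocabulary size of 10,000 is assumed from the note at the end of this exercise.

```python
import zlib


def oov_bucket_id(term: bytes, vocab_size: int, num_oov_buckets: int) -> int:
    """Map an out-of-vocabulary term to an ID just past the vocabulary."""
    # Stand-in hash (assumption): TensorFlow applies its own hash function.
    return zlib.crc32(term) % num_oov_buckets + vocab_size


vocab_size = 10_000      # assumed size of truncated_vocabulary
num_oov_buckets = 1_000  # as used in this exercise

bucket = oov_bucket_id(b"faaaaaantastic", vocab_size, num_oov_buckets)
print(bucket)
```

Whatever hash is used, the modulo keeps the result inside the range from vocab_size to vocab_size + num_oov_buckets - 1, so oov IDs never collide with the IDs of in-vocabulary words.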
Create a tensor words containing the words of truncated_vocabulary.
<< your code comes here >>= tf.constant(truncated_vocabulary)
Create word_ids using the corresponding indices of the words in truncated_vocabulary.
word_ids = tf.range(len(truncated_vocabulary), dtype=tf.int64)
Create the table initializer vocab_init using tf.lookup.KeyValueTensorInitializer, given the keys (here words) and the values (here word_ids) tensors.
vocab_init = << your code comes here >>(words, word_ids)
Set num_oov_buckets = 1000 and create the lookup table table using tf.lookup.StaticVocabularyTable. Observe that we pass vocab_init and num_oov_buckets as input arguments.
num_oov_buckets = 1000
table = << your code comes here >>(vocab_init, num_oov_buckets)
Let's use the above table to look up the IDs of a few words:
table.lookup(tf.constant([b"This movie was faaaaaantastic".split()]))
Note: The words “this,” “movie,” and “was” were found in the table, so their IDs are lower than 10,000, while the word “faaaaaantastic” was not found, so it was mapped to one of the oov buckets, with an ID greater than or equal to 10,000.
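The same behaviour can be reproduced on a small scale. The sketch below assumes TensorFlow 2.x is installed and uses a hypothetical four-word toy_vocabulary with only 5 oov buckets, so the numbers differ from those of the exercise, but the pattern is the same: known words get their index, unknown words land in a bucket past the end of the vocabulary.

```python
import tensorflow as tf

# Hypothetical toy vocabulary, standing in for truncated_vocabulary.
toy_vocabulary = [b"<pad>", b"the", b"movie", b"was"]

# Keys are the words, values are their positions in the vocabulary.
words = tf.constant(toy_vocabulary)
word_ids = tf.range(len(toy_vocabulary), dtype=tf.int64)
vocab_init = tf.lookup.KeyValueTensorInitializer(words, word_ids)

# Small bucket count chosen for readability; the exercise uses 1000.
num_oov_buckets = 5
table = tf.lookup.StaticVocabularyTable(vocab_init, num_oov_buckets)

# Look up three known words and one unknown word.
ids = table.lookup(tf.constant([b"the movie was faaaaaantastic".split()]))
id_list = ids.numpy().tolist()
print(id_list)
```

The first three IDs are 1, 2, and 3, and the last one falls somewhere between 4 and 8, since the vocabulary has 4 entries and there are 5 oov buckets.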