Now we will create a preprocessing function that does the following:
Truncates each review to its first 300 characters, since you can generally tell whether a review is positive or negative from the first sentence or two.
Uses regular expressions to replace <br /> tags with spaces, and to replace any characters other than letters and single quotes with spaces.
Splits the reviews on whitespace, which returns a ragged tensor, and converts this ragged tensor to a dense tensor, padding all reviews with the padding token <pad> so that they all have the same length.
Note:
tf.strings - Operations for working with string Tensors.
tf.strings.substr(X_batch, 0, 300) - For each string in the input Tensor X_batch, creates a substring starting at index pos (here 0) with a total length of len (here 300). In other words, it returns a Tensor of substrings. With the default unit, pos and len are counted in bytes, and strings shorter than len are returned unchanged.
tf.strings.regex_replace(X_batch, rb"<br\s*/?>", b" ") - Replaces every part of each element of X_batch that matches the regex pattern <br\s*/?> with the rewrite string (here a single space).
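tf.strings.split(X_batch) - Splits each element of X_batch on whitespace and returns the result as a RaggedTensor, because different reviews can contain different numbers of words.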
X_batch.to_tensor(default_value=b"<pad>") - Converts the RaggedTensor into a dense tf.Tensor. default_value is the value used to fill positions that have no value in X_batch, so every row shorter than the longest row is padded with default_value (here <pad>).
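To make the notes above concrete, here is a minimal sketch that applies the four operations one at a time to a tiny batch. The tensor X_demo and its two reviews are made up for illustration; they are not part of the dataset used in this exercise.

```python
import tensorflow as tf

# Hypothetical toy batch of two raw reviews (assumption: not from the real dataset).
X_demo = tf.constant([b"Great movie!<br />Loved it", b"Bad"])

# 1. Keep at most the first 300 bytes of each string.
truncated = tf.strings.substr(X_demo, 0, 300)

# 2. Replace <br>, <br/> or <br /> tags with a space.
no_tags = tf.strings.regex_replace(truncated, rb"<br\s*/?>", b" ")

# 3. Replace everything except letters and single quotes with a space.
letters_only = tf.strings.regex_replace(no_tags, b"[^a-zA-Z']", b" ")

# 4. Split on whitespace. Rows have different lengths, so the result is ragged.
tokens = tf.strings.split(letters_only)

# 5. Pad the shorter rows with b"<pad>" to get a rectangular (dense) tensor.
dense = tokens.to_tensor(default_value=b"<pad>")

print(tokens.to_list())
# [[b'Great', b'movie', b'Loved', b'it'], [b'Bad']]
print(dense.numpy().tolist())
# [[b'Great', b'movie', b'Loved', b'it'], [b'Bad', b'<pad>', b'<pad>', b'<pad>']]
```

Note that the padded length is set by the longest review in the batch, so different batches can end up with different widths.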
Use the following code to preprocess the data as described above:
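import tensorflow as tf

def preprocess(X_batch, y_batch):
    # Keep only the first 300 characters of each review
    X_batch = tf.strings.substr(X_batch, 0, 300)
    # Replace <br /> tags with spaces
    X_batch = tf.strings.regex_replace(X_batch, rb"<br\s*/?>", b" ")
    # Replace anything other than letters and single quotes with spaces
    X_batch = tf.strings.regex_replace(X_batch, b"[^a-zA-Z']", b" ")
    # Split on whitespace (ragged result), then pad to a dense tensor
    X_batch = tf.strings.split(X_batch)
    return X_batch.to_tensor(default_value=b"<pad>"), y_batch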
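As a quick sanity check, the function can be tried on a small hand-written batch before running it on real data. This is only a sketch: X_toy and y_toy are hypothetical inputs, and the function is repeated here so that the snippet runs on its own.

```python
import tensorflow as tf

def preprocess(X_batch, y_batch):
    X_batch = tf.strings.substr(X_batch, 0, 300)
    X_batch = tf.strings.regex_replace(X_batch, rb"<br\s*/?>", b" ")
    X_batch = tf.strings.regex_replace(X_batch, b"[^a-zA-Z']", b" ")
    X_batch = tf.strings.split(X_batch)
    return X_batch.to_tensor(default_value=b"<pad>"), y_batch

# Hypothetical toy inputs (assumption: 1 = positive, 0 = negative).
X_toy = tf.constant([b"It's a 10/10 film.<br/>Really!", b"Awful"])
y_toy = tf.constant([1, 0], dtype=tf.int64)

X_out, y_out = preprocess(X_toy, y_toy)

print(X_out.numpy().tolist())
# [[b"It's", b'a', b'film', b'Really'], [b'Awful', b'<pad>', b'<pad>', b'<pad>']]
print(y_out.numpy().tolist())
# [1, 0]
```

Digits and punctuation are replaced by spaces and then dropped by the split, the apostrophe in "It's" is kept, case is preserved, and the labels pass through unchanged.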
Let us now call the preprocess() function on X_batch, y_batch to see what the data looks like after preprocessing:
<< your code comes here >>(X_batch, y_batch)