Login using Social Account
     Continue with GoogleLogin using your credentials
Now we will create this preprocessing function where we will:
Truncate the reviews, keeping only the first 300 characters of each since you can generally tell whether a review is positive or not in the first sentence or two.
Then we use regular expressions to replace <br/> tags with spaces and to replace any characters other than letters and quotes with spaces.
Finally, the preprocess() function splits the reviews by the spaces, which returns a ragged tensor, and it converts this ragged tensor to a dense tensor, padding all reviews with the padding token <pad> so that they all have the same length.
Note:
tf.strings - Operations for working with string Tensors.
tf.strings.substr(X_batch, 0, 300) - For each string in the input Tensor X_batch, it creates a substring starting at index pos(here 0) with a total length of len(here 300). So basically, it returns substrings from Tensor of strings.
tf.strings.regex_replace(X_batch, rb"<br\s*/?>", b" ") - Replaces elements of X_batch matching regex pattern <br\s*/?> with rewrite .
tf.strings.split(X_batch) - Split elements of input X_batch into a RaggedTensor.
X_batch.to_tensor(default_value=b"<pad>") - Converts the RaggedTensor into a tf.Tensor. default_value is the value to set for indices not specified in X_batch. Empty values are assigned default_value(here <pad>).
Use the following code to preprocess the data as described above:
def preprocess(X_batch, y_batch):
X_batch = tf.strings.substr(X_batch, 0, 300)
X_batch = tf.strings.regex_replace(X_batch, rb"<br\s*/?>", b" ")
X_batch = tf.strings.regex_replace(X_batch, b"[^a-zA-Z']", b" ")
X_batch = tf.strings.split(X_batch)
return X_batch.to_tensor(default_value=b"<pad>"), y_batch
Let us now call the preprocess() function on X_batch, y_batch to see how the data after preprocessing looks like:
<< your code comes here >>(X_batch, y_batch)
Taking you to the next exercise in seconds...
Want to create exercises like this yourself? Click here.
Note - Having trouble with the assessment engine? Follow the steps listed here
Loading comments...