Now we will create a preprocessing function that will:
Truncate the reviews, keeping only the first 300 characters of each, since you can generally tell whether a review is positive or not from the first sentence or two.
Use regular expressions to replace <br/> tags with spaces, and to replace any characters other than letters and apostrophes (') with spaces.
Split the reviews on whitespace, which returns a ragged tensor, and convert this ragged tensor to a dense tensor, padding all reviews with the padding token <pad> so that they all have the same length.
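To make the steps above concrete without TensorFlow, here is a plain-Python analogue built on the standard re module. It is only an illustration: preprocess_py and PAD are hypothetical names, Python slicing counts characters whereas tf.strings.substr counts bytes by default, and the real function works on whole tensors rather than Python lists.

```python
import re

PAD = "<pad>"  # hypothetical name for the padding token


def preprocess_py(reviews, max_chars=300):
    """Hypothetical plain-Python analogue of the TensorFlow preprocessing steps."""
    tokenized = []
    for review in reviews:
        text = review[:max_chars]                # 1. keep only the first 300 characters
        text = re.sub(r"<br\s*/?>", " ", text)   # 2a. <br>, <br/>, <br /> -> space
        text = re.sub(r"[^a-zA-Z']", " ", text)  # 2b. non-letters (except ') -> space
        tokenized.append(text.split())           # 3. split on whitespace (ragged lists)
    max_len = max((len(tokens) for tokens in tokenized), default=0)
    # 4. pad every review with "<pad>" so all rows have the same length
    return [tokens + [PAD] * (max_len - len(tokens)) for tokens in tokenized]


batch = ["It's great!<br />Loved it.", "Bad, 2/10."]
print(preprocess_py(batch))
# → [["It's", 'great', 'Loved', 'it'], ['Bad', '<pad>', '<pad>', '<pad>']]
```

Note how "2/10" disappears entirely: digits and the slash are not letters, so they become spaces and produce no tokens.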
tf.strings - Operations for working with string Tensors.
tf.strings.substr(X_batch, 0, 300) - For each string in the input tensor X_batch, returns the substring that starts at index pos (here 0) and has length len (here 300). Strings shorter than 300 are returned whole. By default (unit="BYTE"), positions and lengths are counted in bytes, which is the same as characters for plain ASCII text.
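tf.strings.regex_replace(X_batch, rb"<br\s*/?>", b" ") - Replaces every substring of each element of X_batch that matches the regex pattern <br\s*/?> (which matches <br>, <br/> and <br />) with the rewrite string b" ", a single space. The second call uses the same function with the pattern [^a-zA-Z'] to turn every character other than a letter or an apostrophe into a space.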
tf.strings.split(X_batch) - Splits each element of X_batch on whitespace and returns the words as a RaggedTensor: one row per review, with rows of different lengths.
X_batch.to_tensor(default_value=b"<pad>") - Converts the RaggedTensor into a dense tf.Tensor. default_value is the value to set for indices not specified in X_batch, so the empty positions at the end of the shorter rows are filled with b"<pad>".
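The sketch below runs each of these operations separately on a made-up batch of two reviews so the intermediate results can be inspected. It assumes TensorFlow 2.x with eager execution; the review strings and variable names (truncated, no_br, letters_only, ragged, dense) are invented for illustration.

```python
import tensorflow as tf

# A made-up batch of two reviews
X_batch = tf.constant([b"Great movie!<br /><br />Loved it.", b"Awful."])

# Keep at most the first 300 bytes of each string (both are shorter, so nothing is cut)
truncated = tf.strings.substr(X_batch, 0, 300)

# <br>, <br/> and <br /> each become a single space
no_br = tf.strings.regex_replace(truncated, rb"<br\s*/?>", b" ")

# Every character that is not a letter or an apostrophe becomes a space
letters_only = tf.strings.regex_replace(no_br, b"[^a-zA-Z']", b" ")

# Split on whitespace: one row of words per review, rows of different lengths
ragged = tf.strings.split(letters_only)
print(ragged.row_lengths().numpy())   # [4 1]

# Pad the shorter rows with b"<pad>" to obtain a dense 2-D tensor
dense = ragged.to_tensor(default_value=b"<pad>")
print(tuple(dense.shape))             # (2, 4)
print(dense.numpy()[1].tolist())      # [b'Awful', b'<pad>', b'<pad>', b'<pad>']
```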
Use the following code to preprocess the data as described above:
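import tensorflow as tf

def preprocess(X_batch, y_batch):
    # Keep only the first 300 bytes of each review
    X_batch = tf.strings.substr(X_batch, 0, 300)
    # Replace <br>, <br/> and <br /> tags with spaces
    X_batch = tf.strings.regex_replace(X_batch, rb"<br\s*/?>", b" ")
    # Replace every character that is not a letter or an apostrophe with a space
    X_batch = tf.strings.regex_replace(X_batch, b"[^a-zA-Z']", b" ")
    # Split on whitespace into a RaggedTensor of words
    X_batch = tf.strings.split(X_batch)
    # Pad shorter reviews with b"<pad>" to get a dense tensor
    return X_batch.to_tensor(default_value=b"<pad>"), y_batch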
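A function like this is typically mapped over batches from a tf.data pipeline. The following sketch assumes the reviews and their 0/1 labels live in a tf.data.Dataset, here built from hypothetical toy lists; it repeats the function so the snippet runs on its own, then maps it over batches of two.

```python
import tensorflow as tf


def preprocess(X_batch, y_batch):
    # Same steps as the function above
    X_batch = tf.strings.substr(X_batch, 0, 300)
    X_batch = tf.strings.regex_replace(X_batch, rb"<br\s*/?>", b" ")
    X_batch = tf.strings.regex_replace(X_batch, b"[^a-zA-Z']", b" ")
    X_batch = tf.strings.split(X_batch)
    return X_batch.to_tensor(default_value=b"<pad>"), y_batch


# Hypothetical toy data standing in for the real reviews and labels
reviews = [b"I loved this film.<br />Great cast!", b"Boring.", b"Not bad at all"]
labels = [1, 0, 1]

dataset = tf.data.Dataset.from_tensor_slices((reviews, labels)).batch(2).map(preprocess)
for X, y in dataset:
    print(X.shape, y.numpy())
# (2, 6) [1 0]
# (1, 4) [1]
```

Because padding happens per batch, each batch is padded only to the length of its own longest review, so batch shapes can differ from one batch to the next.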
Let us now call the preprocess() function on X_batch, y_batch to see what the data looks like after preprocessing:
<< your code comes here >>(X_batch, y_batch)