We will define functions to handle common quirks in text scraped from websites, such as words hyphenated across line breaks and inconsistent newline characters. Running the raw text through these functions in sequence gives the chatbot's knowledge base a cleaner, more consistent format.
Importing Libraries:
import re
Define a function merge_hyphenated_words that takes "text" as input and rejoins words that were split by a hyphen at a line break.
def merge_hyphenated_words(text):
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)
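As a quick check of what this pattern does, here is a minimal sketch. The sample strings are made up for illustration and are not from the exercise. It shows that a hyphen followed by a newline is removed, while an ordinary hyphen inside a line is left alone.

```python
import re


def merge_hyphenated_words(text):
    # Join "word-\nrest" into "wordrest": the hyphen and the newline are dropped.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


# Illustrative input: "knowledge" was split across two lines.
sample = "The chatbot's knowl-\nedge base is built from web pages."
merged = merge_hyphenated_words(sample)
print(merged)

# A hyphen that is not followed by a newline does not match the pattern.
unchanged = merge_hyphenated_words("a well-known site")
print(unchanged)
```

Note that the pattern cannot tell a line-break hyphen from a genuine compound that happens to wrap, so "well-\nknown" would come out as "wellknown".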
Define a function fix_newlines that takes "text" as input and replaces each single newline (a line wrap inside a paragraph) with a space, leaving blank-line paragraph breaks untouched.
def fix_newlines(text):
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)
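The lookbehind `(?<!\n)` and lookahead `(?!\n)` restrict the match to a newline that has no other newline on either side. The sketch below, using an invented sample string, shows a wrapped line being joined while the blank line between paragraphs survives.

```python
import re


def fix_newlines(text):
    # Replace a lone newline (not adjacent to another newline) with a space.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


# Illustrative input: one wrapped line, then a blank-line paragraph break.
sample = "This sentence was\nwrapped early.\n\nThis is a new paragraph."
fixed = fix_newlines(sample)
print(repr(fixed))
```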
Define a function remove_multiple_newlines that takes "text" as input and collapses every run of two or more newlines into a single newline.
def remove_multiple_newlines(text):
    return re.sub(r"\n{2,}", "\n", text)
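A small sketch of the effect, again with a made-up sample string: any run of two or more newlines collapses to one, and a single newline is left as it is.

```python
import re


def remove_multiple_newlines(text):
    # Collapse any run of 2+ consecutive newlines into exactly one newline.
    return re.sub(r"\n{2,}", "\n", text)


# Illustrative input: four newlines between two paragraphs.
sample = "First paragraph.\n\n\n\nSecond paragraph."
collapsed = remove_multiple_newlines(sample)
print(repr(collapsed))
```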
Now that our cleaning functions are in place, let's write a function clean_text that takes "text" as input and passes it through all of the above cleaning functions in order.
def clean_text(text):
    cleaning_functions = [merge_hyphenated_words, fix_newlines, remove_multiple_newlines]
    for cleaning_function in cleaning_functions:
        text = cleaning_function(text)
    return text
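To see the whole pipeline at work, the following self-contained sketch repeats the four functions and runs them on an invented sample string (the sample text and the variable names `raw`, `cleaned`, and `reordered` are assumptions for illustration). It also shows why the order of the list matters: if runs of newlines were collapsed before single newlines were fixed, paragraph breaks would be turned into spaces as well.

```python
import re


def merge_hyphenated_words(text):
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def fix_newlines(text):
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


def remove_multiple_newlines(text):
    return re.sub(r"\n{2,}", "\n", text)


def clean_text(text):
    cleaning_functions = [merge_hyphenated_words, fix_newlines, remove_multiple_newlines]
    for cleaning_function in cleaning_functions:
        text = cleaning_function(text)
    return text


# Illustrative raw text: a split word, wrapped lines, and an oversized paragraph gap.
raw = "Data clean-\ning removes\nline breaks.\n\n\nNext paragraph starts\nhere."

cleaned = clean_text(raw)
print(repr(cleaned))

# Hypothetical reordering, only to show that the sequence matters:
# collapsing first leaves single newlines, which fix_newlines then turns into spaces.
reordered = fix_newlines(remove_multiple_newlines(merge_hyphenated_words(raw)))
print(repr(reordered))
```

With the original order, each paragraph ends up on its own line; with the reordered version, the two paragraphs are merged into one line.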