Now we will test the html_to_plain_text() function we created in the previous step.
First, let's select a spam email, store it in a variable, and print it:
html_spam_emails = [email for email in X_train[y_train==1]
                    if get_email_structure(email) == "text/html"]
sample_html_spam = html_spam_emails[7]
print(sample_html_spam.get_content().strip()[:1000], "...")
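get_email_structure() comes from an earlier step and is not shown here. The following is a hypothetical sketch, assuming it returns the content type for single-part messages and a "multipart(...)" description for multipart ones. It reproduces the filtering above on a small toy list: make_email, emails and labels are illustrative stand-ins for X_train and y_train, and a plain list with zip() replaces the NumPy boolean mask.

```python
from email.message import EmailMessage


def get_email_structure(email):
    """Hypothetical sketch: describe a message's MIME structure as a string."""
    if isinstance(email, str):
        return email
    payload = email.get_payload()
    if isinstance(payload, list):
        # Multipart message: describe each sub-part recursively.
        return "multipart({})".format(
            ", ".join(get_email_structure(sub) for sub in payload)
        )
    # Single-part message: just its MIME type, e.g. "text/html".
    return email.get_content_type()


def make_email(body, subtype="plain"):
    """Build a single-part test message (illustrative data only)."""
    msg = EmailMessage()
    msg.set_content(body, subtype=subtype)
    return msg


# Toy stand-ins for X_train / y_train (1 = spam, 0 = ham).
emails = [
    make_email("Meeting at 10am"),
    make_email("<html><body><b>WIN</b> a prize!</body></html>", subtype="html"),
    make_email("Cheap pills"),
]
labels = [0, 1, 1]

# Keep only spam emails whose structure is a single text/html part.
html_spam_emails = [
    email for email, label in zip(emails, labels)
    if label == 1 and get_email_structure(email) == "text/html"
]
print(len(html_spam_emails))
```

Only the second toy message is both spam and pure HTML, so a single email survives the filter.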
This is what the spam email looks like in its original HTML form.
Now we will convert it into plain text by passing its content to the html_to_plain_text function we created earlier:
print(html_to_plain_text(sample_html_spam.get_content())[:1000], "...")
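The html_to_plain_text function itself is defined in the previous step and is not reproduced here. As a hypothetical sketch, assuming a regular-expression approach (a real HTML parser would be more robust), such a function might drop the head section, mark links, strip the remaining tags, collapse blank lines and decode HTML entities:

```python
import re
from html import unescape


def html_to_plain_text(html):
    """Hypothetical sketch: strip HTML markup with regular expressions."""
    # Drop the <head> section entirely (title, styles, metadata).
    text = re.sub(r"<head.*?>.*?</head>", "", html, flags=re.M | re.S | re.I)
    # Replace each opening <a ...> tag with a HYPERLINK marker.
    text = re.sub(r"<a\s.*?>", " HYPERLINK ", text, flags=re.M | re.S | re.I)
    # Remove every remaining tag.
    text = re.sub(r"<.*?>", "", text, flags=re.M | re.S)
    # Collapse runs of blank or whitespace-only lines into one newline.
    text = re.sub(r"(\s*\n)+", "\n", text, flags=re.M | re.S)
    # Decode entities such as &amp; and &gt;.
    return unescape(text)


sample = (
    "<html><head><title>Offer</title></head>"
    '<body><p>Click <a href="http://example.com">here</a> &amp; win!</p></body></html>'
)
print(html_to_plain_text(sample))
```

Replacing links with a marker word, rather than deleting them, keeps a signal that a spam classifier may find useful.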
Finally, let's write a function that takes an email as input and returns its content as plain text, whatever its format: it returns a text/plain part if one exists and otherwise falls back to converting the text/html part:
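def email_to_text(email):
    html = None
    # walk() visits the message itself and every sub-part, depth-first
    for part in email.walk():
        ctype = part.get_content_type()
        # Skip containers (multipart/*) and non-text parts such as attachments
        if ctype not in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except Exception:  # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/plain":
            # A plain-text part is returned as-is as soon as it is found
            return content
        else:
            # Remember the HTML part in case no plain-text part exists
            html = content
    if html:
        return html_to_plain_text(html)
    # Falls through and returns None when the email has no text part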
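email_to_text relies on email.walk() yielding the message itself first and then each sub-part depth-first. The short sketch below, using an illustrative multipart/alternative message built with the standard library's EmailMessage, shows the sequence of content types that the loop sees. Because the function returns on the first text/plain part but only remembers text/html parts, plain text wins whenever both versions are present, regardless of their order.

```python
from email.message import EmailMessage

# Build an illustrative multipart/alternative message with a plain-text
# and an HTML version of the same body.
msg = EmailMessage()
msg["Subject"] = "Limited offer"
msg.set_content("Buy now and save 50%!")
msg.add_alternative("<p>Buy <b>now</b> and save 50%!</p>", subtype="html")

# walk() yields the container first, then each sub-part in order.
content_types = [part.get_content_type() for part in msg.walk()]
print(content_types)

# get_content() decodes a leaf part's payload to a str.
plain_parts = [p.get_content() for p in msg.walk()
               if p.get_content_type() == "text/plain"]
print(plain_parts)
```

The multipart/alternative container is skipped by the content-type check in email_to_text, so only the two leaf parts are ever decoded.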