Spam Classifier - Test HTML to Plain Text Function

Now we will test the HTML to Plain Text function we created in the previous step.

INSTRUCTIONS

First, let's select an HTML spam email from the training set, store it in a variable, and print it:

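```
# Keep only the spam emails (label 1) whose structure is "text/html"
html_spam_emails = [email for email in X_train[y_train == 1]
                    if get_email_structure(email) == "text/html"]
# Pick one of them and print the first 1,000 characters of its content
sample_html_spam = html_spam_emails[7]
print(sample_html_spam.get_content().strip()[:1000], "...")
```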

This is what the spam email looks like in its original HTML form.
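X_train, y_train and get_email_structure() come from earlier steps of the project, so the snippet above only runs inside the project notebook. As a self-contained sketch, the example below builds a small, made-up HTML email with the standard library's email.message.EmailMessage and reads it the same way. For a single-part message like this one, get_content_type() is used as a simple stand-in for the project's get_email_structure() check (an assumption: the project's helper may also describe multipart structures).

```python
from email.message import EmailMessage

# Hypothetical sample data, not taken from the spam dataset
msg = EmailMessage()
msg["Subject"] = "You have WON!"
msg.set_content(
    "<html><body><p>Click <a href='http://example.com'>here</a> now!</p></body></html>",
    subtype="html",
)

# A single-part HTML email reports "text/html" as its content type
print(msg.get_content_type())  # text/html

# get_content() returns the decoded body as a str; strip and truncate as above
print(msg.get_content().strip()[:1000], "...")
```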

Now let's convert it to plain text by passing its content to the html_to_plain_text() function we created in the previous step:
```
print(<< your code goes here >>(sample_html_spam.get_content())[:1000], "...")
```
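For reference, here is one possible regex-based sketch of an HTML-to-text converter. It is hypothetical and may differ from the html_to_plain_text() function built in the previous step, which is why it uses a different name (html_to_plain_text_sketch). It drops the <head> section, replaces each link tag with the word HYPERLINK, removes all remaining tags, collapses blank lines, and unescapes HTML entities such as &amp;.

```python
import re
from html import unescape


def html_to_plain_text_sketch(html):
    # Remove the <head> section (title, styles, scripts in the head)
    text = re.sub(r"<head.*?>.*?</head>", "", html, flags=re.M | re.S | re.I)
    # Replace opening link tags with a marker word
    text = re.sub(r"<a\s.*?>", " HYPERLINK ", text, flags=re.M | re.S | re.I)
    # Strip every remaining tag
    text = re.sub(r"<.*?>", "", text, flags=re.M | re.S)
    # Collapse runs of blank or whitespace-only lines into a single newline
    text = re.sub(r"(\s*\n)+", "\n", text, flags=re.M | re.S)
    # Turn entities like &amp; back into characters
    return unescape(text)


# Made-up HTML for illustration
sample = ("<html><head><title>Prize</title></head><body>"
          "<p>Win &amp; save</p><a href='http://example.com'>here</a>"
          "</body></html>")
print(html_to_plain_text_sketch(sample))  # Win & save HYPERLINK here
```

Regular expressions are a quick, dependency-free choice for a spam filter where rough text is good enough; a real HTML parser would be more robust on malformed markup.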

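Finally, let's write a function that takes an email as input and returns its content as plain text, whatever its format. It returns the first text/plain part it finds; if there is no plain-text part, it falls back to converting the HTML content with html_to_plain_text():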

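```
def email_to_text(email):
    html = None
    # Walk through every part of the (possibly multipart) email
    for part in email.walk():
        ctype = part.get_content_type()
        # Skip non-text parts such as images and multipart containers
        if ctype not in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except Exception:  # in case of encoding issues (e.g. an unknown charset)
            content = str(part.get_payload())
        # Prefer plain text: return it as soon as it is found
        if ctype == "text/plain":
            return content
        # Otherwise remember the HTML content as a fallback
        html = content
    # No plain-text part: convert the HTML content, if any
    if html:
        return html_to_plain_text(html)
    # No text content at all
    return None
```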
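To see the plain-text preference in action outside the project notebook, the sketch below builds two made-up emails with the standard library: a multipart/alternative email carrying both a plain-text and an HTML version, and an HTML-only email. It repeats email_to_text() so the block runs on its own, and it uses a deliberately simple, hypothetical stand-in for html_to_plain_text() that only strips tags and unescapes entities.

```python
import re
from email.message import EmailMessage
from html import unescape


def html_to_plain_text(html):
    # Hypothetical stand-in: strip tags and unescape entities (not the project's version)
    return unescape(re.sub(r"<.*?>", "", html, flags=re.S))


def email_to_text(email):
    html = None
    for part in email.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except Exception:  # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/plain":
            return content
        html = content
    if html:
        return html_to_plain_text(html)
    return None


# Email with both a plain-text and an HTML version (multipart/alternative)
both = EmailMessage()
both.set_content("Plain version")
both.add_alternative("<p>HTML version</p>", subtype="html")

# Email with only an HTML body
html_only = EmailMessage()
html_only.set_content("<p>Only <b>HTML</b> here</p>", subtype="html")

print(email_to_text(both).strip())       # Plain version
print(email_to_text(html_only).strip())  # Only HTML here
```

The plain-text part wins whenever it exists because it needs no conversion; the HTML branch is only a fallback.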
