Project - Building Spam Classifier

Spam Classifier - Test HTML to Plain Text Function

Now we will test the HTML to Plain Text function we created in the previous step.

  • First, let's select a spam email, store it in a variable, and print it:

    html_spam_emails = [email for email in X_train[y_train==1]
                        if get_email_structure(email) == "text/html"]
    sample_html_spam = html_spam_emails[7]
    print(sample_html_spam.get_content().strip()[:1000], "...")

    The output shows the first 1,000 characters of the spam email in its original HTML form.

  • Now convert it to plain text by passing the content of sample_html_spam to the html_to_plain_text function we created:

    print(<< your code goes here >>(sample_html_spam.get_content())[:1000], "...")

  • Finally, let's write a function that takes an email as input and returns its content as plain text, whatever its format is:

    def email_to_text(email):
        html = None
        for part in email.walk():
            ctype = part.get_content_type()
            if ctype not in ("text/plain", "text/html"):
                continue  # skip attachments and multipart containers
            try:
                content = part.get_content()
            except Exception:  # in case of encoding issues
                content = str(part.get_payload())
            if ctype == "text/plain":
                return content  # prefer plain text when it is available
            else:
                html = content  # remember the HTML part as a fallback
        if html:
            return html_to_plain_text(html)
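
To see how email_to_text behaves on different formats, here is a minimal, self-contained sketch. It repeats the function so that it runs on its own, builds three small messages by hand with the standard library's EmailMessage, and uses a simplified stand-in for html_to_plain_text. The stand-in and the sample messages are assumptions made for illustration; they are not the function from the previous step or emails from the dataset.

```python
import re
from email.message import EmailMessage
from html import unescape


def html_to_plain_text(html):
    # Simplified stand-in (assumption) for the function built in the previous
    # step: drop tags, collapse whitespace, and unescape HTML entities.
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text)
    return unescape(text).strip()


def email_to_text(email):
    html = None
    for part in email.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue  # skip attachments and multipart containers
        try:
            content = part.get_content()
        except Exception:  # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/plain":
            return content  # prefer plain text when it is available
        html = content  # remember the HTML part as a fallback
    if html:
        return html_to_plain_text(html)


# Hypothetical message 1: HTML only, so the HTML fallback is converted.
html_only = EmailMessage()
html_only.set_content(
    "<html><body><h1>Win big</h1><p>Click &amp; claim</p></body></html>",
    subtype="html",
)

# Hypothetical message 2: plain text plus an HTML alternative.
both = EmailMessage()
both.set_content("Plain version")
both.add_alternative("<p>HTML version</p>", subtype="html")

# Hypothetical message 3: binary content only, no text part at all.
binary_only = EmailMessage()
binary_only.set_content(b"\x00\x01", maintype="application", subtype="octet-stream")

html_text = email_to_text(html_only)
plain_text = email_to_text(both)
no_text = email_to_text(binary_only)

print(repr(html_text))
print(repr(plain_text))
print(repr(no_text))
```

Note that when an email has neither a text/plain nor a text/html part, the function falls off the end and returns None, so code that calls it may need to substitute an empty string.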