Project - Building Spam Classifier

Spam Classifier - Looking at Type of Email Structures

In this step we will look at the different types of email structure. Some emails are multipart, with images and attachments (which can have their own attachments). Let's see which structures appear in our data.

INSTRUCTIONS
  • First we will define a function get_email_structure:

    def << your code goes here >>(email):
        if isinstance(email, str):
            return email
        payload = email.get_payload()
        if isinstance(payload, list):
            return "multipart({})".format(", ".join([
                get_email_structure(sub_email)
                for sub_email in payload
            ]))
        else:
            return email.get_content_type()
    

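    The get_payload() method returns the current payload: a list of Message objects when is_multipart() is True, or a string when is_multipart() is False. If the payload is a list and you mutate that list, you modify the message’s payload in place. get_email_structure relies on this distinction: for a list payload it calls itself on each part and wraps the results in "multipart(...)", and otherwise it returns the content type of the message, such as text/plain.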
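    As a rough illustration of the behaviour described above, the sketch below builds two small messages with the standard email.mime classes and inspects their payloads. The sample messages and the variable names (plain, nested, inner) are made up for this example and are not part of the project dataset; the function body repeats the one from this step.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def get_email_structure(email):
    # Same logic as the function defined in this step.
    if isinstance(email, str):
        return email
    payload = email.get_payload()
    if isinstance(payload, list):
        return "multipart({})".format(", ".join([
            get_email_structure(sub_email)
            for sub_email in payload
        ]))
    else:
        return email.get_content_type()


# Hypothetical sample messages, built only for illustration.
plain = MIMEText("Hello", "plain")

inner = MIMEMultipart("alternative")
inner.attach(MIMEText("Hi", "plain"))
inner.attach(MIMEText("<p>Hi</p>", "html"))

nested = MIMEMultipart("mixed")
nested.attach(MIMEText("Hello", "plain"))
nested.attach(inner)  # a multipart part inside a multipart message

# Non-multipart message: the payload is a string.
plain_payload = plain.get_payload()
# Multipart message: the payload is a list of Message objects.
nested_payload = nested.get_payload()

print(plain.is_multipart(), type(plain_payload).__name__)
print(nested.is_multipart(), type(nested_payload).__name__, len(nested_payload))

plain_structure = get_email_structure(plain)
nested_structure = get_email_structure(nested)
print(plain_structure)
print(nested_structure)
```

    Under these assumptions the nested message is reported as a multipart structure that itself contains another multipart structure, which is the case the recursive call is meant to handle.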

  • Next, we count how often each structure occurs by creating another function called structures_counter:

    from collections import Counter
    
    def << your code goes here >>(emails):
        structures = Counter()
        for email in emails:
            structure = get_email_structure(email)
            structures[structure] += 1
        return structures
    

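    The collections module provides alternatives to built-in container data types such as list, tuple and dict. Counter is a dict subclass for counting hashable objects: each key is an element and its value is the number of times it has been seen, with missing keys counting as 0. Here the keys are the structure strings returned by get_email_structure, so the structures variable ends up mapping each structure to the number of emails that have it.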
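    A minimal sketch of how Counter accumulates counts, using a hand-written list of structure strings (sample_structures is a hypothetical stand-in for the values get_email_structure would return on real emails):

```python
from collections import Counter

# Hypothetical structure strings, used only for illustration.
sample_structures = [
    "text/plain",
    "text/html",
    "text/plain",
    "multipart(text/plain, text/html)",
    "text/plain",
]

structures = Counter()
for structure in sample_structures:
    # A missing key counts as 0, so += 1 works without initialisation.
    structures[structure] += 1

print(structures["text/plain"])  # 3
print(structures["image/png"])   # 0, and no KeyError is raised
```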

  • And finally, we will check the most common types of email structure for both ham and spam emails:

    structures_counter(ham_emails).most_common()
    
    structures_counter(spam_emails).most_common()
    

    most_common(n) returns a list of the n most common elements and their counts, from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. In Python 3.7 and later, elements with equal counts are ordered in the order first encountered.
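    To make the return value concrete, here is a small sketch on a hand-built counter. The counts are invented for the example and do not come from the ham or spam datasets.

```python
from collections import Counter

# Hypothetical counts, used only for illustration.
counts = Counter({
    "text/plain": 5,
    "text/html": 2,
    "multipart(text/plain)": 2,
    "multipart(text/html)": 1,
})

# With an argument: only the n most common (structure, count) pairs.
top_two = counts.most_common(2)
# Without an argument: every pair, sorted by descending count.
everything = counts.most_common()

print(top_two)
print(everything)
```

    Each entry is a (structure, count) tuple, so the output of structures_counter(ham_emails).most_common() can be read from top to bottom as a ranking of structures.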

    Based on the last 2 outputs, please answer the next questions.
