In this step, we will parse the emails we downloaded.
We can use Python's email module to parse these emails (it handles headers, encodings, and so on). First, we will import the email and email.policy modules. The email package is a library for managing email messages; it does not handle sending them.
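As a small illustration of what "handles headers, encodings, and so on" means, the sketch below parses an in-memory message whose subject is an RFC 2047 encoded word and whose body is base64-encoded. The message itself is made up for this example and is not part of the downloaded dataset.

```python
import email
import email.policy

# Hypothetical sample message (not from the dataset): the subject is an
# encoded word and the body is base64-encoded text.
raw = (
    b"From: sender@example.com\r\n"
    b"Subject: =?utf-8?q?Caf=C3=A9_menu?=\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"SGVsbG8sIHdvcmxkIQ==\r\n"
)

# With the default policy, headers and bodies are decoded for us.
msg = email.message_from_bytes(raw, policy=email.policy.default)

subject = str(msg["Subject"])  # decoded header text
body = msg.get_content()       # decoded body text

print(subject)
print(body)
```

Without the email package, we would have to decode the header and the base64 body by hand.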
The behavior of the email package is controlled by policy objects, which are defined in the email.policy module. Every EmailMessage, every generator, and every parser has an associated policy object that controls its behavior. Usually an application only needs to specify the policy when an EmailMessage is created, either by directly instantiating an EmailMessage to create a new email, or by parsing an input stream using a parser. However, the policy can also be changed when the message is serialized using a generator. This allows, for example, a generic email message to be parsed from disk and then serialized using standard SMTP settings when it is sent to an email server.
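The parse-with-one-policy, serialize-with-another idea can be sketched as follows. This is only an illustration with a made-up message: it parses with email.policy.default and then serializes the same message twice, once with the policy it was parsed with and once with email.policy.SMTP, which uses CRLF line endings.

```python
import email.parser
import email.policy

# Hypothetical message with Unix line endings, as it might be stored on disk.
raw = (
    b"From: alice@example.com\n"
    b"To: bob@example.com\n"
    b"Subject: Hello\n"
    b"\n"
    b"Line one\n"
    b"Line two\n"
)

# The policy is specified once, when the message is created by the parser.
msg = email.parser.BytesParser(policy=email.policy.default).parsebytes(raw)

# Serializing without arguments reuses the message's own policy.
as_parsed = msg.as_bytes()

# The policy can be overridden at serialization time, here with SMTP settings.
for_smtp = msg.as_bytes(policy=email.policy.SMTP)

print(as_parsed)
print(for_smtp)
```

In this exercise we only parse emails, so specifying the policy once in the parser is enough.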
First, let us import the required modules:
import os  # used to build file paths in load_email
import email
import email.parser  # provides BytesParser
import email.policy
Next, we will define a function load_email, which does exactly what its name suggests: it loads an email from disk and parses it:
def << your code goes here >>(is_spam, filename, spam_path=SPAM_PATH):
    directory = "spam" if is_spam else "easy_ham"
    with open(os.path.join(spam_path, directory, filename), "rb") as f:
        return email.parser.BytesParser(policy=email.policy.default).parse(f)
Finally, we will load only the emails whose filenames we stored in the previous step:
ham_emails = [load_email(is_spam=False, filename=name) for name in ham_filenames]
spam_emails = [load_email(is_spam=True, filename=name) for name in spam_filenames]
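Once the emails are loaded, a quick sanity check is to look at a few subjects, content types, and bodies. The sketch below uses a small hypothetical list, sample_emails, built from in-memory messages as a stand-in for ham_emails or spam_emails, so it can run without the dataset.

```python
import email.parser
import email.policy

# Hypothetical raw messages standing in for files from the dataset.
raw_messages = [
    b"Subject: Meeting notes\nContent-Type: text/plain\n\nSee you at 10.\n",
    b"Subject: Newsletter\nContent-Type: text/html\n\n<p>Hello</p>\n",
]

parser = email.parser.BytesParser(policy=email.policy.default)
sample_emails = [parser.parsebytes(raw) for raw in raw_messages]

# Headers are accessed like dictionary items on each parsed message.
subjects = [str(msg["Subject"]) for msg in sample_emails]

# The content type tells us whether a message is plain text, HTML, and so on.
content_types = [msg.get_content_type() for msg in sample_emails]

# get_content() returns the decoded body of a non-multipart message.
first_body = sample_emails[0].get_content().strip()

print(len(sample_emails), subjects, content_types)
print(first_body)
```

The same attribute and method calls should work on the elements of ham_emails and spam_emails, since they are also parsed with email.policy.default.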