Project - Building Spam Classifier

Spam Classifier - Step 1 - Get the SpamAssassin Dataset from Their Website

First, we download the dataset from the SpamAssassin website.
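
A minimal sketch of this step is given here, assuming the archive URLs and the `datasets/spam` target directory used in the comments on this page. The helper name `list_emails` is hypothetical, introduced only for illustration, and the `len(name) > 20` filter is taken from the same comment rather than from any official specification.

```python
import os
import tarfile
import urllib.request  # importing the submodule explicitly makes urlretrieve available

# URLs and paths as used in the comments on this page (assumed to still be valid).
DOWNLOAD_ROOT = "http://spamassassin.apache.org/old/publiccorpus/"
HAM_URL = DOWNLOAD_ROOT + "20030228_easy_ham.tar.bz2"
SPAM_URL = DOWNLOAD_ROOT + "20030228_spam.tar.bz2"
SPAM_PATH = os.path.join("datasets", "spam")


def fetch_spam_data(ham_url=HAM_URL, spam_url=SPAM_URL, spam_path=SPAM_PATH):
    """Download both archives if they are missing, then extract them into spam_path."""
    os.makedirs(spam_path, exist_ok=True)
    for filename, url in (("ham.tar.bz2", ham_url), ("spam.tar.bz2", spam_url)):
        path = os.path.join(spam_path, filename)
        if not os.path.isfile(path):
            urllib.request.urlretrieve(url, path)
        with tarfile.open(path) as archive:
            archive.extractall(path=spam_path)


def list_emails(directory):
    """Hypothetical helper: sorted file names, keeping only names longer than 20 characters."""
    return [name for name in sorted(os.listdir(directory)) if len(name) > 20]


# Usage (downloads the archives on the first call):
# fetch_spam_data()
# ham_filenames = list_emails(os.path.join(SPAM_PATH, "easy_ham"))
# spam_filenames = list_emails(os.path.join(SPAM_PATH, "spam"))
```

In this sketch the file lists are built at module level, after the function has been called. Names assigned inside a function are local to it, so building `ham_filenames` and `spam_filenames` inside `fetch_spam_data` would leave them undefined for any code that runs outside the function.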

5 Comments

import os
import tarfile
import urllib
DOWNLOAD_ROOT = "http://spamassassin.apache.org/old/publiccorpus/"
HAM_URL = DOWNLOAD_ROOT + "20030228_easy_ham.tar.bz2"
SPAM_URL = DOWNLOAD_ROOT + "20030228_spam.tar.bz2"
SPAM_PATH = os.path.join("datasets", "spam")
def fetch_spam_data(spam_url=SPAM_URL,spam_path=SPAM_PATH):
    if not os.path.isdir(spam_path):
        os.makedirs(spam_path)
    for filename, url in (("ham.tar.bz2", HAM_URL), ("spam.tar.bz2", SPAM_URL)):
        path = os.path.join(spam_path, filename)
        if not os.path.isfile(path):
            urllib.request.urlretrieve(url, path)
        tar_bz2_file = tarfile.open(path)
        tar_bz2_file.extractall(path=SPAM_PATH)
        tar_bz2_file.close()
        calculate()
        HAM_DIR = os.path.join(SPAM_PATH, "easy_ham")
        SPAM_DIR = os.path.join(SPAM_PATH, "spam")
        ham_filenames = [name for name in sorted(os.listdir(HAM_DIR)) if len(name) > 20]
        spam_filenames = [name for name in sorted(os.listdir(SPAM_DIR)) if len(name) > 20]

When I submitted the code, I got the error "spam_filenames not defined". Can you please guide me on where I am going wrong, sir?

Hi,

Please attach the screenshot of your issue.

Thanks.

Hi, I am not able to access the lab. The console does not show anything while I type my password. Also, the Jupyter notebook shows "invalid credentials" when I try to log in.

Hi,

I have reset your password. Please note that, for security purposes, the console does not show passwords as you type them. This is standard behavior on Linux. Please try once again with your new password.

Thanks.
