Building Spam Classifier

Spam Classifier - About the Spam Dataset

This section introduces the dataset used in this project, which comes from Apache SpamAssassin.

Apache SpamAssassin is a widely used open-source anti-spam platform that gives system administrators a filter to classify email and block spam (unsolicited bulk email).

It uses a scoring framework and plug-ins to combine a wide range of heuristic and statistical tests on email headers and body text, including text analysis, Bayesian filtering, DNS blocklists, and collaborative filtering databases.
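To make the scoring idea concrete, here is a toy sketch. It is not SpamAssassin's actual code or rule set: each test that matches adds its score, and a message is flagged once the total reaches a threshold. The rule names, patterns, and weights below are invented for illustration; the threshold of 5.0 mirrors SpamAssassin's default `required_score`.

```python
import re

# Hypothetical rules for illustration only: (name, pattern, score).
# These are NOT real SpamAssassin rules or weights.
RULES = [
    ("FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 3.0),
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z !]+$", re.MULTILINE), 1.5),
    ("CLICK_HERE", re.compile(r"click here", re.IGNORECASE), 1.0),
]

# Mirrors SpamAssassin's default required_score.
THRESHOLD = 5.0


def score_message(text):
    """Return (total score, names of the rules that matched)."""
    hits = [(name, score) for name, pattern, score in RULES if pattern.search(text)]
    total = sum(score for _, score in hits)
    return total, [name for name, _ in hits]


def is_spam(text, threshold=THRESHOLD):
    """Flag a message when its total score reaches the threshold."""
    total, _ = score_message(text)
    return total >= threshold
```

A real deployment combines far more tests, including the statistical and network-based ones listed above, but the sum-and-threshold structure is the same.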

Apache SpamAssassin is a project of the Apache Software Foundation (ASF). You can find out more about the project at the link below:

https://spamassassin.apache.org/

The dataset we will be using is hosted at the link below:

http://spamassassin.apache.org/old/publiccorpus/
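If you are working outside the course environment, one way to fetch the data is sketched below. It assumes the corpus is published as tar archives under the URL above; the helper name `fetch_archive` and the default `datasets/spam` directory are hypothetical choices for this example, not part of the project.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL taken from the link above.
CORPUS_URL = "http://spamassassin.apache.org/old/publiccorpus/"


def fetch_archive(filename, dest_dir="datasets/spam", base_url=CORPUS_URL):
    """Download one archive from the corpus if it is missing, then extract it.

    `filename` must be an archive name from the corpus directory listing.
    Returns the directory the archive was extracted into.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / filename
    if not archive.is_file():
        # Only hit the network when the archive is not already cached.
        urllib.request.urlretrieve(base_url + filename, archive)
    # tarfile detects the compression (e.g. bz2) automatically.
    with tarfile.open(archive) as tar:
        tar.extractall(path=dest)
    return dest
```

For example, `fetch_archive("20030228_easy_ham.tar.bz2")` would download and unpack one archive; that file name is an assumption here, so check the directory listing at the link above for the names actually available.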

  • IMPORTANT: Please run the following command in a web console before starting the project, or if you see a "404: Not found" error in the right-hand pane:

    rsync -avz --ignore-existing /cxldata/cloudxlab_jupyter_notebooks/ /home/$USER/cloudxlab_jupyter_notebooks/
    

Let us begin!

