Project - Building a Spam Classifier

Welcome to this project on building a spam classifier with logistic regression using scikit-learn. In this project, you will use Python and scikit-learn to build a logistic regression classifier and apply it to predict whether an email is spam or ham (not spam).

Textual data is being generated at a very rapid pace. Before a machine learning algorithm can use such data, it has to be preprocessed. The most important preprocessing steps are accessing and cleansing the data, transforming it into a refined form, and representing the text in numerical form so that it is compatible with ML algorithms. You will learn to perform all of these steps using NLTK, a widely used natural language processing library for Python. You will build data transformers and use them in scikit-learn pipelines to preprocess the data effectively. Finally, you will build a logistic regression classifier to predict the class of an email.
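To give a rough idea of how these pieces fit together, here is a minimal sketch of a custom transformer inside a scikit-learn pipeline that ends in a logistic regression classifier. The `EmailCleaner` class, its cleaning rules, and the toy emails are assumptions made up for illustration; they are not the project's actual code or data. NLTK-based steps such as stemming would normally go inside the transformer and are left out here to keep the sketch short.

```python
import re

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class EmailCleaner(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: lowercases text and masks URLs and numbers."""

    def fit(self, X, y=None):
        # Nothing is learned from the data; return self so the
        # transformer can be used inside a Pipeline.
        return self

    def transform(self, X):
        cleaned = []
        for text in X:
            text = text.lower()
            text = re.sub(r"https?://\S+", " url ", text)  # mask links
            text = re.sub(r"\d+", " number ", text)        # mask digits
            cleaned.append(text)
        return cleaned


# Toy data invented for illustration only (1 = spam, 0 = ham).
emails = [
    "Win 1000 dollars now, click http://example.com/prize",
    "Free prize waiting, claim your reward today",
    "Meeting moved to 3 pm, see agenda attached",
    "Can you review my pull request this afternoon?",
]
labels = [1, 1, 0, 0]

# Clean the text, turn it into word counts, then fit the classifier.
spam_pipeline = Pipeline([
    ("clean", EmailCleaner()),
    ("vectorize", CountVectorizer()),
    ("classify", LogisticRegression()),
])
spam_pipeline.fit(emails, labels)

predictions = spam_pipeline.predict(["Claim your free prize now"])
```

Because the cleaning step lives in the pipeline, the same preprocessing is applied automatically at both training and prediction time.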

Skills you will develop:

  1. Textual Data Preprocessing
  2. Data Preprocessing Pipelines
  3. Data Transforming
  4. NLTK
  5. Python Programming
  6. Predictive Modeling
  7. Machine Learning
  8. scikit-learn


Machine Learning Engineer @ CloudxLab