Project - Building Spam Classifier

Spam Classifier - Split the Dataset

Before we start exploring the data, let's split it into a training set and a test set, so that nothing we learn from the test set can influence how we build the classifier.

INSTRUCTIONS
  • First, let's import NumPy:

    import << your code goes here >> as np
    
  • Then, import train_test_split from sklearn:

    from sklearn.model_selection import << your code goes here >>
    
  • Now let's collect the emails in X and their labels in y (0 for ham, 1 for spam):

    X = np.array(ham_emails + spam_emails)
    y = np.array([0] * len(ham_emails) + [1] * len(spam_emails))
    
  • Finally, let's split the dataset in an 80/20 ratio (80% for training, 20% for testing):

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<< your code goes here >>, random_state=42)
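
To check that a split behaves as expected, it can help to try the same steps on a tiny made-up dataset first. The sketch below is only an illustration: the `ham_emails` and `spam_emails` lists here are hypothetical placeholder strings, not the real emails, and it deliberately uses a 75/25 ratio rather than the ratio asked for in this exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: short strings instead of the real emails.
ham_emails = [f"ham message {i}" for i in range(8)]
spam_emails = [f"spam message {i}" for i in range(4)]

# Same layout as in the exercise: all emails in X, labels in y (0 = ham, 1 = spam).
X = np.array(ham_emails + spam_emails)
y = np.array([0] * len(ham_emails) + [1] * len(spam_emails))

# test_size=0.25 is an assumption for this toy example only,
# chosen so it does not give away the value the exercise asks for.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Print the sizes of the two sets to confirm the split ratio.
print(len(X_train), len(X_test))
```

With 12 toy samples and `test_size=0.25`, the test set holds 3 samples and the training set holds 9. Fixing `random_state` makes the shuffle reproducible, so the same samples land in each set on every run.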
    