Spam Classifier - Split the Dataset

Before we start exploring the data, let's split it into a training set and a test set, so that the test set stays untouched until we evaluate the final model.

INSTRUCTIONS

First, import train_test_split from sklearn.model_selection:

from sklearn.model_selection import << your code goes here >>

Next, combine all the emails into a single array X and build a matching label array y, where 0 marks a ham email and 1 marks a spam email:

X = np.array(ham_emails + spam_emails)
y = np.array([0] * len(ham_emails) + [1] * len(spam_emails))
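To see how X and y line up, here is a small sketch that uses hypothetical toy strings in place of the ham_emails and spam_emails lists loaded earlier in the exercise (the real lists may hold parsed email objects rather than strings). Because the ham emails come first in X, the first len(ham_emails) labels are 0 and the rest are 1, so each label stays aligned with its email.

```python
import numpy as np

# Hypothetical stand-ins for the ham_emails and spam_emails lists
# loaded earlier in the exercise.
ham_emails = ["Lunch at noon?", "Meeting notes attached", "See you tomorrow"]
spam_emails = ["You won a prize!", "Cheap meds online"]

# Features: all ham emails first, then all spam emails.
X = np.array(ham_emails + spam_emails)
# Labels in the same order: 0 for each ham email, 1 for each spam email.
y = np.array([0] * len(ham_emails) + [1] * len(spam_emails))

# Print each label next to its email to check the alignment.
for email, label in zip(X, y):
    print(label, email)
```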

Finally, let's split the dataset in an 80/20 ratio, keeping 80% of the emails for training and 20% for testing:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<< your code goes here >>, random_state=42)
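As a sketch of how test_size behaves, the example below runs train_test_split on hypothetical toy data (8 samples) with a different ratio, test_size=0.25, so the exercise answer is left for you. A float test_size is the fraction of samples held out for the test set, and random_state fixes the shuffle so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 8 samples, 6 ham (0) and 2 spam (1).
X = np.array([f"email {i}" for i in range(8)])
y = np.array([0] * 6 + [1] * 2)

# test_size=0.25 holds out a quarter of the samples (2 of 8) for testing;
# random_state=42 makes the shuffle repeatable across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(len(X_train), len(X_test))
```

Every sample ends up in exactly one of the two sets, and each label travels with its email. If the spam emails are rare, train_test_split also accepts stratify=y to keep the ham/spam proportions similar in both sets; that is an optional choice, not part of this exercise.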
