Create a test set

When building a machine learning model, we split the dataset into a training set and a test set. We train the model on the training set and, once training is complete, measure its performance on the test set. Because the model has never seen the test set, its performance there gives an estimate of how it will perform on real-world data.

This step is generally performed before visualizing the dataset. If you look at the test set first, you may spot patterns in it and choose a machine learning model to suit them. The model will then perform well on the test data, but that estimate will be too optimistic, and the model may perform worse once it is launched in the real world. This is called data snooping bias.

So we perform the split before exploring the dataset, so that we learn nothing about the test data and it behaves like real-world (unseen) data.

We generally split the dataset in an 80:20 train-test ratio: 80% of the instances go into the training set and 20% into the test set.
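As a rough illustration of an 80:20 split, here is a minimal sketch in plain Python. The function name `split_train_test`, the seed value, and the placeholder data are assumptions made for this example, not part of the course material. The instances are shuffled with a fixed seed so that the same split comes back on every run. Otherwise, over repeated runs, you would end up seeing the whole dataset, which is exactly the snooping we want to avoid.

```python
import random


def split_train_test(instances, test_ratio=0.2, seed=42):
    """Shuffle the instances and hold out a fraction of them as the test set."""
    rng = random.Random(seed)  # fixed seed: the split is identical on every run
    indices = list(range(len(instances)))
    rng.shuffle(indices)

    test_size = round(len(instances) * test_ratio)
    test_indices = indices[:test_size]
    train_indices = indices[test_size:]

    train_set = [instances[i] for i in train_indices]
    test_set = [instances[i] for i in test_indices]
    return train_set, test_set


# Placeholder data for illustration: 100 instances, represented by their ids.
instances = list(range(100))
train_set, test_set = split_train_test(instances)
print(len(train_set), len(test_set))  # 80 20
```

In practice you would more likely use a library helper. For example, scikit-learn's `train_test_split` takes `test_size=0.2` and a `random_state` argument for the same purpose.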
