Overfitting

If you have trained the model and measured the RMSE correctly, its value will be 0.0. Also, when we compare the predicted and actual values for our first 5 data points, we get:

Actual values = [286600.0, 340600.0, 196900.0,  46300.0, 254500.0]

Predicted values = [286600.0, 340600.0, 196900.0,  46300.0, 254500.0]
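As a quick check, here is the RMSE formula (the square root of the mean squared difference between actual and predicted values) applied to the five pairs above. It is a minimal pure-Python sketch. The `rmse` helper is an illustrative name, not a function from the project.

```python
import math

# Actual and predicted values for the first 5 data points (copied from above)
actual = [286600.0, 340600.0, 196900.0, 46300.0, 254500.0]
predicted = [286600.0, 340600.0, 196900.0, 46300.0, 254500.0]

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

print(rmse(actual, predicted))  # 0.0
```

Every squared error is zero, so the mean and its square root are zero as well.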

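So our RMSE comes out as zero, and our predicted values exactly match the actual values. Have we achieved a perfect model? Obviously not, because there is no such thing as a perfect model: real data contains noise, so a model that reproduces every training label exactly has most likely memorized the data rather than learned a general pattern. This means that our model is most likely overfitting.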

Overfitting happens when a model performs very well on the training data but fails to perform well on the test data or in the real world. This is because it memorizes the training data so closely that its predictions on that data are very close to the actual values. It even picks up patterns in the noise, and hence fails to perform well when it is given new data.
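To make the memorization idea concrete, here is a small sketch on synthetic data (a straight line plus Gaussian noise, assumed here as a stand-in; it is not the housing dataset). The hypothetical `fit_memorizer` model is a 1-nearest-neighbour lookup: it reproduces every training label exactly, so its training RMSE is zero, yet on fresh data from the same process it does worse than a plain least-squares line, because it has learned the noise along with the signal. All function names are illustrative.

```python
import math
import random

def make_data(n, rng, noise=2.0):
    """Synthetic data (assumption): y = 3x + 5 plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 5 + rng.gauss(0, noise) for x in xs]
    return xs, ys

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_memorizer(xs, ys):
    """1-nearest-neighbour 'model': returns the y of the closest training x.
    On the training points it reproduces every label, noise included."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

def fit_line(xs, ys):
    """Ordinary least-squares straight line: too simple to memorize the noise."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

rng = random.Random(42)
train_x, train_y = make_data(200, rng)
new_x, new_y = make_data(200, rng)  # unseen data from the same process

results = {}
for name, fit in [("memorizer", fit_memorizer), ("line", fit_line)]:
    model = fit(train_x, train_y)
    results[name] = (
        rmse(train_y, [model(x) for x in train_x]),  # error on data it was trained on
        rmse(new_y, [model(x) for x in new_x]),      # error on new data
    )
    print(f"{name:10s} train RMSE = {results[name][0]:.2f}   new-data RMSE = {results[name][1]:.2f}")
```

The memorizer's training RMSE is exactly 0.0, just like in our housing example, while its error on new data is the larger of the two models.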

So how do we check whether our model is overfitting, given that it is not advisable to touch the test set until we are confident about our model? This is where the validation set comes into play.

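We set aside part of the training data as validation data, train our model again on the remaining training data, and evaluate it on the validation data. Using a single held-out split like this is called hold-out validation. Cross-validation repeats the process over several different splits (folds), so that every part of the training data is used for validation once, and averages the resulting scores.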
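Both procedures can be sketched in a few lines of plain Python. This is a minimal illustration on assumed synthetic data; `holdout_split`, `k_fold_indices`, `cross_val_rmse` and the straight-line `fit_line` model are hypothetical names. In a scikit-learn project you would more likely use `train_test_split` and `cross_val_score` from `sklearn.model_selection`.

```python
import math
import random

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_line(xs, ys):
    """Hypothetical example model: least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: slope * x + (my - slope * mx)

def holdout_split(n, val_fraction, rng):
    """Shuffle the row indices and set aside a fraction of them for validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_val = int(n * val_fraction)
    return idx[n_val:], idx[:n_val]  # (train indices, validation indices)

def k_fold_indices(n, k, rng):
    """Split shuffled row indices into k folds of (almost) equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_rmse(xs, ys, fit, k, rng):
    """Each fold serves once as the validation set; the model is trained on the rest."""
    scores = []
    for fold in k_fold_indices(len(xs), k, rng):
        held_out = set(fold)
        tr = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in tr], [ys[i] for i in tr])
        scores.append(rmse([ys[i] for i in fold], [model(xs[i]) for i in fold]))
    return scores

# Synthetic "training data" (assumption): y = 3x + 5 plus noise
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [3 * x + 5 + rng.gauss(0, 2.0) for x in xs]

# Hold-out validation: one split, 20% of the training data kept aside
tr, val = holdout_split(len(xs), 0.2, rng)
model = fit_line([xs[i] for i in tr], [ys[i] for i in tr])
print("hold-out validation RMSE:", rmse([ys[i] for i in val], [model(xs[i]) for i in val]))

# 5-fold cross-validation: five such splits, one score per fold
scores = cross_val_rmse(xs, ys, fit_line, 5, rng)
print("fold RMSEs:", [round(s, 2) for s in scores], "mean:", sum(scores) / len(scores))
```

Averaging over folds gives a steadier estimate than a single split, at the cost of training the model k times.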

So if our model performs much worse on the validation data than on the training data, overfitting is a good bet.

Note: If a model performs significantly better on the training data than on the test data, it may be overfitting the training data. But we can't be sure it is always overfitting, because the same symptom can arise from other problems, such as a data mismatch between the training and test sets: the test set contains a different type of data (with a different distribution) that was not present in the training data. You may also be observing this because of the stochastic nature of the algorithm. So we have to check whether the model is really overfitting or whether it is suffering from some other problem.
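One common way to tell these cases apart is to hold out a "train-dev" set: a slice of the training data (same distribution) that the model never trains on. If the model does well on the training data but poorly on train-dev, it is overfitting; if it does well on train-dev but poorly on the test set, a data mismatch is more likely. Re-running training with several random seeds shows how much of a gap could be plain seed-to-seed noise. The sketch below encodes that reasoning. The function names, the 1.2 ratio and the 2-standard-deviation rule are hypothetical choices for illustration, not established cut-offs.

```python
import statistics

def diagnose(train_rmse, train_dev_rmse, test_rmse, ratio=1.2):
    """Rough diagnosis from three error measurements.
    train-dev: held-out slice of the *training* distribution.
    ratio: hypothetical threshold for calling one error 'much larger' than another."""
    if train_dev_rmse > ratio * train_rmse:
        return "overfitting"       # worse on unseen data from the same distribution
    if test_rmse > ratio * train_dev_rmse:
        return "data mismatch"     # fine on train-dev, worse on the test distribution
    return "no clear problem"

def gap_exceeds_seed_noise(train_rmse, val_rmses_across_seeds, n_std=2.0):
    """True if the validation error exceeds the training error by more than
    n_std standard deviations of its seed-to-seed variation (hypothetical rule)."""
    mean = statistics.mean(val_rmses_across_seeds)
    sd = statistics.stdev(val_rmses_across_seeds)
    return mean - train_rmse > n_std * sd

print(diagnose(0.0, 3.1, 3.2))  # overfitting
print(diagnose(2.0, 2.1, 3.5))  # data mismatch
print(gap_exceeds_seed_noise(2.0, [2.9, 3.1, 3.0, 3.2]))  # True
```

The numbers in the usage lines are made up to show each branch; they are not results from the housing model.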
