Training Models


Machine Learning Training Models Part-4



Comments

Regarding L1 vs. L2: the slide says that L1 leads to sparser models, but we know that L2 is more affected by outliers. Both of these statements seem counterintuitive. Could you please explain how L1 leads to sparser models, and why L2 is usually preferred over it?


Hi Abhishek,

L1 regularization (Lasso) encourages sparsity by driving some coefficients exactly to zero, effectively performing feature selection: the gradient of the absolute-value penalty stays constant no matter how small a weight gets, so uninformative weights are pushed all the way to zero. L2 regularization (Ridge) shrinks coefficients in proportion to their size, so less informative features end up with small but non-zero weights, which leads to smoother solutions and better handling of multicollinearity. As for outliers, that remark really concerns loss functions rather than penalties: squaring errors (L2) amplifies large deviations, which is why an L2 loss is more affected by outliers. Even though L1 regularization produces sparser models, L2 regularization is generally preferred in many cases for its stability and generalization to unseen data, especially when feature selection is not the primary objective or when the dataset is not highly sparse.
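To make this concrete, here is a minimal sketch (not from the slides; the data, alpha value, and feature counts are illustrative assumptions) showing L1 zeroing out uninformative features while L2 merely shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(42)
X = rng.randn(100, 10)
# Only the first two features carry signal; the other eight are noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.randn(100)

lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=0.5).fit(X, y)   # L2 penalty

print("Lasso:", np.round(lasso.coef_, 3))  # noise coefficients land at exactly 0.0
print("Ridge:", np.round(ridge.coef_, 3))  # noise coefficients are small, not zero
```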


In the learning-curve part, in the code that plots RMSE against training set size for the training and validation sets, the training predictions y_train_predict are made on a growing slice X_train[:m] as the iteration variable m increases, but y_val_predict is always computed on X_val, whose size is constant (20% of the dataset, as per the code). So why do its predictions change? Shouldn't its curve be constant, or approximately constant, since the size is constant?


Hi,

The plot shows RMSE against training set size, not the size itself. Even though the validation set stays the same, the model does not: at every iteration it is retrained on X_train[:m] with a larger m. A different model produces different predictions on the same X_val, so the validation RMSE changes from one iteration to the next.
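For reference, the code being discussed is presumably along these lines (a sketch following the hands-on-ML notebook this lesson is based on; variable names assumed from the question):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def plot_learning_curves(model, X, y):
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])          # a NEW model every iteration
        y_train_predict = model.predict(X_train[:m])
        y_val_predict = model.predict(X_val)         # same X_val, different model
        train_errors.append(mean_squared_error(y_train[:m], y_train_predict))
        val_errors.append(mean_squared_error(y_val, y_val_predict))
    plt.plot(np.sqrt(train_errors), "r-+", linewidth=2, label="train")
    plt.plot(np.sqrt(val_errors), "b-", linewidth=3, label="val")
    plt.xlabel("Training set size")
    plt.ylabel("RMSE")
    plt.legend()
```

The comments mark the key point: X_val never changes, but the model fitted on X_train[:m] does, so its predictions on X_val, and hence the validation RMSE, change with m.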

Thanks.


1) The topic name "Machine Learning Training Models Part-4" does not match the name of the YouTube video. The same problem exists with the next topic as well. Please correct.

2) Display the slide number in the PPT.


Hi,

1. This topic is on training models, and the video added to this topic is also called Training Models. Could you please tell us where you noticed the difference, so that we can rectify it if required?

2. Thanks for the feedback. We will definitely consider this when we are updating our courseware.

Thanks.


Referring to slide 198: "One way to improve an overfitting model is to: feed it more training data."

Shouldn't it be the opposite, i.e. that we should not feed a lot of training data when a model turns out to be overfitting?


Hi,

Overfitting is not caused by more data; it is caused by the model fitting the training set too closely. When we feed it more data, the model gets a better chance to learn the underlying pattern rather than the noise, so the gap between training and validation error tends to shrink.

Thanks.


Sir,

In Ridge Regression, why is there an extra 1/2 multiplied into the regularization term?


Hi,

Good question. You can find more details on the 1/2 term at the link below:

https://math.stackexchange.com/questions/884887/why-divide-by-2m
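In short, and as the linked answer explains, the 1/2 is purely a notational convenience. With the Ridge cost function written as

$$J(\theta) = \mathrm{MSE}(\theta) + \alpha \frac{1}{2} \sum_{i=1}^{n} \theta_i^2,$$

differentiating the penalty with respect to a weight gives

$$\frac{\partial}{\partial \theta_i}\left(\alpha \frac{1}{2} \sum_{j=1}^{n} \theta_j^2\right) = \alpha\,\theta_i$$

rather than $2\alpha\,\theta_i$: the 1/2 cancels the 2 produced by differentiating the square. Since $\alpha$ is a free hyperparameter anyway, this changes nothing about the model; it only makes the gradient-descent update cleaner.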

Thanks.


Hello,

In ridge regression, we calculated for different alphas and graphed the regression line. In the graph, I could see the y-intercept, i.e. theta0, changing, whereas in the presentation it is mentioned that it will not change. Can you please explain this contradiction?


Hi,

Apologies for the late reply. Could you please refer me to the slide where it is mentioned that theta0 will not change?
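In the meantime, one observation that may resolve the contradiction (a quick sketch, assuming the scikit-learn Ridge used in the notebook; the data here is made up): scikit-learn does not include theta0 in the penalty term, yet the fitted intercept can still move as alpha changes, because the shrinking coefficients shift the fit and the intercept compensates:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = 3 * rng.rand(50, 1)
y = 1 + 0.5 * X.ravel() + 0.3 * rng.randn(50)

for alpha in (0.1, 10, 1000):
    model = Ridge(alpha=alpha).fit(X, y)
    # The penalty applies only to coef_, yet intercept_ still changes:
    print(f"alpha={alpha}: intercept={model.intercept_:.3f}, slope={model.coef_[0]:.3f}")
```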

Thanks.


What does constraining mean?


Hi,

In this context, constraining a model means restricting the freedom of its parameters, for example forcing the weights to stay small, as regularization does. The fewer degrees of freedom a model has, the harder it is for it to overfit.

Thanks.


As on the bias/variance tradeoff page: by increasing a model's complexity, do you mean

• changing its weights?

• increasing the polynomial degree?

• changing the algorithm?


Hi,

One of the ways to increase a model's complexity is by increasing the polynomial degree, which gives the model more weights to learn. Thus, after retraining, this naturally leads to a change in the weights as well.
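As a quick illustrative sketch (degrees and data chosen arbitrarily, not from the course notebook), the same Linear Regression algorithm becomes a more complex model, with more weights, as the polynomial degree grows:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)
X = 6 * rng.rand(100, 1) - 3
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.randn(100)

for degree in (1, 2, 20):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    n_weights = len(model.named_steps["linearregression"].coef_)
    # More features => more weights => higher capacity (and risk of overfitting)
    print(f"degree={degree}: {n_weights} weights, train R^2={model.score(X, y):.3f}")
```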

Thanks.


Hello sir,

In the learning-curve section, we ran linear regression on the data X, which we had created as non-linear in order to understand polynomial regression. Shouldn't we use X_poly, which was created from that non-linear data, to fit the linear regression, so that we can see whether the model is overfitting or underfitting? Why did we take X instead of X_poly? Are we just illustrating the concept of overfitting and underfitting with an arbitrary model? If so, what will happen when we use the new X_poly data?


Hi,

Could you please tell us which part of the video or the slides you are referring to?

Thanks.


Hi Team,

Can you please explain why the Figure 1 graph shows overfitting whereas the Figure 2 graph shows underfitting?

      

[Figure 1: learning curves of the overfitting model; Figure 2: learning curves of the underfitting model]

I am absolutely fine with the explanation of all the different parts of the graphs, and I have gone through the entire lecture multiple times, but the tails of the graphs somehow confuse me as to how one is underfitting and the other is overfitting.

Thanks,

Birendra Singh


Hi,

Let's talk about the second figure first, where it is underfitting. Look at the performance on the training data: when there are just one or two instances in the training set, the model can fit them perfectly, which is why the curve starts at zero. But as new instances are added to the training set, it becomes impossible for the model to fit the training data perfectly, both because the data is noisy and because it is not linear at all. So the error on the training data goes up until it reaches a plateau, at which point adding new instances doesn't make the average error much better or worse.

Now look at the performance of the model on the validation data. When the model is trained on very few training instances, it is incapable of generalizing properly, which is why the validation error is initially quite big. Then, as the model is shown more training examples, it learns, and the validation error slowly goes down. However, once again a straight line cannot do a good job of modeling the data, so the error ends up at a plateau, very close to the other curve.

These learning curves are typical of an underfitting model: both curves have reached a plateau; they are close and fairly high.

Now, coming back to the first figure. These learning curves look a bit like the previous ones, but there are two very important differences:

• The error on the training data is much lower than with the Linear Regression model.

• There is a gap between the curves. This means that the model performs significantly better on the training data than on the validation data, which is the hallmark of an overfitting model. However, if you used a much larger training set, the two curves would continue to get closer.
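For completeness, the two figures presumably come from code along these lines (reusing the plot_learning_curves sketch shown earlier in this thread; the noisy quadratic data is assumed to match the course notebook):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Noisy quadratic data, as in the course notebook (values assumed)
rng = np.random.RandomState(42)
X = 6 * rng.rand(100, 1) - 3
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.randn(100)

# Figure 1: 10th-degree polynomial -- low training error, persistent gap (overfitting)
polynomial_regression = Pipeline([
    ("poly_features", PolynomialFeatures(degree=10, include_bias=False)),
    ("lin_reg", LinearRegression()),
])
plot_learning_curves(polynomial_regression, X, y)

# Figure 2: plain straight line -- both curves plateau close together, fairly high (underfitting)
plot_learning_curves(LinearRegression(), X, y)
```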

Thanks.


Hi Rajtilak

Thanks for the explanation. Please correct my understanding if wrong:

• By stating "there is a gap between the curves; this means that the model performs significantly better on the training data than on the validation data", do we mean the ups and downs observed in the curve? "Gap between the curves" is confusing, because both graphs have gaps between their curves.

• In both figures the curves are getting closer to each other. So what is the deciding factor here for overfitting vs. underfitting? Is RMSE alone the basis for classifying a model as overfitting or underfitting? My previous post was about this part only. Is there some threshold assumed for the RMSE values of the training and validation sets to categorize a model as overfitting or underfitting?

Thanks,

Birendra Singh


Hi,

I would request you to take another close look at these graphs: do the gaps look similar? Also, there are no thresholds; every problem in Machine Learning/Deep Learning is different. We often say that there is no one-size-fits-all solution in Machine Learning/Deep Learning.

Thanks.
