Training Models


Machine Learning Training Models Part-4



Comments

Regarding L1 vs. L2: the slide says that L1 leads to sparser models, but we know that L2 is more affected by outliers. Both of these statements seem counterintuitive. Could you please explain how L1 leads to sparser models, and why L2 is usually preferred over it?


Hi Abhishek,

L1 regularization (Lasso) encourages sparsity by driving some coefficients exactly to zero, effectively performing feature selection: the gradient of the absolute-value penalty stays constant no matter how small a weight gets, so uninformative weights are pushed all the way to zero. L2 regularization (Ridge) shrinks coefficients in proportion to their size, so less informative features end up with small but non-zero weights, which leads to smoother solutions and better handling of multicollinearity. As for outliers, that remark really concerns loss functions rather than penalties: squaring errors (L2) amplifies large deviations, which is why an L2 loss is more affected by outliers. Even though L1 regularization produces sparser models, L2 regularization is generally preferred in many cases for its stability and generalization to unseen data, especially when feature selection is not the primary objective or when the dataset is not highly sparse.
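To make this concrete, here is a minimal sketch (not from the slides; the data, alpha value, and feature counts are illustrative assumptions) showing L1 zeroing out uninformative features while L2 merely shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(42)
X = rng.randn(100, 10)
# Only the first two features carry signal; the other eight are noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.randn(100)

lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=0.5).fit(X, y)   # L2 penalty

print("Lasso:", np.round(lasso.coef_, 3))  # noise coefficients land at exactly 0.0
print("Ridge:", np.round(ridge.coef_, 3))  # noise coefficients are small, not zero
```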


In the learning-curve part, in the code that plots RMSE against training set size for the training and validation sets, the training predictions y_train_predict are made on a growing slice X_train[:m] as the iteration variable m increases, but y_val_predict is always computed on X_val, whose size is constant (20% of the dataset, as per the code). So why do its predictions change? Shouldn't its curve be constant, or approximately constant, since the size is constant?


Hi,

The plot shows RMSE against training set size, not the size itself. Even though the validation set stays the same, the model does not: at every iteration it is retrained on X_train[:m] with a larger m. A different model produces different predictions on the same X_val, so the validation RMSE changes from one iteration to the next.
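For reference, the code being discussed is presumably along these lines (a sketch following the hands-on-ML notebook this lesson is based on; variable names assumed from the question):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def plot_learning_curves(model, X, y):
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])          # a NEW model every iteration
        y_train_predict = model.predict(X_train[:m])
        y_val_predict = model.predict(X_val)         # same X_val, different model
        train_errors.append(mean_squared_error(y_train[:m], y_train_predict))
        val_errors.append(mean_squared_error(y_val, y_val_predict))
    plt.plot(np.sqrt(train_errors), "r-+", linewidth=2, label="train")
    plt.plot(np.sqrt(val_errors), "b-", linewidth=3, label="val")
    plt.xlabel("Training set size")
    plt.ylabel("RMSE")
    plt.legend()
```

The comments mark the key point: X_val never changes, but the model fitted on X_train[:m] does, so its predictions on X_val, and hence the validation RMSE, change with m.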

Thanks.


1) The topic name "Machine Learning Training Models Part-4" does not match the name of the YouTube video. The same problem exists with the next topic as well. Please correct.

2) Display the slide number in the PPT.


Hi,

1. This topic is on training models, and the video added to this topic is also called Training Models. Could you please tell us where you noticed the difference, so that we can rectify it if required?

2. Thanks for the feedback. We will definitely consider this when we are updating our courseware.

Thanks.


Referring to slide 198: "One way to improve an overfitting model is to: feed it more training data."

Shouldn't it be the opposite, i.e. that we should not feed a lot of training data when a model turns out to be overfitting?


Hi,

Overfitting is not caused by more data; it is caused by the model fitting the training set too closely. When we feed it more data, the model gets a better chance to learn the underlying pattern rather than the noise, so the gap between training and validation error tends to shrink.

Thanks.


Sir,

In Ridge Regression, why is there an extra 1/2 multiplied into the regularization term?


Hi,

Good question. You can find more details on the 1/2 term at the link below:

https://math.stackexchange.com/questions/884887/why-divide-by-2m
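In short, and as the linked answer explains, the 1/2 is purely a notational convenience. With the Ridge cost function written as

$$J(\theta) = \mathrm{MSE}(\theta) + \alpha \frac{1}{2} \sum_{i=1}^{n} \theta_i^2,$$

differentiating the penalty with respect to a weight gives

$$\frac{\partial}{\partial \theta_i}\left(\alpha \frac{1}{2} \sum_{j=1}^{n} \theta_j^2\right) = \alpha\,\theta_i$$

rather than $2\alpha\,\theta_i$: the 1/2 cancels the 2 produced by differentiating the square. Since $\alpha$ is a free hyperparameter anyway, this changes nothing about the model; it only makes the gradient-descent update cleaner.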

Thanks.


Hello,

In ridge regression, we calculated for different alphas and graphed the regression line. In the graph, I could see the y-intercept, i.e. theta0, changing, whereas in the presentation it is mentioned that it will not change. Can you please explain this contradiction?


Hi,

Apologies for the late reply. Could you please refer me to the slide where it is mentioned that theta0 will not change?
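In the meantime, one observation that may resolve the contradiction (a quick sketch, assuming the scikit-learn Ridge used in the notebook; the data here is made up): scikit-learn does not include theta0 in the penalty term, yet the fitted intercept can still move as alpha changes, because the shrinking coefficients shift the fit and the intercept compensates:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = 3 * rng.rand(50, 1)
y = 1 + 0.5 * X.ravel() + 0.3 * rng.randn(50)

for alpha in (0.1, 10, 1000):
    model = Ridge(alpha=alpha).fit(X, y)
    # The penalty applies only to coef_, yet intercept_ still changes:
    print(f"alpha={alpha}: intercept={model.intercept_:.3f}, slope={model.coef_[0]:.3f}")
```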

Thanks.


What does constraining mean?


Hi,

In this context, constraining a model means restricting the freedom of its parameters, for example forcing the weights to stay small, as regularization does. The fewer degrees of freedom a model has, the harder it is for it to overfit.

Thanks.


As on the bias/variance tradeoff page: by increasing a model's complexity, do you mean

• changing its weights?

• increasing the polynomial degree?

• changing the algorithm?


Hi,

One of the ways to increase a model's complexity is by increasing the polynomial degree, which gives the model more weights to learn. Thus, after retraining, this naturally leads to a change in the weights as well.
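As a quick illustrative sketch (degrees and data chosen arbitrarily, not from the course notebook), the same Linear Regression algorithm becomes a more complex model, with more weights, as the polynomial degree grows:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)
X = 6 * rng.rand(100, 1) - 3
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.randn(100)

for degree in (1, 2, 20):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    n_weights = len(model.named_steps["linearregression"].coef_)
    # More features => more weights => higher capacity (and risk of overfitting)
    print(f"degree={degree}: {n_weights} weights, train R^2={model.score(X, y):.3f}")
```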

Thanks.


Hello sir,

In the learning-curve section, we ran linear regression on the data X, which we had created as non-linear in order to understand polynomial regression. Shouldn't we use X_poly, which was created from that non-linear data, to fit the linear regression, so that we can see whether the model is overfitting or underfitting? Why did we take X instead of X_poly? Are we just illustrating the concept of overfitting and underfitting with an arbitrary model? If so, what will happen when we use the new X_poly data?


Hi,

Could you please tell us which part of the video or the slides you are referring to?

Thanks.


Hi Team,

Can you please explain why the Figure 1 graph shows overfitting whereas the Figure 2 graph shows underfitting?

      

[Figure 1: learning curves of the overfitting model; Figure 2: learning curves of the underfitting model]

I am absolutely fine with the explanation of all the different parts of the graphs, and I have gone through the entire lecture multiple times, but the tails of the graphs somehow confuse me as to how one is underfitting and the other is overfitting.

Thanks,

Birendra Singh


Hi,

Let's talk about the second figure first, where it is underfitting. Look at the performance on the training data: when there are just one or two instances in the training set, the model can fit them perfectly, which is why the curve starts at zero. But as new instances are added to the training set, it becomes impossible for the model to fit the training data perfectly, both because the data is noisy and because it is not linear at all. So the error on the training data goes up until it reaches a plateau, at which point adding new instances doesn't make the average error much better or worse.

Now look at the performance of the model on the validation data. When the model is trained on very few training instances, it is incapable of generalizing properly, which is why the validation error is initially quite big. Then, as the model is shown more training examples, it learns, and the validation error slowly goes down. However, once again a straight line cannot do a good job of modeling the data, so the error ends up at a plateau, very close to the other curve.

These learning curves are typical of an underfitting model: both curves have reached a plateau; they are close and fairly high.

Now, coming back to the first figure. These learning curves look a bit like the previous ones, but there are two very important differences:

• The error on the training data is much lower than with the Linear Regression model.

• There is a gap between the curves. This means that the model performs significantly better on the training data than on the validation data, which is the hallmark of an overfitting model. However, if you used a much larger training set, the two curves would continue to get closer.
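For completeness, the two figures presumably come from code along these lines (reusing the plot_learning_curves sketch shown earlier in this thread; the noisy quadratic data is assumed to match the course notebook):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Noisy quadratic data, as in the course notebook (values assumed)
rng = np.random.RandomState(42)
X = 6 * rng.rand(100, 1) - 3
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.randn(100)

# Figure 1: 10th-degree polynomial -- low training error, persistent gap (overfitting)
polynomial_regression = Pipeline([
    ("poly_features", PolynomialFeatures(degree=10, include_bias=False)),
    ("lin_reg", LinearRegression()),
])
plot_learning_curves(polynomial_regression, X, y)

# Figure 2: plain straight line -- both curves plateau close together, fairly high (underfitting)
plot_learning_curves(LinearRegression(), X, y)
```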

Thanks.


Hi Rajtilak

Thanks for the explanation. Please correct my understanding if wrong:

• By stating "there is a gap between the curves; this means that the model performs significantly better on the training data than on the validation data", do we mean the ups and downs observed in the curve? "Gap between the curves" is confusing, because both graphs have gaps between their curves.

• In both figures the curves are getting closer to each other. So what is the deciding factor here for overfitting vs. underfitting? Is RMSE alone the basis for classifying a model as overfitting or underfitting? My previous post was about this part only. Is there some threshold assumed for the RMSE values of the training and validation sets to categorize a model as overfitting or underfitting?

Thanks,

Birendra Singh


Hi,

I would request you to take another close look at these graphs: do the gaps look similar? Also, there are no thresholds; every problem in Machine Learning/Deep Learning is different. We often say that there is no one-size-fits-all solution in Machine Learning/Deep Learning.

Thanks.
