Training Models

Machine Learning Training Models Part-2



48 Comments

Long video sessions seem boring. It would be better to have interactive exercises with examples. Please take this as constructive criticism.


Hi Arun,

We appreciate your feedback and couldn't agree more with the importance of interactive learning. We share your belief that it's much more engaging and enjoyable to learn through hands-on exercises and practical examples. That's why we have carefully curated a mix of content and interactive exercises, placing them strategically to enhance your learning experience. Thank you for your constructive criticism, as it helps us continuously improve and provide the best learning environment for our users.


Hi,

@slide 170

The formula says there will be 10 new features, but only 9 features are mentioned. What is the 10th feature?


Hi,

Here are the 10 features:

1, a, b, a^2, a^3, b^2, b^3, ab, a^2b, ab^2

You can go through the below link for more explanation:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
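You can also verify this with a quick sketch (assuming two input features a and b, and a recent scikit-learn version):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features, a and b
X = np.array([[2.0, 3.0]])
poly = PolynomialFeatures(degree=3, include_bias=True)
poly.fit_transform(X)

# In older scikit-learn versions this method is called get_feature_names
print(poly.get_feature_names_out(["a", "b"]))
# ['1' 'a' 'b' 'a^2' 'a b' 'b^2' 'a^3' 'a^2 b' 'a b^2' 'b^3'] -> 10 features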

Thanks.


Sir, I have two queries:

1. For SGD or mini-batch gradient descent, how can we be sure that each iteration is minimizing the cost function, given that these methods do not use all of the observations?

2. Can you suggest any books for further reading on all the gradient descent methods?

Hi,

1. The best way is to observe the result after each iteration, because there is no guarantee that SGD will minimize the cost function. For example, if the learning rate is too high, it will not converge.
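As a minimal illustration of such monitoring (toy data; none of these names come from the lecture notebook):

import numpy as np

# Toy data: y = 4 + 3x + noise
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]  # add the bias feature x0 = 1

theta = np.random.randn(2, 1)
eta = 0.1
for epoch in range(5):
    for i in range(m):
        idx = np.random.randint(m)  # one random instance per step
        xi, yi = X_b[idx:idx + 1], y[idx:idx + 1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
        theta = theta - eta * gradients
    # Observe the full-data cost after each epoch to check it is decreasing
    mse = np.mean((X_b.dot(theta) - y) ** 2)
    print(f"epoch {epoch}: MSE = {mse:.4f}")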

2. I am not aware of any book that focuses on all gradient descent methods; however, I personally prefer the following book for a detailed study of optimization methods:

Algorithms for Optimization by Mykel J. Kochenderfer, Tim A. Wheeler

Let me know if you find it useful.

Thanks.


Thank you, sir, for your valuable information. I have downloaded the book. I'll let you know if I find anything insightful about the algorithms.


Can we build polynomial models using the OLS method?


Hi,

Yes, you can.
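For instance, a rough sketch (the data-generating equation mirrors the one from the lecture; the rest is illustrative). After expanding the features with PolynomialFeatures, an ordinary least squares fit recovers the polynomial coefficients:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Quadratic data: y = 0.5*x^2 + x + 2 + noise
m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)

# Expand the features, then solve the ordinary least squares problem
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
lin_reg = LinearRegression()  # LinearRegression performs an OLS fit
lin_reg.fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)  # roughly [2.] and [[1., 0.5]]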

Thanks.


Can you please provide a link to the Jupyter notebook that was used for SGD?


Hi,

Please find below the link to our GitHub repository:

https://github.com/cloudxlab/ml

Within the Machine Learning folder, you will find the Jupyter notebook for Training Models.

Thanks.


Hi.

In slide 108, in the diagram where the learning rate is 0.1, why and how does the algorithm shorten the length of its jumps as it approaches the optimal solution of minimal RMSE?


Hi,

The learning rate itself does not change; we do not need to change it in a real-world scenario (we only varied it manually across the diagrams to show cause and effect). The jumps get shorter because each step equals the learning rate times the gradient, and the gradient shrinks as the algorithm approaches the minimum.
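A tiny illustration of this effect with a fixed learning rate (a hypothetical 1-D cost function):

# Minimize f(theta) = theta^2, whose gradient is 2*theta
theta, eta = 4.0, 0.1
for i in range(5):
    grad = 2 * theta
    step = eta * grad  # eta stays fixed, yet the step shrinks with the gradient
    theta -= step
    print(f"iter {i}: step = {step:.4f}, theta = {theta:.4f}")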

Thanks.


Which book is preferable for this course?


Hi,

Here is a list of ML/DL books that you can choose from:

https://cloudxlab.com/blog/gigantic-list-of-machine-learning-books/

Thanks.


When should we use RMSE, and when MSE?

What is the code for RMSE? Is there any library for it?


Hi,

The smaller the Mean Squared Error, the closer the fit is to the data. However, the MSE has the square of the units of whatever is plotted on the vertical axis, whereas the RMSE is directly interpretable in terms of the measurement units, and so is a better measure of goodness of fit than a correlation coefficient.
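As for the code, a minimal sketch (assuming scikit-learn and made-up arrays):

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # hypothetical targets
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # hypothetical predictions

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is simply the square root of MSE
print(mse, rmse)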

Would request you to go through the tutorial for details.

Thanks.



Hi,

As mentioned in the slides, m is the number of instances in the training dataset.

Thanks.


Dear sir, I cannot find the notebook file "training_linear_models.ipynb" in the path shown. How can I get it? Please help.


Hi Tathagata,

You can find the notebooks here: https://github.com/cloudxlab/ml


Hello sirs,

I had one question: does the normal equation take the bias into account? If not, how would one optimize it without gradient descent?

Thank you for your time


Hi,

Would request you to go through slide# 56 onwards for an explanation of the same.
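In short, the bias term is taken into account by adding a constant feature x0 = 1 to every instance before applying the normal equation; a minimal sketch with toy data:

import numpy as np

# Toy data: y = 4 + 3x + noise, so the true bias (intercept) is 4
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)

# Prepend a column of ones so that theta[0] plays the role of the bias
X_b = np.c_[np.ones((m, 1)), X]
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print(theta_best)  # roughly [[4.], [3.]]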

Thanks.


Different problems/applications will have different learning rates. How do we determine the closest-to-perfect learning rate for different problems? Is it based on trial and error?


Got it.


Hi Team,

In the below sample code:

from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly_features.fit_transform(X)
X       # original features
X_poly  # original plus new features

Doubts:

  • We have only one variable 'x', so it has 1 feature, right?
  • The equation used was y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1), so the degree is 2, right?
  • poly_features = PolynomialFeatures(degree=3, include_bias=False) -- here we are specifying a degree greater than 2, and X_poly changes its dimensions accordingly. For instance, if the degree is 3, then X_poly is 100x3. It confuses me that the equation is of degree 2 but the feature expansion is done with a higher degree. Please explain this.
  • include_bias=False -- does this mean we are ignoring the distance from the centre?
  • The formula for calculating the number of features in slide 167 is not satisfied here. Ideally there should be 4 features, but X_poly shows 3 features. Is it because include_bias=False?

Thanks

Birendra Singh


Hi,

1. Yes, this has one feature. But that need not be the case; it can contain multiple features.

2. Yes, it's a 2nd degree polynomial.

3. We are changing the degree of the polynomial features here, independently of the degree of the data-generating equation. Please go through the lecture video for a detailed explanation.

4. We are ignoring the bias term here.

5. Yes, that's right.
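A quick way to verify point 5 (a minimal sketch):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(100, 1)  # one feature, so n = 1

# With the bias column: (n + d)! / (d! n!) = 4! / (3! * 1!) = 4 features
print(PolynomialFeatures(degree=3, include_bias=True).fit_transform(X).shape)   # (100, 4)

# Without it, the constant column of ones is dropped, leaving x, x^2, x^3
print(PolynomialFeatures(degree=3, include_bias=False).fit_transform(X).shape)  # (100, 3)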

Thanks.


Hi Team,

While plotting gradient descent for various learning rates, we used the theta path for only one learning rate, i.e. 0.1. What is the purpose of the theta path, and why is it not used in the first plot, where the learning rate is 0.02?

plt.subplot(131); plot_gradient_descent(theta, eta=0.02)                             # no theta path
plt.subplot(132); plot_gradient_descent(theta, eta=0.1, theta_path=theta_path_bgd)   # theta path
plt.subplot(133); plot_gradient_descent(theta, eta=0.5)                              # no theta path

And please explain the plot logic as well:

if iteration < 10:
    y_predict = X_new_b.dot(theta)
    style = "b-" if iteration > 0 else "r--"
    plt.plot(X_new, y_predict, style)

Thanks,

Birendra Singh

Hi Birendra,

Answer 1:

In the code, the list theta_path_bgd is appended with every theta that is calculated. Inside the function this happens in the following step:

        if theta_path is not None:
            theta_path.append(theta)

For the plot itself, passing theta_path=theta_path_bgd has no significance: plt.subplot(132); plot_gradient_descent(theta, eta=0.1) would draw the same graph, because the plot is computed directly from theta and eta=0.1. The theta_path argument is just used to store/collect all the thetas for later use.

Answer 2:

if iteration < 10:                            # only the first 10 iterations are plotted
    y_predict = X_new_b.dot(theta)
    style = "b-" if iteration > 0 else "r--"  # red dashed line for iteration 0 (the starting model), blue for iterations 1-9
    plt.plot(X_new, y_predict, style)

All the best!

Thanks Satyajit



Sir,

On what basis are we selecting the values of t0 and t1?


Hi,

Could you please point out which part of the video or the slides your query refers to?

Thanks.


Hi Team,

The file training_linear_models.ipynb is not opening. Is there any problem with the file?


Hi,

How are you trying to open the file, from the GitHub page, or from your lab?

Thanks.


Hi Team,

In the video at 2:01:18, the code at line 21 calculates the gradient.

I think the equation should be divided by the minibatch_size variable, in line with what is done for Batch Gradient Descent, where the value is averaged by dividing the result by m (i.e., the number of instances).

Batch Gradient Descent:
gradients = 2/m * X.T.dot(X.dot(theta) - y)

Mini-Batch Gradient Descent:
gradients = 2/minibatch_size * X.T.dot(X.dot(theta) - y)
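For reference, both snippets implement the gradient of the MSE cost averaged over the instances used in that step:

$$\nabla_\theta \, \mathrm{MSE}(\theta) = \frac{2}{m} \, \mathbf{X}^\mathsf{T} (\mathbf{X}\theta - \mathbf{y})$$

where m is the number of instances in the (mini-)batch, so dividing by minibatch_size is the consistent choice.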


Hi,

You are right! Thank you for pointing this out; we will change the code in our GitHub repository shortly.

Thanks.

-- Rajtilak Bhattacharjee


Thanks for confirming with a quick turnaround.

It would be my pleasure to help.


Hi,

I am getting an error in the code. Please help.


Hi,

Can you try max_iter instead of n_iter and run it once again?
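For instance, a rough sketch with toy data (in recent scikit-learn releases the n_iter parameter of SGDRegressor was renamed to max_iter):

import numpy as np
from sklearn.linear_model import SGDRegressor

# Toy data: y = 4 + 3x + noise
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# n_iter no longer exists in newer scikit-learn; use max_iter instead
sgd_reg = SGDRegressor(max_iter=50, penalty=None, eta0=0.1)
sgd_reg.fit(X, y.ravel())
print(sgd_reg.intercept_, sgd_reg.coef_)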

Thanks.

-- Rajtilak Bhattacharjee


Yes, it's working now. Thanks!


How do we simply define:
- overfitting
- underfitting


Hi,

You can define them as follows:

*Overfitting*: good performance on the training data, poor generalization to other data.

*Underfitting*: poor performance on the training data and poor generalization to other data.
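A rough way to see both in code (an illustrative sketch; the degrees and data are arbitrary). The degree-1 model typically does poorly on both sets (underfitting), while the degree-30 model typically does well on the training set but poorly on the validation set (overfitting):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Quadratic ground truth with noise
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X.ravel()**2 + X.ravel() + 2 + np.random.randn(100)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

for degree in (1, 2, 30):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree {degree}: train MSE = {train_mse:.2f}, val MSE = {val_mse:.2f}")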

Thanks.

-- Rajtilak Bhattacharjee


Hi,

Could anyone please let me know the purpose of multiplying by X_b.T in the gradient calculation?


And I couldn't understand why it is (n+d)!/(d!n!). What is the reasoning behind this?
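For reference, this count is the number of monomials of degree at most d that can be formed from n features (a standard stars-and-bars combinatorial argument):

$$\frac{(n+d)!}{d! \, n!} = \binom{n+d}{d}, \qquad \text{e.g. } n = 2,\ d = 3: \binom{5}{3} = 10$$

which matches the 10 features (1, a, b, a^2, ab, b^2, a^3, a^2b, ab^2, b^3) discussed earlier in this thread.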


Did you say something wrong? Because you said at 2:07:36 that Batch GD doesn't have to load all the data into memory, but the diagram you showed said that it can't operate out-of-core.
