One-on-one discussion on Gradient Descent

Usually, the learners from our classes schedule 1-on-1 discussions with the mentors to clarify their doubts. So, thought of sharing the video of one of these 1-on-1 discussions that one of our CloudxLab learner – Leo – had with Sandeep last week.

Below are the questions from the same discussion.

You can go through the detailed discussion which happened around these questions, in the attached video below.

One-on-one discussion with Sandeep on Gradient Descent

Q.1. In the Life Cycle of Node Value chapter, please explain me the below line of code. What are zz_v, z_v and y_v values ?

Ans.1. The complete code looks like below:[zz, z, y]) expression above is basically evaluating zz, z and y, and is returning their evaluated values, which we are storing in variables zz_v, z_v and y_v variables respectively.  Basically, the the run function of session object returns the same data type as passed in the first argument. Here, the run method is returning an array of values.

TensorFlow is a Python library and in Python, we can return multiple values from a function in the form of a tuple. Here is a simple example:

Same thing is happening here, is returning multiple values which are being stored in variables zz_v, z_v and y_v respectively.

So, the evaluated value of zz is stored in variable zz_v, evaluated value of variable z in z_v and evaluated value of variable y in variable y_z.

Q.2. In Linear Regression chapter, we are using housing price dataset. How do we know that the model for housing price dataset is a linear equation ? Is it just an assumption ?

Ans.2. Yes, it is an assumption that model for housing price dataset is a linear equation.

We can use linear equation for a non-linear problem also.

We convert most of the non-linear problems into a linear problem by using polynomial features.

Even Polynomial Regression problem is solved using Linear Regression by converting a non-linear problem to linear problem by adding polynomial features.

Suppose your equation is

where x1 and x2 are polynomial features and

ϴ0, ϴ1, ϴ2, …. etc are weights or also called coefficients.

In Linear Regression, when Gradient Descent is applied on this equation, weights ϴ1 and ϴ3 will go down to 0 (zero) and weight ϴ2 will become bigger. Hence, at the end of Gradient Descent, our above equation will look like below i.e. we get a non-linear equation

Q.3. Equations of Gradient and Gradient Descent, I don’t understand them

Equation for Gradient for Linear Regression


The below equation is for calculating the Gradient

Equation for Gradient for Linear Regression

MSE is ‘Mean Squared Error’

m is total number of instances.

X dataset is a matrix with ‘n’ columns (features) and ‘m’ rows (instances).

y is a vector (containing  actual values of label) with ‘m’ rows and 1 column

y^ is a also a vector (containing predicted values of label) with ‘m’ rows and 1 column

Therefore, we get,

Equation for Gradient for Linear Regression

Below equation is for calculating the Gradient Descent

Gradient Descent Equation for Linear Regression

η  is the learning rate here.

ϴ is an array of theta values.

 is also an array of values, and is called the Gradient or the rate of change of error (E).

If the Gradient increases, we need to decrease the ϴ, and if the Gradient decreases, we need to increase the ϴ. Eventually, we need to move towards making the Gradient equal to 0 (zero) or nearly 0.

Q.4. In Gradient Descent, what we can do to avoid getting stuck in local minima ?

Ans.4. You can use Stochastic Gradient Descent to avoid getting stuck in local minima.

You can find more details about this in our Machine Learning course.

For the complete course on Machine Learning, please visit Specialization Course on Machine Learning & Deep Learning

Top Machine Learning Interview Questions for 2018 (Part-1)


These Machine Learning Interview Questions, are the real questions that are asked in the top interviews.

For hiring machine learning engineers or data scientists, the typical process has multiple rounds.

  1. A basic screening round – The objective is to check the minimum fitness in this round.
  2. Algorithm Design Round – Some companies have this round but most don’t. This involves checking the coding / algorithmic skills of the interviewee.
  3. ML Case Study – In this round, you are given a case study problem of machine learning on the lines of Kaggle. You have to solve it in an hour.
  4. Bar Raiser / Hiring Manager  – This interview is generally with the most senior person in the team or a very senior person from another team (at Amazon it is called Bar raiser round) who will check if the candidate fits in the company-wide technical capabilities. This is generally the last round.

Continue reading “Top Machine Learning Interview Questions for 2018 (Part-1)”

How To Optimise A Neural Network?

When we are solving an industry problem involving neural networks, very often we end up with bad performance. Here are some suggestions on what should be done in order to improve the performance.

Is your model underfitting or overfitting?

You must break down the input data set into two parts – training and test. The general practice is to have 80% for training and 20% for testing.

You should train your neural network with the training set and test with the testing set. This sounds like common sense but we often skip it.

Compare the performance (MSE in case of regression and accuracy/f1/recall/precision in case of classification) of your model with the training set and with the test set.

If it is performing badly for both test and training it is underfitting and if it is performing great for the training set but not test set, it is overfitting.

Continue reading “How To Optimise A Neural Network?”