One-on-one discussion on Gradient Descent

Learners from our classes usually schedule 1-on-1 discussions with the mentors to clarify their doubts. So we thought of sharing the video of one such discussion that one of our CloudxLab learners, Leo, had with Sandeep last week.

Below are the questions from the same discussion.

You can go through the detailed discussion around these questions in the video below.

One-on-one discussion with Sandeep on Gradient Descent

Q.1. In the Life Cycle of Node Value chapter, please explain the line of code below. What are the values zz_v, z_v and y_v?

Ans.1. In the complete code, the run([zz, z, y]) call on the session object evaluates zz, z and y and returns their evaluated values, which we store in the variables zz_v, z_v and y_v respectively. The run function of the session object returns the same data type as was passed in its first argument. Here, a list was passed, so the run method returns a list of values.

TensorFlow is a Python library and in Python, we can return multiple values from a function in the form of a tuple. Here is a simple example:
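A minimal sketch of this idea in plain Python. The function name min_max_mean and the sample numbers are hypothetical, chosen only for illustration and not taken from the course code:

```python
def min_max_mean(values):
    """Return three results at once; Python packs them into a tuple."""
    return min(values), max(values), sum(values) / len(values)


# The returned tuple is unpacked into three separate variables,
# in the same order in which the function returned them.
lowest, highest, average = min_max_mean([2, 4, 9])
print(lowest, highest, average)
```

The order matters: the first returned value lands in the first variable on the left, the second in the second, and so on.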

The same thing is happening here: the run method is returning multiple values, which are stored in the variables zz_v, z_v and y_v respectively.

So, the evaluated value of zz is stored in the variable zz_v, the evaluated value of z in z_v and the evaluated value of y in y_v.

Q.2. In the Linear Regression chapter, we are using the housing price dataset. How do we know that the model for the housing price dataset is a linear equation? Is it just an assumption?

Ans.2. Yes, it is an assumption that the model for the housing price dataset is a linear equation.

We can also use a linear equation for a non-linear problem.

We convert most non-linear problems into linear problems by using polynomial features.

Even a Polynomial Regression problem is solved using Linear Regression: adding polynomial features as extra inputs keeps the model linear in its weights, even though it is non-linear in the original input.

Suppose your equation is

where x1 and x2 are polynomial features and

ϴ0, ϴ1, ϴ2, … are the weights, also called coefficients.

In Linear Regression, when Gradient Descent is applied to this equation, the weights ϴ1 and ϴ3 go down to 0 (zero) and the weight ϴ2 becomes bigger. Hence, at the end of Gradient Descent, only the terms with non-zero weights remain in the equation, i.e. we get a non-linear equation.
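A rough sketch of this effect, under stated assumptions: the data below is made up to follow y = 2 + 3x², the feature columns 1, x, x², x³ are one possible choice of polynomial features, and the weights are found with NumPy's closed-form least squares (np.linalg.lstsq) instead of Gradient Descent to keep the example short.

```python
import numpy as np

# Made-up non-linear data: y depends only on the square of x.
x = np.linspace(-1, 1, 50)
y = 2 + 3 * x ** 2

# Polynomial features as extra columns: 1, x, x^2, x^3.
# The model y = theta0 + theta1*x + theta2*x^2 + theta3*x^3
# is still linear in the weights theta.
X = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])

# Ordinary least squares fit (swapped in for Gradient Descent here).
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(theta, 3))
```

The weights for x and x³ come out close to zero, while the weight for x² is close to 3, so the fitted equation is effectively non-linear in x even though it was found with a linear method.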

Q.3. I don’t understand the equations of Gradient and Gradient Descent.

Ans.3. The equation below is for calculating the Gradient for Linear Regression:

$$\nabla_{\theta}\,\mathrm{MSE}(\theta) = \frac{2}{m}\, X^{T} (X\theta - y)$$

MSE is ‘Mean Squared Error’

m is the total number of instances.

X, the dataset, is a matrix with ‘n’ columns (features) and ‘m’ rows (instances).

y is a vector (containing the actual values of the label) with ‘m’ rows and 1 column.

ŷ is also a vector (containing the predicted values of the label) with ‘m’ rows and 1 column.

Therefore, since the predictions are ŷ = Xϴ, we get:

$$\nabla_{\theta}\,\mathrm{MSE}(\theta) = \frac{2}{m}\, X^{T} (\hat{y} - y)$$

The equation below is for calculating a Gradient Descent step for Linear Regression:

$$\theta^{(\text{next})} = \theta - \eta\, \nabla_{\theta}\,\mathrm{MSE}(\theta)$$

η is the learning rate here.

ϴ is an array of theta values.

$\nabla_{\theta}\,\mathrm{MSE}(\theta)$ is also an array of values, and is called the Gradient or the rate of change of error (E) with respect to each theta value.

If the Gradient is positive, the error grows as ϴ grows, so we need to decrease ϴ; if the Gradient is negative, we need to increase ϴ. Eventually, we need to move towards making the Gradient equal to 0 (zero) or nearly 0.
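The gradient and the update step described above can be sketched in NumPy as follows. This is a minimal illustration, not the course's code: the function name gradient_descent, the learning rate, the iteration count and the data (made up to follow y = 4 + 3x) are all assumptions, and y is kept as a 1-D array for simplicity.

```python
import numpy as np


def gradient_descent(X, y, eta=0.1, n_iterations=1000):
    """Batch Gradient Descent for Linear Regression with the MSE error."""
    m, n = X.shape              # m instances, n columns
    theta = np.zeros(n)         # start with all weights at zero
    for _ in range(n_iterations):
        y_hat = X @ theta                        # predicted labels
        gradient = (2 / m) * X.T @ (y_hat - y)   # gradient of the MSE
        theta = theta - eta * gradient           # step against the gradient
    return theta


# Made-up data: a column of ones (for the intercept) plus one feature.
x = np.linspace(0, 2, 50)
X = np.column_stack([np.ones_like(x), x])
y = 4 + 3 * x

theta = gradient_descent(X, y)
print(theta)
```

As the weights approach 4 and 3, the gradient shrinks towards zero and the updates become smaller and smaller.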

Q.4. In Gradient Descent, what can we do to avoid getting stuck in local minima?

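Ans.4. You can use Stochastic Gradient Descent to reduce the chance of getting stuck in local minima. Because each step uses a single randomly picked instance, the updates are noisy, and this noise can bounce the algorithm out of a local minimum, although it is not guaranteed to.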
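A minimal sketch of the mechanics of Stochastic Gradient Descent, under stated assumptions: the function name, learning rate, epoch count, seed and data are hypothetical. Note that the MSE of Linear Regression is convex and so has no local minima to escape; the example only shows how the weights are updated from one instance at a time.

```python
import numpy as np


def stochastic_gradient_descent(X, y, eta=0.05, n_epochs=200, seed=0):
    """Update the weights using one randomly ordered instance per step."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        # Visit the instances in a fresh random order every epoch.
        for i in rng.permutation(m):
            xi, yi = X[i], y[i]
            gradient = 2 * xi * (xi @ theta - yi)  # gradient for one instance
            theta = theta - eta * gradient
    return theta


# Made-up data following y = 4 + 3x, with a column of ones for the intercept.
x = np.linspace(0, 2, 50)
X = np.column_stack([np.ones_like(x), x])
y = 4 + 3 * x

theta = stochastic_gradient_descent(X, y)
print(theta)
```

Each step is much cheaper than a full pass over the dataset, at the cost of a less regular path towards the minimum.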

You can find more details about this in our Machine Learning course.

For the complete course on Machine Learning, please visit Specialization Course on Machine Learning & Deep Learning

Use-cases of Machine Learning in E-Commerce

What computing once did to traditional industry, Machine Learning is now doing to traditional rule-based computing: it is eating into its market. Earlier, organizations used to have separate groups for Image Processing, Audio Processing, Analytics and Predictions. Now, these groups are merged because machine learning overlaps with nearly every domain of computing. Let us discuss how machine learning is impacting e-commerce in particular.

The first use case of Machine Learning that became really popular was Amazon Recommendations. Afterwards, Netflix launched a Movie Recommendations challenge, which gave birth to Kaggle, now an online platform for various machine learning challenges.

Before I dive deeper into the details, let's quickly go over the terms that are often found confusing. AI stands for Artificial Intelligence, which means being able to display human-like intelligence. AI is basically an objective. Machine learning is making computers learn from historical or empirical data instead of explicitly writing the rules. Artificial Neural Networks are computing constructs whose structure is modelled on the animal brain. Deep Learning is a branch of machine learning where we use a complex Artificial Neural Network for predictions.


How To Optimise A Neural Network?

When we are solving an industry problem involving neural networks, we very often end up with bad performance. Here are some suggestions on what to do to improve it.

Is your model underfitting or overfitting?

You must break down the input data set into two parts: training and test. The general practice is to keep 80% for training and 20% for testing.

You should train your neural network with the training set and test it with the test set. This sounds like common sense, but we often skip it.

Compare the performance of your model (MSE in case of regression and accuracy/f1/recall/precision in case of classification) on the training set and on the test set.

If it performs badly on both the training set and the test set, it is underfitting; if it performs great on the training set but not on the test set, it is overfitting.
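The split and the comparison can be sketched as follows. This is only an illustration under stated assumptions: the helper names split_train_test and mse are hypothetical, the data is synthetic, and a straight-line fit with NumPy stands in for the neural network so that the example stays short.

```python
import numpy as np


def split_train_test(X, y, test_ratio=0.2, seed=42):
    """Shuffle the instances and split them into training and test parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    test_size = int(len(X) * test_ratio)
    test_idx, train_idx = indices[:test_size], indices[test_size:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def mse(y_true, y_pred):
    """Mean Squared Error between actual and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


# Synthetic regression data: y = x^2 plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x ** 2 + rng.normal(0, 0.1, size=100)

x_train, x_test, y_train, y_test = split_train_test(x, y)

# A straight line is too simple a model for this curved data.
line = np.polyfit(x_train, y_train, deg=1)
train_mse = mse(y_train, np.polyval(line, x_train))
test_mse = mse(y_test, np.polyval(line, x_test))
print(f"train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```

Here the error is large on both sets, which is the underfitting pattern; a model with a small training error but a much larger test error would show the overfitting pattern instead.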
