
Now let's compute the correlation coefficient. Please refer to the video above to understand what correlation is.

We compute the correlation between attributes to understand how the variables relate to each other, so that we can preprocess and model the data better. It also helps us spot data quirks, which we remove before feeding the data to an algorithm. Correlation also lets us check for multicollinearity, which we will study in the next chapter.
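In practice this is usually done with a library call such as pandas' `DataFrame.corr()`. To show what such a call computes, here is a minimal plain-Python sketch, assuming Pearson's correlation coefficient and a small hypothetical dataset. The names `pearson` and `highly_correlated_pairs`, the `houses` data, and the 0.9 threshold are all illustrative, not part of the course material. Attribute pairs with a very high |r| are candidates for multicollinearity.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every attribute pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:  # compare strength, ignoring the sign
            pairs.append((a, b, r))
    return pairs


# Hypothetical toy data: three attributes of five houses.
houses = {
    "area": [800, 1000, 1200, 1500, 2000],
    "rooms": [2, 2, 3, 3, 4],
    "age": [30, 5, 20, 10, 15],
}

for a, b, r in highly_correlated_pairs(houses):
    print(f"{a} vs {b}: r = {r:.2f}")  # prints: area vs rooms: r = 0.96
```

Here `area` and `rooms` carry nearly the same information, so a model may not need both.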

**Note:** A common misconception among new learners is that a positive correlation is greater than a negative one. For example, they believe a correlation coefficient of 0.98 is greater than -0.98. This is incorrect. The sign only tells us the direction of the correlation: a positive correlation means the values increase together, while in a negative correlation one value decreases as the other increases. So two variables with a correlation coefficient of 0.98 are just as strongly correlated as two with -0.98; they differ only in the direction of the correlation.
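Because the sign only carries direction, comparing strengths means comparing absolute values. A small sketch with hypothetical coefficients (the feature names and values are made up for illustration):

```python
# Hypothetical coefficients between a target variable and three features.
correlations = {"feature_a": 0.98, "feature_b": -0.98, "feature_c": 0.5}


def strength(r):
    """How strong a correlation is: its absolute value, ignoring the sign."""
    return abs(r)


def direction(r):
    """Which way the variables move together: given only by the sign."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Rank by strength; sorted() is stable, so equal strengths keep their original order.
ranked = sorted(correlations.items(), key=lambda item: strength(item[1]), reverse=True)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}, strength = {strength(r):.2f}, direction = {direction(r)}")
```

`feature_a` and `feature_b` tie for first place: they are equally strong and differ only in direction.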

Correlation indicates the extent to which two or more variables fluctuate together. It usually refers to how close two variables are to having a linear relationship with each other. Two variables may have a positive correlation, a negative correlation, or no correlation at all.

A positive correlation means the values increase together: if one value increases, the other also increases. The chart on the left shows a perfect positive correlation. As the value on the x-axis increases, the corresponding value on the y-axis also increases. In a perfect positive correlation, all the points lie on a single straight line.
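A quick way to see this numerically: any set of points lying exactly on a line with a positive slope gives a coefficient of 1. The sketch below assumes the coefficient is Pearson's r and uses a hypothetical helper `pearson`, written directly from its definition.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [3 * x + 2 for x in xs]  # every point lies on the line y = 3x + 2
print(pearson(xs, ys))  # prints 1.0
```

Changing the slope to any other positive value still gives 1 (up to floating-point rounding), because r measures how well the points fit a line, not how steep the line is.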

In a perfect positive correlation, the correlation coefficient is always 1. The correlation coefficient is a statistical measure of the strength and direction of the linear relationship between two variables.
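For reference, the most widely used correlation coefficient is Pearson's r (assumed here to be the one behind these charts). For n paired values with means $\bar{x}$ and $\bar{y}$ it is defined as:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

As a worked example, take $x = (1, 2, 3)$ and $y = (2, 4, 6)$, so $\bar{x} = 2$ and $\bar{y} = 4$:

$$
r = \frac{(-1)(-2) + (0)(0) + (1)(2)}{\sqrt{1 + 0 + 1}\,\sqrt{4 + 0 + 4}} = \frac{4}{\sqrt{2}\,\sqrt{8}} = \frac{4}{4} = 1
$$

This is a perfect positive correlation, as expected, since $y = 2x$. The Cauchy-Schwarz inequality guarantees that the numerator never exceeds the denominator in absolute value, which is why r always lies between -1 and +1.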

The chart in the middle shows a high positive correlation, with a correlation coefficient of 0.9. As the value on the x-axis increases, the value on the y-axis generally increases too, but some of the points do not follow the trend exactly.

The chart on the right shows a low positive correlation, with a correlation coefficient of 0.5. If you draw a straight line through these points, many of them will be off the line.
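To see how scatter around the line lowers the coefficient, the sketch below starts from points on the line y = 2x and adds a fixed, hypothetical pattern of offsets scaled by a factor `k`. The offsets are invented for illustration, not real measurement noise. As `k` grows, the points drift further from a straight line and r falls below 1.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5, 6]
offsets = [1, -2, 3, -1, 2, -3]  # hypothetical fixed "noise" pattern

results = {}
for k in (0, 1, 2, 3):
    # k = 0 keeps every point on the line; larger k scatters them more.
    ys = [2 * x + k * e for x, e in zip(xs, offsets)]
    results[k] = pearson(xs, ys)
    print(f"k = {k}: r = {results[k]:.3f}")
```

The coefficient stays positive because the overall trend is still upward, but it shrinks as the points spread away from the line.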

When two variables are not linked at all, we say they are not correlated. In this diagram there is no pattern: we cannot tell whether one value increases or decreases as the other value increases. A correlation coefficient of zero indicates that there is no linear correlation.
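One caveat is worth seeing in numbers: the coefficient measures linear association, so r = 0 rules out a straight-line trend but not every kind of relationship. In this sketch (the data are hypothetical, and `pearson` is written from the standard definition), y = x² is completely determined by x, yet the falling and rising halves cancel out:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # a perfect curved (not linear) relationship
print(pearson(xs, ys))  # prints 0.0
```

So a coefficient near zero is a signal to plot the data before concluding that the variables are unrelated.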

In a negative correlation, one value decreases as the other value increases. The chart on the left shows a perfect negative correlation. As the value on the x-axis increases, the corresponding value on the y-axis decreases. In a perfect negative correlation, all the points lie on a single straight line, and the correlation coefficient is always -1.
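The same check works for the negative case: points on a line with a negative slope give exactly -1. As before, this is a minimal sketch assuming Pearson's r, with a hypothetical `pearson` helper and made-up points.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [10 - 2 * x for x in xs]  # every point lies on the line y = 10 - 2x
print(pearson(xs, ys))  # prints -1.0
```

Only the sign differs from the positive case; the strength, |r| = 1, is the same.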

The charts in the middle and on the right show weaker negative correlations than the chart on the left. Their correlation coefficients are -0.9 and -0.5, respectively.

The correlation coefficient always takes values in the range -1 to +1:

- A value of -1 shows a perfect negative correlation.
- A value of 0 shows there is no correlation.
- A value of +1 shows a perfect positive correlation.
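These rules can be turned into a small helper that puts a coefficient into words and rejects impossible values. This is a sketch under stated assumptions: the function name `interpret` and the 0.7 cutoff between "low" and "high" are hypothetical choices (the lesson only gives 0.9 as high and 0.5 as low), and the small tolerance absorbs floating-point results such as 0.9999999999999998.

```python
def interpret(r, high_cutoff=0.7, tol=1e-9):
    """Describe a correlation coefficient in words.

    high_cutoff is a hypothetical threshold between "low" and "high";
    tol absorbs floating-point error around -1, 0 and +1.
    """
    if not (-1 - tol <= r <= 1 + tol):
        raise ValueError(f"r = {r} is outside [-1, 1]; the calculation is wrong")
    size = abs(r)  # strength ignores the sign
    if size <= tol:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if size >= 1 - tol:
        strength = "perfect"
    elif size >= high_cutoff:
        strength = "high"
    else:
        strength = "low"
    return f"{strength} {direction} correlation"


for r in (-1.0, -0.9, -0.5, 0.0, 0.5, 0.9, 1.0):
    print(f"{r:+.1f}: {interpret(r)}")
```

A value outside [-1, 1] always points to a bug in the calculation, which is why the helper raises an error instead of returning a label.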
