
End-to-End ML Project- Beginner friendly


Looking for correlations






Now let's compute the correlation coefficient. Refer to the video above for an explanation of what correlation is.

We compute the correlation between attributes to understand how the variables relate to one another, so that we can preprocess and model the data better. It also helps us spot data quirks, which we remove before feeding the data to an algorithm. Correlation can also be used to check for multicollinearity, which we'll study in the next chapter.
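As a rough illustration of the idea above, here is a minimal sketch that computes pairwise correlations with pandas. The DataFrame and its column names (rooms, price, distance) are made-up toy data, not the dataset used in this course; only DataFrame.corr(), which computes the Pearson coefficient by default, is a real pandas call.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not the course dataset).
df = pd.DataFrame({
    "rooms": [2, 3, 4, 5, 6],
    "price": [100, 150, 200, 250, 300],  # rises linearly with rooms
    "distance": [10, 8, 6, 4, 2],        # falls linearly with rooms
})

# Pairwise Pearson correlation between all numeric columns.
corr_matrix = df.corr()

# How strongly each attribute correlates with "price", highest first.
price_corr = corr_matrix["price"].sort_values(ascending=False)
print(price_corr)
```

In a real project you would typically inspect the column for your target attribute in the same way, looking for attributes whose coefficient is close to 1 or -1.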

Note: A common misconception among new learners is that a positive correlation is stronger than a negative one. For example, they believe a correlation coefficient of 0.98 indicates a stronger relationship than -0.98. This is incorrect. The sign only tells us the direction of the correlation: in a positive correlation the values increase together, while in a negative correlation one value decreases as the other increases. The strength is given by the absolute value, so two variables with a correlation coefficient of 0.98 are just as strongly correlated as two with -0.98; only the direction differs.
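To make the point about sign concrete, here is a small sketch in plain Python. The pearson helper and the sample lists are hypothetical and written only for this illustration; the helper implements the standard Pearson formula. Flipping the sign of one variable flips the sign of the coefficient but leaves its absolute value, and therefore the strength, unchanged.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample values for illustration.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]          # tends to increase with x
y_down = [-v for v in y_up]     # same pattern, mirrored direction

r_up = pearson(x, y_up)
r_down = pearson(x, y_down)

print(r_up, r_down)             # same magnitude, opposite signs
print(abs(r_up), abs(r_down))   # compare strength using the absolute value
```

When ranking attributes by how strongly they relate to a target, it is therefore safer to sort by the absolute value of the coefficient than by the raw value.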



2 Comments

What is a data quirk?


A data quirk refers to an unusual or unexpected data point in a dataset that may have a significant impact on the analysis or interpretation of the data. Data quirks can take many forms, such as outliers, missing data, non-normal distributions, or measurement errors, among others. In some cases, they may be simply noise or random variation that can be safely ignored, while in other cases, they may reveal important patterns or relationships in the data that need to be explored further.
