
End to End ML Project - Create a correlation matrix

Now, we will create a correlation matrix to see the correlation coefficient between each pair of variables. The correlation coefficient is a statistical measure of the strength of the linear relationship between two variables; it ranges from -1 to 1.
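As a rough illustration of what the coefficient measures, the sketch below computes Pearson's r by hand and compares it with the pandas result. The two series are hypothetical toy values, not rows from the housing dataset, and the variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the housing dataset
income = pd.Series([1.5, 2.0, 3.5, 4.0, 6.0])
value = pd.Series([100.0, 150.0, 200.0, 260.0, 400.0])

# Deviations of each variable from its own mean
dx = income - income.mean()
dy = value - value.mean()

# Pearson's r: co-variation of the two variables, scaled by their individual spreads
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Series.corr uses the Pearson method by default
r_pandas = income.corr(value)

print(round(r_manual, 4), round(r_pandas, 4))
```

Both numbers should agree, which suggests that the corr method used later in this exercise computes this same quantity for every pair of columns.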

If two variables are positively correlated, then when one of them increases, the other tends to increase along with it. If they are negatively correlated, then when one of them increases, the other tends to decrease. However, we must note that even if two variables are positively or negatively correlated, it does not necessarily mean that one of them changes because of the other; this is what the phrase "correlation does not imply causation" means.
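A minimal sketch of the two cases, using made-up series (the names x, rises_with_x and falls_with_x are hypothetical and exist only for this illustration):

```python
import pandas as pd

# Hypothetical toy series for illustration only
x = pd.Series([1, 2, 3, 4, 5])
rises_with_x = pd.Series([2, 4, 5, 8, 10])   # tends to increase as x increases
falls_with_x = pd.Series([10, 7, 6, 3, 1])   # tends to decrease as x increases

print(x.corr(rises_with_x))   # close to +1: positive correlation
print(x.corr(falls_with_x))   # close to -1: negative correlation
```

Neither result says anything about why the series move together; the coefficient only describes how they move.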

We will also create 3 new features from the existing features in the dataset.

INSTRUCTIONS

First, we will create 3 new features from the existing features as shown below:

housing["rooms_per_household"] = housing["total_rooms"]/housing["households"]
housing["bedrooms_per_room"] = housing["total_bedrooms"]/housing["total_rooms"]
housing["population_per_household"] = housing["population"]/housing["households"]
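To see what these ratios produce, here is a small self-contained sketch. It assumes a hypothetical two-district DataFrame that reuses the column names above; the numbers are invented so the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical two-district sample; the values are invented for illustration
housing = pd.DataFrame({
    "total_rooms": [1000.0, 6000.0],
    "total_bedrooms": [200.0, 1500.0],
    "population": [500.0, 3000.0],
    "households": [250.0, 1000.0],
})

# Element-wise division creates one new value per district
housing["rooms_per_household"] = housing["total_rooms"] / housing["households"]
housing["bedrooms_per_room"] = housing["total_bedrooms"] / housing["total_rooms"]
housing["population_per_household"] = housing["population"] / housing["households"]

print(housing[["rooms_per_household", "bedrooms_per_room", "population_per_household"]])
```

Per-household and per-room ratios like these can be more comparable across districts of different sizes than the raw totals.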

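Now let's calculate the correlation coefficient between every pair of variables using the corr method:
```
# On pandas 2.0 and later, use housing.corr(numeric_only=True)
# if housing still contains a non-numeric column.
corr_matrix = housing.corr()
```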
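If the DataFrame contains a text column, pandas 2.0 and later raise an error in corr unless non-numeric columns are excluded. The sketch below shows one way to handle that, assuming a hypothetical mini-dataset in which "ocean_proximity" stands in for any text column; select_dtypes is used here because it does not depend on the pandas version.

```python
import pandas as pd

# Hypothetical mini-dataset; "ocean_proximity" stands in for any text column
sample = pd.DataFrame({
    "median_income": [2.0, 3.5, 5.0, 8.0],
    "median_house_value": [120000.0, 180000.0, 260000.0, 450000.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "NEAR OCEAN"],
})

# Keep only numeric columns, then compute pairwise Pearson correlations
corr_matrix = sample.select_dtypes(include="number").corr()
print(corr_matrix)
```

The result is a square, symmetric table with 1.0 on the diagonal, since every column is perfectly correlated with itself.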

Now, let's see how strongly each feature correlates with median_house_value. First, we will sort the correlations using the sort_values method, then we will plot a scatter matrix of a few selected attributes using the scatter_matrix function from pandas.plotting:

corr_matrix["median_house_value"].sort_values(ascending=False)

from pandas.plotting import scatter_matrix

attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
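For reading the sorted output, the following sketch may help. It assumes a hypothetical Series of correlation values (invented numbers, not results from the dataset) and shows that sorting in descending order puts strong negative correlations at the bottom, so sorting by absolute value can be a useful second view.

```python
import pandas as pd

# Hypothetical correlation values, chosen only to illustrate the ordering
corr_with_target = pd.Series({
    "median_house_value": 1.0,
    "bedrooms_per_room": -0.3,
    "median_income": 0.7,
    "total_rooms": 0.1,
})

# Same call as in the exercise: largest positive correlation first
ranked = corr_with_target.sort_values(ascending=False)
print(ranked)

# Rank by strength of the relationship, regardless of sign
by_strength = corr_with_target.abs().sort_values(ascending=False)
print(by_strength.index.tolist())
```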

Finally, let's get summary statistics for the updated dataset, including the newly added features, using the describe method:
```
housing.describe()
```
