End-to-End ML Project - California Housing


End-to-End ML Project - Create a correlation matrix

Now we will create a correlation matrix to see the correlation coefficients between pairs of variables. A correlation coefficient measures the strength and direction of the linear relationship between two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), and values near 0 indicate little or no linear relationship.

If two variables are positively correlated, then as one of them increases, the other tends to increase along with it. If they are negatively correlated, then as one of them increases, the other tends to decrease. Note, however, that a positive or negative correlation between two variables does not mean that one of them changes because of the other. This is summed up by the phrase "correlation does not imply causation".
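To make the sign of the coefficient concrete, here is a minimal sketch on synthetic data. The column names and the random values are hypothetical and are not part of the housing dataset; the sketch only illustrates how positive, negative, and near-zero correlations show up.

```python
import numpy as np
import pandas as pd

# Synthetic data (hypothetical, not taken from the housing dataset).
rng = np.random.default_rng(42)
x = rng.normal(size=500)
noise = rng.normal(scale=0.5, size=500)

df = pd.DataFrame({
    "x": x,
    "rises_with_x": 2.0 * x + noise,    # tends to increase when x increases
    "falls_with_x": -2.0 * x + noise,   # tends to decrease when x increases
    "unrelated": rng.normal(size=500),  # generated independently of x
})

# DataFrame.corr() computes pairwise Pearson coefficients by default.
corr = df.corr()

# Correlation of every column with x, from most positive to most negative.
print(corr["x"].sort_values(ascending=False))
```

The coefficient for rises_with_x should come out close to 1, the one for falls_with_x close to -1, and the one for unrelated close to 0. None of this says anything about which variable causes which.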

We will also create 3 new features from the existing features in the dataset.

  • First, create the new features from the existing ones, as shown below:

    housing["rooms_per_household"] = housing["total_rooms"]/housing["households"]
    housing["bedrooms_per_room"] = housing["total_bedrooms"]/housing["total_rooms"]
  • Next, calculate the correlation coefficient between every pair of variables using the corr method:

    corr_matrix = housing.corr()
  • Now, let's examine the correlations between the features. First, sort the values using the sort_values method; then plot selected attributes against each other using the scatter_matrix function from Pandas:

    from pandas.plotting import scatter_matrix
    attributes = ["median_house_value", "median_income", "total_rooms"]
    scatter_matrix(housing[attributes], figsize=(12, 8))
  • Finally, get summary statistics for the updated dataset, including the newly added features, using the describe method.
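The sort_values and describe steps are described above without code, so here is one way they might look. This is a sketch under stated assumptions: the housing DataFrame below is a small random stand-in that only borrows the column names used in this exercise, and the link between median_income and median_house_value is made up so that there is something to rank. The select_dtypes call is a precaution for DataFrames that also hold non-numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing DataFrame: random values, real column names.
rng = np.random.default_rng(0)
n = 200
households = rng.integers(100, 1000, size=n)
total_rooms = households * rng.uniform(3.0, 8.0, size=n)
total_bedrooms = total_rooms * rng.uniform(0.1, 0.4, size=n)
median_income = rng.uniform(1.0, 10.0, size=n)

housing = pd.DataFrame({
    "households": households,
    "total_rooms": total_rooms,
    "total_bedrooms": total_bedrooms,
    "median_income": median_income,
    # Made-up relationship, only so the ranking below has a clear winner.
    "median_house_value": 40_000 * median_income + rng.normal(0, 20_000, size=n),
})

# The two derived features shown in the steps above.
housing["rooms_per_household"] = housing["total_rooms"] / housing["households"]
housing["bedrooms_per_room"] = housing["total_bedrooms"] / housing["total_rooms"]

# Keep numeric columns only, then compute the pairwise correlation matrix.
corr_matrix = housing.select_dtypes(include="number").corr()

# Rank every feature by its correlation with the target column.
ranked = corr_matrix["median_house_value"].sort_values(ascending=False)
print(ranked)

# Summary statistics (count, mean, std, quartiles) for all columns, new ones included.
summary = housing.describe()
print(summary)
```

Sorting a single column of the correlation matrix is usually easier to read than the full matrix, because it answers one question directly: which features move most closely with the target.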
