Referring to the correlation matrix, we can check the relationships between the different attributes of our dataset. For example, the attribute population has a high positive correlation with the attributes total_rooms, total_bedrooms, and households. This makes sense: where there are more people, more rooms are needed.
Remember, correlation only measures the linear relationship between variables. A correlation close to zero does not mean that there is no relationship between the variables; it only means that there is no linear relationship. There can still be a non-linear relationship between them.
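As a small illustration of this point, here is a sketch on synthetic data (not the housing dataset; the column names are made up for the example). One column depends linearly on x and another depends on x only through its square.

```python
import numpy as np
import pandas as pd

# Synthetic data, assumed for illustration only: x is symmetric around zero.
x = np.linspace(-1, 1, 201)
df = pd.DataFrame({
    "x": x,
    "y_linear": 2 * x + 1,   # exact linear dependence on x
    "y_quadratic": x ** 2,   # exact, but non-linear, dependence on x
})

# Pearson correlation matrix, the default for DataFrame.corr().
corr = df.corr()
linear_corr = corr.loc["x", "y_linear"]
quadratic_corr = corr.loc["x", "y_quadratic"]

print("x vs y_linear:   ", round(linear_corr, 3))
print("x vs y_quadratic:", round(quadratic_corr, 3))
```

The quadratic column is fully determined by x, yet its correlation with x is essentially zero, because the relationship is not linear.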
We can also visualize correlation with the scatter_matrix() function from pandas.plotting. Its syntax is:
pd.plotting.scatter_matrix(DataFrame)
where DataFrame is the name of the DataFrame.
It plots every numerical attribute against every other numerical attribute. For an attribute plotted against itself, scatter_matrix() by default draws a histogram of that attribute instead of a scatter plot.
Refer to scatter_matrix documentation for further details about the method.
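A minimal sketch of the call, assuming a small synthetic DataFrame named demo with three made-up numerical columns (these names are not part of the housing dataset):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import numpy as np
import pandas as pd

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "a": rng.normal(size=100),
    "b": rng.normal(size=100),
    "c": rng.normal(size=100),
})

# Returns a 2D array of matplotlib Axes, one per pair of columns.
axes = pd.plotting.scatter_matrix(demo, figsize=(6, 6))
print(axes.shape)
```

With three numerical columns, the result is a 3 x 3 grid: scatter plots off the diagonal and histograms on it.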
As there are 11 features, we would get 11*11, i.e., 121 plots, which would be difficult to fit on a page. So, we will take only the top 4 attributes that are most correlated with our target attribute.
Plot the scatter_matrix of the DataFrame train_copy for the top 4 attributes which are most correlated with the attribute median_house_value (irrespective of the direction of correlation), together with median_house_value itself. That makes a total of 5 attributes, generating 25 plots.
Specify the parameter: figsize = (12,10).
Note: We can select specific attributes of a DataFrame by listing their names, each within single or double quotes and separated by commas, inside double square brackets, like this:
DataFrame_name[["attribute_name1", "attribute_name2", "attribute_name3",.....]]
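One possible way to pick the most correlated attributes "irrespective of direction" is to rank them by the absolute value of their correlation with the target. The sketch below uses a synthetic DataFrame named toy with made-up columns, since train_copy is not reproduced here, and selects the top 2 rather than the top 4 to keep the example small.

```python
import numpy as np
import pandas as pd

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(42)
n = 200
target = rng.normal(size=n)
toy = pd.DataFrame({
    "target": target,
    "strong_pos": target + 0.1 * rng.normal(size=n),   # strongly positive
    "strong_neg": -target + 0.2 * rng.normal(size=n),  # strongly negative
    "weak": target + 3.0 * rng.normal(size=n),         # weakly positive
    "noise": rng.normal(size=n),                        # unrelated
})

# Correlation of every attribute with the target, excluding the target itself.
corr_with_target = toy.corr()["target"].drop("target")

# Rank by absolute value so strong negative correlations also count.
top2 = corr_with_target.abs().sort_values(ascending=False).head(2).index.tolist()

# Select those attributes plus the target using a list of column names.
selected = toy[top2 + ["target"]]
print(top2)
print(selected.shape)
```

A DataFrame selected this way could then be passed to scatter_matrix() along with the figsize parameter.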