
# Scatter Matrix

From the correlation matrix, we can check the relationships between the different attributes of our dataset. For example, the attribute `population` has a strong positive correlation with the attributes `total_rooms`, `total_bedrooms`, and `households`. This is expected: where there are more people, more rooms are required.

Remember, correlation only measures the linear relationship between variables. A correlation close to zero doesn't mean that there's no relationship between them; it only means that there is no linear relationship. There can still be a non-linear relationship between them.
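To make this concrete, here is a minimal sketch on made-up data (not the housing dataset): `y` depends entirely on `x` through `y = x ** 2`, yet the Pearson correlation comes out at essentially zero because the dependence is not linear. The variable names `x`, `y`, and `r` are chosen only for this illustration.

```python
import numpy as np

# Inputs symmetric around zero, with a purely quadratic (non-linear) dependence.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.6f}")  # essentially zero, up to floating-point error
```

Even though knowing `x` determines `y` exactly, the coefficient gives no hint of it, which is why plotting the data is a useful complement to the correlation matrix.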

We can also visualize correlation with the `scatter_matrix()` function from `pandas.plotting`. Its syntax is:

```
pd.plotting.scatter_matrix(DataFrame)
```

where `DataFrame` is the DataFrame to plot.

It plots every numerical attribute against every other numerical attribute. In the case of an attribute against itself, instead of plotting the scatter plot, `scatter_matrix()` plots the histogram of the attribute.
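As a rough illustration of this behaviour, the sketch below builds a small synthetic DataFrame (the name `demo` and the columns `a`, `b`, `c` are invented for this example) and passes it to `scatter_matrix()`. The function returns a square array of Matplotlib axes, one per pair of attributes, with histograms on the diagonal by default.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
import numpy as np
import pandas as pd

# Synthetic data: "b" is linearly related to "a", "c" is independent noise.
rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
demo = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=n),
    "c": rng.normal(size=n),
})

# One subplot per pair of numerical columns; the diagonal shows histograms.
axes = pd.plotting.scatter_matrix(demo, figsize=(8, 8))
print(axes.shape)  # 3 columns give a 3 x 3 grid of plots
```

In the resulting figure, the `a` versus `b` panels should show a clear upward trend, while the panels involving `c` should look like shapeless clouds.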

Refer to the `scatter_matrix` documentation for further details about the function.

As there are 11 features, we would get 11*11, i.e., 121 plots, which would be difficult to fit on a page. So, we will only take the top 4 attributes that are most correlated with our target attribute.
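One possible way to rank attributes by the strength of their correlation with a target, ignoring its sign, is sketched below on synthetic data. The DataFrame `toy` and its column names are hypothetical and stand in for the real dataset; the same pattern (take the target's column of the correlation matrix, take absolute values, sort) is assumed to carry over.

```python
import numpy as np
import pandas as pd

# Synthetic data: two columns strongly tied to the target (one positively,
# one negatively), one weakly tied, and one that is pure noise.
rng = np.random.default_rng(42)
n = 500
target = rng.normal(size=n)
toy = pd.DataFrame({
    "target": target,
    "strong_pos": target + rng.normal(scale=0.1, size=n),
    "strong_neg": -target + rng.normal(scale=0.2, size=n),
    "weak": 0.2 * target + rng.normal(scale=1.0, size=n),
    "noise": rng.normal(size=n),
})

# Correlation of every attribute with the target, excluding the target itself.
corr_with_target = toy.corr()["target"].drop("target")

# Rank by absolute value so strong negative correlations count as well.
top2 = corr_with_target.abs().sort_values(ascending=False).head(2).index.tolist()
print(top2)
```

Sorting on the absolute value is what makes `strong_neg` rank above `weak` here, even though its correlation with the target is negative.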

INSTRUCTIONS

Plot the `scatter_matrix` of the DataFrame `train_copy` for the 4 attributes most correlated with `median_house_value` (irrespective of the direction of correlation), together with `median_house_value` itself. That makes a total of 5 attributes, generating 25 plots.

Specify the parameter: `figsize = (12,10)`.

Note: We can select specific attributes by passing a list of attribute names, each within single or double quotes and separated by commas, like:

```
DataFrame_name[["attribute_name1", "attribute_name2", "attribute_name3",.....]]
```