New Dataset

Now let's compute the correlation matrix for our DataFrame train_copy again, in the same way as before, so that it includes the newly added attributes.
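As a minimal sketch of this step, assuming the earlier computation used pandas' corr() method: the small DataFrame below is only a stand-in for train_copy, with made-up values, and the names corr_matrix and target_corr are illustrative.

```python
import pandas as pd

# Tiny stand-in for train_copy; the values are made up for illustration only.
train_copy = pd.DataFrame({
    "total_rooms":        [880, 7099, 1467, 1274, 1627, 919],
    "total_bedrooms":     [129, 1106, 190, 235, 280, 213],
    "households":         [126, 1138, 177, 219, 259, 193],
    "median_house_value": [452600, 358500, 352100, 341300, 342200, 269700],
})

# The combined attributes discussed in this section.
train_copy["rooms_per_household"] = train_copy["total_rooms"] / train_copy["households"]
train_copy["bedrooms_per_room"] = train_copy["total_bedrooms"] / train_copy["total_rooms"]

# Pearson correlation between every pair of numeric columns.
corr_matrix = train_copy.corr()

# Correlation of each attribute with the target, strongest positive first.
target_corr = corr_matrix["median_house_value"].sort_values(ascending=False)
print(target_corr)
```

The target always appears first with a correlation of 1.0 with itself. Strongly negative correlations end up at the bottom of the list, so it is worth reading both ends.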

Looking at the new correlation matrix, we can see that the new bedrooms_per_room attribute is much more strongly correlated with median_house_value than either the total number of rooms or the total number of bedrooms.

Also, rooms_per_household shows a slightly greater correlation with the target variable than either total_rooms or households does on its own.

We can try other combinations too and keep only those that are noticeably better correlated with the target than the attributes they are built from.
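One way to compare several candidate combinations at once is sketched below. The helper rank_ratio_features, the toy DataFrame and the population_per_household candidate are all hypothetical and are not part of the original project code.

```python
import pandas as pd

def rank_ratio_features(df, candidates, target):
    """Return |correlation with target| for each candidate ratio, highest first."""
    scores = {}
    for name, (numerator, denominator) in candidates.items():
        ratio = df[numerator] / df[denominator]
        # Use the absolute value so strong negative correlations also rank high.
        scores[name] = abs(ratio.corr(df[target]))
    return pd.Series(scores).sort_values(ascending=False)

# Toy data with made-up values, standing in for train_copy.
toy = pd.DataFrame({
    "total_rooms":        [880, 7099, 1467, 1274, 1627, 919],
    "total_bedrooms":     [129, 1106, 190, 235, 280, 213],
    "population":         [322, 2401, 496, 558, 565, 413],
    "households":         [126, 1138, 177, 219, 259, 193],
    "median_house_value": [452600, 358500, 352100, 341300, 342200, 269700],
})

# Each candidate is defined as (numerator column, denominator column).
candidates = {
    "rooms_per_household":      ("total_rooms", "households"),
    "bedrooms_per_room":        ("total_bedrooms", "total_rooms"),
    "population_per_household": ("population", "households"),
}

ranking = rank_ratio_features(toy, candidates, "median_house_value")
print(ranking)
```

Ranking by absolute correlation is a design choice: it treats a strong negative relationship as just as informative as a strong positive one.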

Remember, we don't have to be absolutely thorough while exploring the data. Our goal is to get a feel for the data and gain enough insight to build a first prototype. Once we have the prototype, we can learn more from its output and come back to this exploration step.

Having completed the third step, exploring the data, we now move on to the fourth step of our machine learning pipeline: preparing the data for machine learning algorithms.