Missing values

The first step in preprocessing the data is to handle the attributes with missing values, because most machine learning algorithms cannot work with missing values.
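Before picking a strategy, it helps to know how many values are actually missing. The snippet below is a minimal sketch of that check using pandas. The `housing` DataFrame here is a small made-up stand-in for the real dataset (only the column name total_bedrooms comes from this project; the other columns and all the numbers are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data; the values are invented for illustration.
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0, 1627.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan, 280.0],
    "households": [126.0, 1138.0, 177.0, 219.0, 259.0],
})

# Number of missing entries in each attribute.
missing_counts = housing.isna().sum()
print(missing_counts)

# Fraction of rows with a missing value in each attribute.
# A small fraction suggests dropping rows or imputing; a large one suggests dropping the column.
missing_fraction = housing.isna().mean()
print(missing_fraction["total_bedrooms"])
```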

We noticed earlier that one of the attributes, total_bedrooms, contains missing values, so let's fix this. We have three options:

Drop the rows- We can drop the rows with missing values (that is, remove the blocks that have no information about total_bedrooms). We generally do this when the dataset is large and only a few rows contain missing values, so removing them has little effect on the dataset. It is also advisable to drop a row when most of its column values are missing.
Drop the columns- We can drop the whole attribute that contains missing values. We generally do this when most of the attribute's values are missing.
Set the missing value to some value- When neither of the above scenarios applies, we impute the missing values, i.e., replace them with some value. The value can be an arbitrary constant such as 0, or a statistic of the attribute such as its mean, median, or mode. This is the most widely used technique for handling missing values because it avoids the information loss that comes with dropping rows or columns.
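The three options above can be sketched with pandas as follows. This is an illustrative example under assumptions, not the project's actual code: the `housing` DataFrame is a small made-up stand-in for the real dataset, and the median is used for imputation only as one of the choices mentioned above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data; the values are invented for illustration.
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0, 1627.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan, 280.0],
    "households": [126.0, 1138.0, 177.0, 219.0, 259.0],
})

# Option 1: drop the rows where total_bedrooms is missing.
rows_dropped = housing.dropna(subset=["total_bedrooms"])

# Option 2: drop the whole total_bedrooms attribute.
column_dropped = housing.drop(columns=["total_bedrooms"])

# Option 3: impute the missing values, here with the attribute's median.
median = housing["total_bedrooms"].median()  # NaN entries are skipped by default
imputed = housing.copy()
imputed["total_bedrooms"] = imputed["total_bedrooms"].fillna(median)

print(rows_dropped.shape, column_dropped.shape, imputed.shape)
```

None of these calls modify `housing` in place; each returns a new DataFrame, so the options can be compared side by side. If you impute, it is usually a good idea to keep the computed value (the median here) so that the same value can be reused to fill missing entries in the test set and in new data.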
