End-to-End ML Project - Fill in the Missing Data

When you explored the dataset, you may have noticed that some of the features have missing values.
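A quick way to see which features have gaps is to count the missing cells in each column. The sketch below uses a small hypothetical DataFrame (`toy`) in place of the real housing data. In the California housing data, `total_bedrooms` is the column with missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the housing DataFrame
toy = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND"],
})

# isna() marks each missing cell as True; sum() counts them per column
missing_counts = toy.isna().sum()
print(missing_counts)
```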

INSTRUCTIONS

We will go back to the clean training set produced by StratifiedShuffleSplit (strat_train_set) and drop median_house_value from it, since that is the label we want to predict. By default, drop() returns a new DataFrame and leaves strat_train_set unchanged.
```
housing = strat_train_set.<<your code goes here>>("median_house_value", axis=1)
```
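The effect of `drop(..., axis=1)` can be seen on a tiny hypothetical stand-in for `strat_train_set` (the name `train` is illustrative only). `axis=1` tells pandas to drop a column; `axis=0` would drop rows by index label.

```python
import pandas as pd

# Hypothetical stand-in for strat_train_set
train = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6],
    "median_house_value": [452600.0, 358500.0, 352100.0],
})

# axis=1 drops a column; a new DataFrame is returned
features = train.drop("median_house_value", axis=1)

print(list(features.columns))  # ['median_income']
print(list(train.columns))     # original still has both columns
```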

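Now store the labels in the housing_labels variable. Calling copy() gives housing_labels its own data, so later changes to it do not affect strat_train_set.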

```
<<your code goes here>> = strat_train_set["median_house_value"].copy()
```
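To illustrate why `.copy()` matters, this sketch (with a hypothetical `train` DataFrame) modifies the copied Series and checks that the source DataFrame is untouched.

```python
import pandas as pd

# Hypothetical stand-in for strat_train_set
train = pd.DataFrame({"median_house_value": [452600.0, 358500.0]})

labels = train["median_house_value"].copy()
labels.iloc[0] = 0.0  # change only the copy

print(labels.iloc[0])                       # 0.0
print(train["median_house_value"].iloc[0])  # 452600.0, original untouched
```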

Next, we will fill in (impute) the missing values with scikit-learn's SimpleImputer class. First, import SimpleImputer from sklearn.impute:
```
from sklearn.impute import <<your code goes here>>
```
For each missing value, we will use the median of that feature rather than the mean. The median is the better measure of central tendency here because it is robust to outliers: a few extreme values can pull the mean far away from typical values, but they barely move the median. To do this, set the strategy parameter of SimpleImputer to "median":
```
imputer = SimpleImputer(<<your code goes here>>="median")
```
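The robustness of the median is easy to check numerically. The sketch below uses hypothetical values with a single extreme outlier.

```python
import numpy as np

# Hypothetical feature values with one extreme outlier (1000.0)
values = np.array([2.0, 3.0, 3.0, 4.0, 1000.0])

print(np.mean(values))    # 202.4, pulled far up by the single outlier
print(np.median(values))  # 3.0, still a typical value
```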
A median can only be computed for numerical attributes, so create a copy of the data without the categorical column ocean_proximity:
```
housing_num = housing.drop("ocean_proximity", axis=1)
```
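An alternative, sketched here under the assumption that every non-text column should be imputed, is to keep columns by dtype instead of naming the text column to drop. `housing_demo` is a hypothetical stand-in for `housing`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing DataFrame
housing_demo = pd.DataFrame({
    "longitude": [-122.23, -122.22],
    "total_bedrooms": [129.0, np.nan],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY"],
})

# Keep only numeric columns; the text column is excluded automatically
housing_num_demo = housing_demo.select_dtypes(include=[np.number])
print(list(housing_num_demo.columns))  # ['longitude', 'total_bedrooms']
```

This picks up any new numeric columns without code changes, but it also silently drops any other non-numeric column, so the explicit `drop` is clearer when only one column must go.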

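Now fit the imputer on housing_num. fit() computes the median of each attribute and stores the results in the imputer's statistics_ attribute.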

```
imputer.<<your code goes here>>(housing_num)
```
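To check what fitting actually learns, this sketch fits a median imputer on a small hypothetical numeric DataFrame (`num`) and compares `statistics_` with the per-column medians that pandas computes (pandas skips NaN by default).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for housing_num
num = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0],
    "total_bedrooms": [129.0, np.nan, 190.0],
})

imp = SimpleImputer(strategy="median")
imp.fit(num)

# statistics_ holds one learned median per column:
# 1467.0 for total_rooms, 159.5 for total_bedrooms (NaN ignored)
print(imp.statistics_)
print(num.median().values)  # same values from pandas
```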

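Now use the trained imputer to transform the training set, replacing each missing value with the learned median. transform() returns a plain NumPy array, so wrap the result in a DataFrame to restore the column names and row index: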

```
import pandas as pd

X = imputer.<<your code goes here>>(housing_num)
# Wrap the NumPy array back into a DataFrame with the original labels
housing_tr = pd.DataFrame(X, columns=housing_num.columns,
                          index=housing.index)
```
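The fit-then-transform flow can be sketched end to end on hypothetical data. The sketch confirms that the raw output is a NumPy array and that, once wrapped, no missing values remain and the original row labels survive. The index values below are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical numeric data with non-default row labels
num = pd.DataFrame(
    {"total_rooms": [880.0, 7099.0, 1467.0],
     "total_bedrooms": [129.0, np.nan, 190.0]},
    index=[17606, 18632, 14650],
)

imp = SimpleImputer(strategy="median").fit(num)
X_demo = imp.transform(num)  # NumPy array: column names and index are lost

# Restore labels so rows still line up with the labels Series
tr = pd.DataFrame(X_demo, columns=num.columns, index=num.index)
print(tr)
```

Keeping the original index matters because the labels Series is indexed the same way, so features and labels stay aligned row by row. In scikit-learn 1.2 and later, `set_output(transform="pandas")` on the imputer is another way to get a DataFrame back directly.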
