When you were exploring the dataset, you must have noticed that some of the features had missing data.
We will revert to the clean training set that we obtained with StratifiedShuffleSplit and drop median_house_value from it, since that is the label we want to predict.
housing = strat_train_set.<<your code goes here>>("median_house_value", axis=1)
Now we will store the labels in the housing_labels variable. Calling copy() keeps the labels independent of strat_train_set.
<<your code goes here>> = strat_train_set["median_house_value"].copy()
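The split between features and labels can be illustrated on a toy table. This is only a sketch: the names toy_train, toy_features, toy_labels and target_value are hypothetical and are not part of the exercise. It shows that dropping a column returns a new DataFrame and that a copied label Series can be changed without touching the original.

```python
import pandas as pd

# Hypothetical toy training set (not the housing data)
toy_train = pd.DataFrame({"rooms": [4, 5, 6], "target_value": [100, 150, 200]})

# Dropping a column returns a new DataFrame; toy_train keeps all its columns
toy_features = toy_train.drop("target_value", axis=1)

# copy() makes the labels independent of the original DataFrame
toy_labels = toy_train["target_value"].copy()
toy_labels.iloc[0] = -1  # does not affect toy_train

print(toy_features.columns.tolist())
print(toy_train["target_value"].tolist())
```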
Now we will impute the missing values using the SimpleImputer class. First, import the SimpleImputer class from sklearn.impute.
from sklearn.impute import <<your code goes here>>
Now, we will replace the missing values of each feature with the median of that feature. We use the median rather than the mean because the median is robust to outliers: a few extreme values can pull the mean far from the typical value, while the median barely moves. We will set the strategy parameter to "median" in the SimpleImputer class.
imputer = SimpleImputer(<<your code goes here>>="median")
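To see why the median is the safer fill value, here is a small sketch comparing the two statistics on made-up numbers with a single extreme value. The array below is hypothetical and has nothing to do with the housing data.

```python
import numpy as np

# Hypothetical feature values with one extreme outlier (illustrative only)
values = np.array([2.0, 3.0, 3.0, 4.0, 100.0])

mean_value = values.mean()        # pulled toward the outlier
median_value = np.median(values)  # stays with the bulk of the data

print("mean:", mean_value)
print("median:", median_value)
```

Filling a gap with the mean here would insert a value larger than four of the five observations, whereas the median stays close to the typical value.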
Now let's drop the categorical column ocean_proximity, because the median can only be calculated on numerical attributes.
housing_num = housing.drop("ocean_proximity", axis=1)
We will call fit on the housing_num dataset so that the imputer computes the median of each column.
imputer.<<your code goes here>>(housing_num)
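Now we will transform the training set, replacing the missing values with the learned medians. By default the result is a NumPy array, so we wrap it back into a DataFrame.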
X = imputer.<<your code goes here>>(housing_num)
housing_tr = pd.DataFrame(X, columns=housing_num.columns,
                      index=housing.index)
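The whole sequence (drop the categorical column, fit, transform, rebuild the DataFrame) can be tried end to end on a toy table. This is a minimal sketch under assumptions: the DataFrame toy and its columns rooms, income and proximity are hypothetical stand-ins, not the housing data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with gaps in two numerical columns
toy = pd.DataFrame(
    {
        "rooms": [4.0, np.nan, 6.0, 8.0],
        "income": [2.0, 3.0, np.nan, 10.0],
        "proximity": ["INLAND", "NEAR BAY", "INLAND", "ISLAND"],
    },
    index=[10, 11, 12, 13],
)

# Keep only numerical columns, since the median is undefined for text
toy_num = toy.drop("proximity", axis=1)

toy_imputer = SimpleImputer(strategy="median")
toy_imputer.fit(toy_num)                 # learns one median per column
X_toy = toy_imputer.transform(toy_num)   # NumPy array with gaps filled

# Restore column names and the original row index
toy_tr = pd.DataFrame(X_toy, columns=toy_num.columns, index=toy_num.index)

print(toy_imputer.statistics_)
print(toy_tr)
```

The learned medians are stored in the statistics_ attribute, which makes it easy to check what value was used for each column.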
 