When you were exploring the dataset, you must have noticed that some of the features had missing data.
We will revert to the clean training set that we got after using StratifiedShuffleSplit and drop median_house_value, since it is the label that we will predict.
housing = strat_train_set.<<your code goes here>>("median_house_value", axis=1)
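As a small illustration of how removing a column with axis=1 behaves, here is a minimal sketch on a made-up DataFrame. The names toy and features and all the values are assumptions for illustration, not the housing data. It suggests that the call returns a new DataFrame and leaves the original untouched, which is why strat_train_set still contains the label afterwards.

```python
import pandas as pd

# Toy data for illustration only; not the real housing dataset.
toy = pd.DataFrame({
    "rooms": [3, 4, 5],
    "price": [100.0, 150.0, 200.0],
})

# axis=1 tells pandas to remove a column rather than a row.
features = toy.drop("price", axis=1)

print(list(features.columns))  # ['rooms']
print(list(toy.columns))       # ['rooms', 'price'] (the original is unchanged)
```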
Now we will store the labels in the housing_labels variable.
<<your code goes here>> = strat_train_set["median_house_value"].copy()
Now we will impute the missing values using the SimpleImputer class. First, import the SimpleImputer class from sklearn.impute.
from sklearn.impute import <<your code goes here>>
Now, for the missing values we will use the median value of that feature. We use the median rather than the mean because the median is less sensitive to outliers, so a few extreme values do not distort the imputed value. We will set the strategy parameter to "median" in the SimpleImputer class.
imputer = SimpleImputer(<<your code goes here>>="median")
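To see why the median can be a safer fill value than the mean, here is a minimal sketch on hypothetical numbers (the values array is an assumption for illustration, not taken from the housing data). A single extreme value pulls the mean far from the typical values, while the median barely moves.

```python
import numpy as np

# Hypothetical values with one extreme outlier (illustration only).
values = np.array([2.0, 3.0, 3.0, 4.0, 100.0])

mean_value = values.mean()        # pulled up by the outlier
median_value = np.median(values)  # stays near the typical values

print(mean_value)    # 22.4
print(median_value)  # 3.0
```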
Now let's drop the categorical column ocean_proximity, because the median can only be calculated on numerical attributes.
housing_num = housing.drop("ocean_proximity", axis=1)
We will use fit on the housing_num dataset so that the imputer computes the median of each column.
imputer.<<your code goes here>>(housing_num)
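The following sketch shows what fitting a median imputer learns, using a toy DataFrame. The names toy_num and toy_imputer and the values are assumptions for illustration. After fitting, the per-column medians should be available in the statistics_ attribute, with missing entries ignored when they are computed.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy numerical data with one missing value (illustration only).
toy_num = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0],
})

toy_imputer = SimpleImputer(strategy="median")
toy_imputer.fit(toy_num)

# One learned median per column; the NaN in "a" is skipped.
print(toy_imputer.statistics_.tolist())  # [2.0, 25.0]
```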
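Now we will use transform on the training set to replace the missing values with the learned medians. By default the result is a NumPy array, so we wrap it back into a pandas DataFrame with the original column names and index.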
X = imputer.<<your code goes here>>(housing_num)
housing_tr = pd.DataFrame(X, columns=housing_num.columns,
index=housing.index)
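As a rough end-to-end sketch of the same idea on toy data (toy_num, toy_imputer, filled and toy_tr are hypothetical names, and the index labels are made up), the example below fills a missing value with the column median and rebuilds a DataFrame so that the column names and row index are preserved. It assumes scikit-learn's default output setting, in which the transformed result is a plain NumPy array.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy numerical data with a custom index (illustration only).
toy_num = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan, 4.0], "b": [10.0, 20.0, 30.0, 40.0]},
    index=[101, 102, 103, 104],
)

toy_imputer = SimpleImputer(strategy="median")
toy_imputer.fit(toy_num)
filled = toy_imputer.transform(toy_num)  # NumPy array under default settings

# Restore the column names and the original row index.
toy_tr = pd.DataFrame(filled, columns=toy_num.columns, index=toy_num.index)

print(type(filled).__name__)           # ndarray
print(toy_tr.loc[103, "a"])            # 2.0 (median of 1, 2 and 4)
print(int(toy_tr.isna().sum().sum()))  # 0
```

Passing the index explicitly matters because the array returned by the imputer carries no row labels; without it the new DataFrame would get a fresh 0-based index.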