As you have seen, there are many data transformation steps that need to be executed in the right order. Fortunately, Scikit-Learn provides the Pipeline class to help with such sequences of transformations.
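To make the idea of a sequence of transformations concrete, here is a minimal sketch on a tiny made-up array (the data and the name toy_pipeline are illustrative assumptions, not part of the exercise). It suggests that a Pipeline gives the same result as calling each transformer by hand, in order.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration) with one missing value.
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0]])

# Each step is a (name, estimator) pair; steps run in the order listed.
toy_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])
X_piped = toy_pipeline.fit_transform(X)

# The same two steps applied by hand, one after the other.
X_imputed = SimpleImputer(strategy="median").fit_transform(X)
X_manual = StandardScaler().fit_transform(X_imputed)

print(np.allclose(X_piped, X_manual))
```

The pipeline mainly saves bookkeeping: the order of the steps is written down once and reused every time the data is transformed.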
Copy the code below as is. Here we use a pipeline to process the data in three steps: first imputing missing values with SimpleImputer, then using the custom transformer created earlier to add the combined attribute columns, and finally using the StandardScaler class to scale the numerical training data.
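import pandas as pd

# Column positions of the attributes that the custom transformer combines
col_names = "total_rooms", "total_bedrooms", "population", "households"
rooms_ix, bedrooms_ix, population_ix, households_ix = [
    housing.columns.get_loc(c) for c in col_names]

# housing_extra_attribs is assumed to be the array produced earlier by the
# custom transformer; wrapping it in a DataFrame restores the column names
housing_extra_attribs = pd.DataFrame(
    housing_extra_attribs,
    columns=list(housing.columns)+["rooms_per_household", "population_per_household"],
    index=housing.index)
housing_extra_attribs.head()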
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Numeric pipeline: impute medians, add combined attributes, then standardize
num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])
housing_num_tr = num_pipeline.fit_transform(housing_num)
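from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Send numeric columns through num_pipeline and one-hot encode the categorical column
num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])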
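The following is a small sketch of how a ColumnTransformer routes different columns to different transformers. The DataFrame toy, its "rooms" column, and the name toy_transformer are made up for illustration and are not the housing data used in this exercise.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data: one numeric column and one categorical column.
toy = pd.DataFrame({
    "rooms": [3.0, 5.0, 7.0, 9.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "ISLAND"],
})

# Each entry is (name, transformer, list of columns it applies to).
toy_transformer = ColumnTransformer([
    ("num", StandardScaler(), ["rooms"]),
    ("cat", OneHotEncoder(), ["ocean_proximity"]),
])
toy_prepared = toy_transformer.fit_transform(toy)

# One scaled numeric column plus one column per distinct category.
print(toy_prepared.shape)
```

The outputs of the individual transformers are concatenated side by side, in the order the transformers are listed. Depending on how many zeros the combined result contains, Scikit-Learn may return it as a sparse matrix rather than a dense array.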
Finally, we will fit_transform the entire training data:
housing_prepared = full_pipeline.<<your code goes here>>(housing)
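As a hedged illustration of what fitting and transforming mean for a single transformer, the sketch below uses made-up numbers (train and new are hypothetical arrays, not the housing data). Fitting learns statistics from the training data, and transform reuses those statistics on other data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [2.0], [3.0]])  # made-up training values
new = np.array([[2.0], [4.0]])           # made-up unseen values

scaler = StandardScaler()
# fit_transform learns the mean and standard deviation from train, then scales train.
train_scaled = scaler.fit_transform(train)
# transform reuses the statistics learned from train; nothing is re-learned here.
new_scaled = scaler.transform(new)

print(scaler.mean_)
print(new_scaled.ravel())
```

A pipeline behaves the same way as a whole: calling fit_transform on it fits every step on the data passed in and returns the transformed result.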