End-to-End ML Project - California Housing


End-to-End ML Project - Creating transformation pipelines

As you have seen, there are many data transformation steps that need to be executed in the right order. Fortunately, Scikit-Learn provides the Pipeline class to help with such sequences of transformations.
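
For context, the minimal sketch below (using a tiny made-up array, not the housing data; the names toy and toy_pipeline are illustrative only) shows how a Pipeline chains transformations: when you call fit_transform on the pipeline, each step's fit_transform runs in order and the output of one step is fed to the next.

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    
    # Toy data with one missing value (illustrative only)
    toy = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, 6.0]])
    
    toy_pipeline = Pipeline([
            ('imputer', SimpleImputer(strategy="median")),  # step 1: fill the NaN with the column median
            ('std_scaler', StandardScaler()),               # step 2: standardize each column
        ])
    
    toy_prepared = toy_pipeline.fit_transform(toy)  # runs both steps in order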

INSTRUCTIONS
  • Copy and paste the code below as is. Here we use a pipeline to process the numerical data: first impute missing values using SimpleImputer, then apply the custom transformer created earlier to combine columns, and finally use the StandardScaler class to scale the entire training data.

    # Column indices used by the custom CombinedAttributesAdder transformer
    col_names = "total_rooms", "total_bedrooms", "population", "households"
    rooms_ix, bedrooms_ix, population_ix, households_ix = [
        housing.columns.get_loc(c) for c in col_names]
    
    # Apply the custom transformer created earlier to add the combined attributes
    # (add_bedrooms_per_room=False keeps only the two extra columns named below,
    # assuming the transformer exposes this parameter as in the earlier step),
    # then wrap the resulting NumPy array back into a DataFrame for inspection
    attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
    housing_extra_attribs = attr_adder.transform(housing.values)
    
    housing_extra_attribs = pd.DataFrame(
        housing_extra_attribs,
        columns=list(housing.columns)+["rooms_per_household", "population_per_household"],
        index=housing.index)
    housing_extra_attribs.head()
    
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    
    # Pipeline for the numerical attributes: impute missing values with the median,
    # add the combined attributes, then standardize all the features
    num_pipeline = Pipeline([
            ('imputer', SimpleImputer(strategy="median")),
            ('attribs_adder', CombinedAttributesAdder()),
            ('std_scaler', StandardScaler()),
        ])
    
    housing_num_tr = num_pipeline.fit_transform(housing_num)
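    
    # Optional sanity check (not part of the assignment): the transformed output is a
    # plain NumPy array with the same number of rows as housing_num and extra columns
    # for the attributes added by CombinedAttributesAdder
    housing_num_tr.shape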
    
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder
    
    num_attribs = list(housing_num)      # names of the numerical columns
    cat_attribs = ["ocean_proximity"]    # the single categorical column
    
    # ColumnTransformer applies the numerical pipeline to the numerical columns and
    # one-hot encodes the categorical column, then concatenates the two results
    full_pipeline = ColumnTransformer([
            ("num", num_pipeline, num_attribs),
            ("cat", OneHotEncoder(), cat_attribs),
        ])
    
  • Finally, we will call fit_transform on the entire training data. (An optional verification sketch follows the code below.)

    housing_prepared = full_pipeline.<<your code goes here>>(housing)
    
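Once the call above has run, it is worth sanity-checking the prepared data. The sketch below is an optional check, not part of the graded answer; the exact column count depends on how many attributes your CombinedAttributesAdder adds and on the number of ocean_proximity categories in the training set. Note that ColumnTransformer returns a sparse matrix when the combined output is mostly zeros and a dense NumPy array otherwise.

    housing_prepared.shape   # (rows, numerical columns + added attributes + one-hot columns)
    
    # Categories learned by the OneHotEncoder inside the ColumnTransformer
    full_pipeline.named_transformers_["cat"].categories_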

No hints are available for this assessment


