As we have seen, several data transformation steps have to be performed in the right order. sklearn provides a Pipeline class that chains these steps and executes them sequentially. Its syntax is:
Pipeline(steps)
where steps is a list of (name, estimator) tuples in the order in which we want to perform the transformations. Here, name is a string label that we choose to identify the step, and estimator is an instance of the estimator class (not the class itself).
An estimator is any object that learns from data; it may be a classification, regression, or clustering algorithm or a transformer that extracts/filters useful features from raw data.
For example, we can chain SimpleImputer (with its parameter strategy set to 'mean') under the name imputer and StandardScaler under the name scaler, in that order:
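from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('imputer', SimpleImputer(strategy="mean")),
    ('scaler', StandardScaler())
])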
where pipe is the name of the pipeline instance.
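To make the chaining concrete, here is a minimal sketch that runs the same two-step pipeline on a tiny array. The array X and its values are made up purely for illustration and are not part of the original exercise.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (assumed for illustration): two numeric columns, one missing value.
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0]])

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# The imputer fills the NaN with its column mean, then the scaler standardizes.
X_tr = pipe.fit_transform(X)

# Each fitted step can be looked up by the name given in the tuple.
imputer = pipe.named_steps["imputer"]
print(imputer.statistics_)  # per-column means learned by the imputer
print(X_tr)
```

The names in the tuples are what named_steps uses as keys, which is one reason each step needs its own label.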
Remember, every estimator except the last must be a transformer (i.e. it must implement fit() and transform(), which together give it fit_transform()). The last estimator may or may not be a transformer. In the above example, the last estimator, StandardScaler, is a transformer. Also, the names of the estimators can be anything as long as they are unique and do not contain a double underscore, which sklearn reserves for addressing the parameters of a step. So you can't name an estimator std__scaler.
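The double underscore is reserved because sklearn uses the form step_name__parameter to reach a parameter of an individual step. The sketch below shows that convention with set_params() and then checks that a name such as std__scaler is rejected. It assumes the rejection surfaces as a ValueError either when the pipeline is constructed or when it is fitted, which may depend on the sklearn version.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# "<step name>__<parameter>" addresses one parameter of one step.
pipe.set_params(imputer__strategy="median")
strategy = pipe.named_steps["imputer"].strategy
print(strategy)

# A step name that itself contains "__" would be ambiguous, so it is rejected.
# Assumption: the error is a ValueError raised at construction or at fit time.
try:
    bad = Pipeline([("std__scaler", StandardScaler())])
    bad.fit([[1.0], [2.0], [3.0]])
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```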
Then, we can call the methods fit(), transform() or fit_transform() on the pipeline object using the syntax:
pipeline_name.method_name(dataset)
When we call the fit() method on our pipeline instance, it calls fit_transform() on every estimator except the last, passing the output of one as the input to the next in sequential order, and then calls fit() on the last estimator. When we call fit_transform() on the pipeline instead, the last estimator is also fitted and used to transform the data, so the fully transformed dataset is returned. So, if our last estimator is a transformer and we want the transformed data, it is advisable to use the fit_transform() method.
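The difference between the two calls can be checked directly. In this sketch (toy data and the helper make_pipe() are assumptions for illustration), calling fit() and then transform() yields the same array as a single fit_transform() call, since both fit every step and push the data through all of them.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (assumed for illustration).
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0]])


def make_pipe():
    """Hypothetical helper that builds a fresh, unfitted pipeline."""
    return Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ])


# Option 1: fit() only learns the parameters and returns the pipeline itself;
# a separate transform() call is needed to get the transformed data.
pipe_a = make_pipe()
fitted = pipe_a.fit(X)
out_a = pipe_a.transform(X)

# Option 2: fit_transform() fits every step and returns the transformed data.
out_b = make_pipe().fit_transform(X)

same = np.allclose(out_a, out_b)
print(same)
```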
Refer to the Pipeline documentation for further details about the class.
Import Pipeline from sklearn.pipeline.
Create a pipeline instance with the name num_pipeline, with its estimators in this order:
a) SimpleImputer transformer with its parameter strategy specified as 'median'. The name of the estimator should be imputer.
b) Our custom transformer CombinedAttributesAdder with no parameters. The name of the estimator should be attribs_adder.
c) StandardScaler with no parameters. The name of the estimator should be std_scaler.
Use the method fit_transform() on the pipeline num_pipeline. Specify the dataset as housing_num and store the result in a variable named housing_num_tr.
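One possible shape for these steps is sketched below. Because CombinedAttributesAdder and housing_num are defined elsewhere in the course, the versions here are hypothetical stand-ins: a simplified transformer that appends one ratio column, and a small made-up array. Only the pipeline construction and the fit_transform() call mirror the instructions above.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: appends column 0 divided by column 1."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        ratio = X[:, 0] / X[:, 1]
        return np.c_[X, ratio]


# Made-up stand-in for housing_num, with one missing value.
housing_num = np.array([
    [100.0, 20.0],
    [200.0, np.nan],
    [300.0, 60.0],
    [400.0, 50.0],
])

num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("attribs_adder", CombinedAttributesAdder()),
    ("std_scaler", StandardScaler()),
])

housing_num_tr = num_pipeline.fit_transform(housing_num)
print(housing_num_tr.shape)
```

With the real course objects, only the stand-in class and array would be dropped; the pipeline definition itself would stay the same.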