Up to now, we had to handle categorical and numerical attributes separately. It would be more convenient to have a single pipeline that handles both by applying the appropriate transformation to each attribute. However, the Pipeline class applies the same sequence of transformations to every column it receives. To solve this problem, we use the ColumnTransformer class of sklearn.
In ColumnTransformer, we specify which columns are numerical and which are categorical; it then applies each transformation to the appropriate columns and finally concatenates the outputs. Its syntax is:
ColumnTransformer(transformers)
where transformers is a list of tuples (name, transformer, columns) specifying the transformer objects to be applied to subsets of the data. Here, name is a string label we choose for that step, transformer is a transformer instance (an object with fit and transform methods), and columns is the list of attributes to which that transformation is applied.
Its syntax is very similar to that of the Pipeline class. For example, we can apply StandardScaler to the numerical attributes under the name scaler and OneHotEncoder to the categorical attribute under the name cat with:
pipe = ColumnTransformer([
    ("scaler", StandardScaler(), num_attributes),
    ("cat", OneHotEncoder(), ["ocean_proximity"]),
])
where num_attributes is a list containing the names of the numerical attributes and ocean_proximity is the categorical attribute.
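To see the concatenation in action, here is a minimal runnable sketch of the example above. The DataFrame and its values are invented for illustration and are not the tutorial's housing dataset; only the column name ocean_proximity and the variable name num_attributes come from the text.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data invented for illustration; not the tutorial's housing dataset.
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8],
    "housing_median_age": [41.0, 21.0, 52.0, 30.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "INLAND"],
})
num_attributes = ["median_income", "housing_median_age"]

pipe = ColumnTransformer([
    ("scaler", StandardScaler(), num_attributes),
    ("cat", OneHotEncoder(), ["ocean_proximity"]),
])

# Each transformer sees only its own columns; the results are
# stacked side by side in the order the transformers are listed.
prepared = pipe.fit_transform(housing)

# 2 scaled numerical columns + 2 one-hot columns (one per category)
print(prepared.shape)
```

The two numerical columns come first in the output because the scaler tuple is listed first, followed by one column per category of ocean_proximity.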
We can also pass a pipeline in place of a single transformer, using the same syntax. But remember, ColumnTransformer works only with transformers, so even the last step of that pipeline must be a transformer.
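As a rough sketch of passing a pipeline as one of the transformers: the steps inside num_pipeline below (a median imputer followed by a scaler) are an assumption made for illustration, since the contents of the pipeline are not shown here, and the data is again invented.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data invented for illustration, with one missing numerical value.
housing = pd.DataFrame({
    "median_income": [8.3, np.nan, 5.6, 3.8],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "INLAND"],
})

# Assumed steps: every step, including the last one, is a transformer.
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

pipe = ColumnTransformer([
    ("num", num_pipeline, ["median_income"]),       # a whole pipeline
    ("cat", OneHotEncoder(), ["ocean_proximity"]),  # a single transformer
])

prepared = pipe.fit_transform(housing)
print(prepared.shape)
```

A pipeline whose last step is a predictor (for example, a regressor) would not fit here, because ColumnTransformer needs each entry to produce transformed columns.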
We can then call the instance methods, such as fit(), transform(), and fit_transform(), just as we would on a Pipeline.
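A small sketch of those methods, assuming invented toy data and hypothetical variable names train and new: fit() learns the scaling statistics and the categories from the training rows, and transform() reuses them on rows the transformer has not seen.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training rows and one unseen row, both invented for illustration.
train = pd.DataFrame({
    "median_income": [2.0, 4.0, 6.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND"],
})
new = pd.DataFrame({
    "median_income": [4.0],
    "ocean_proximity": ["NEAR BAY"],
})

pipe = ColumnTransformer([
    ("scaler", StandardScaler(), ["median_income"]),
    ("cat", OneHotEncoder(), ["ocean_proximity"]),
])

pipe.fit(train)                     # learn mean/std and categories
new_prepared = pipe.transform(new)  # apply them without refitting

# Fitted sub-transformers are reachable by the names given in the tuples.
print(pipe.named_transformers_["cat"].categories_)
print(new_prepared)
```

Calling fit_transform(train) would be equivalent to fit(train) followed by transform(train).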
Refer to ColumnTransformer documentation for further details about the class.
Import the class ColumnTransformer from sklearn.compose.
Create an instance of ColumnTransformer named full_pipeline with the following transformers:
a) The pipeline num_pipeline, which we created before, with the name num and the columns list(housing_num). list(housing_num) contains the names of all the numerical attributes.
b) The class OneHotEncoder with the name cat and the categorical columns of our dataset as its columns.
Use the fit_transform() method on full_pipeline with train_data as the dataset. Store the output in a variable named housing_prepared.