So far we have only dealt with numerical attributes, but now let's look at text attributes. In this dataset, there is just one: the ocean_proximity attribute. Most Machine Learning algorithms work with numbers rather than text categories, so we will convert this attribute to numerical values using one-hot encoding.
One-hot encoding creates one binary attribute per category: one attribute equal to 1 when the category is <1H OCEAN (and 0 otherwise), another attribute equal to 1 when the category is INLAND (and 0 otherwise), and so on.
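To make the idea concrete, here is a minimal hand-rolled sketch of one-hot encoding using NumPy only. The sample values and the names categories, samples, and index_of are made up for illustration; they are not taken from the housing dataset, and this is not how scikit-learn implements the encoder.

```python
import numpy as np

# Hypothetical category list and sample values (illustration only).
categories = ["<1H OCEAN", "INLAND", "NEAR BAY"]
samples = ["INLAND", "<1H OCEAN", "NEAR BAY", "INLAND"]

# Map each category to the index of its column.
index_of = {cat: i for i, cat in enumerate(categories)}

# One row per sample, one column per category, all zeros to start.
one_hot = np.zeros((len(samples), len(categories)), dtype=int)

# Set exactly one 1 per row, in the column of that row's category.
for row, value in enumerate(samples):
    one_hot[row, index_of[value]] = 1

print(one_hot)
```

Each row contains a single 1, in the column that corresponds to that row's category.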
Note that the output of OneHotEncoder is, by default, a SciPy sparse matrix rather than a NumPy array. This is very useful when you have categorical attributes with thousands of categories. After one-hot encoding, we get a matrix with thousands of columns, and the matrix is full of 0s except for a single 1 per row. Using up tons of memory mostly to store zeros would be very wasteful, so a sparse matrix instead stores only the locations (and values) of the nonzero elements.
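The memory saving can be estimated with a small sketch. It assumes a made-up one-hot matrix with 1,000 rows and 5,000 categories, stored in SciPy's CSR format; the sizes and the random category assignment are illustrative assumptions, and the exact byte counts depend on the index dtype SciPy picks.

```python
import numpy as np
from scipy import sparse

# Hypothetical sizes: 1,000 samples and 5,000 distinct categories.
n_rows, n_categories = 1000, 5000

# Assign each row one random category (one nonzero entry per row).
rng = np.random.default_rng(0)
rows = np.arange(n_rows)
cols = rng.integers(0, n_categories, size=n_rows)
data = np.ones(n_rows)

# Build the one-hot matrix directly in sparse (CSR) form.
one_hot_sparse = sparse.csr_matrix(
    (data, (rows, cols)), shape=(n_rows, n_categories)
)

# Memory a dense float64 array of the same shape would need.
dense_bytes = n_rows * n_categories * data.itemsize

# Memory actually used by the three arrays behind the CSR matrix.
sparse_bytes = (
    one_hot_sparse.data.nbytes
    + one_hot_sparse.indices.nbytes
    + one_hot_sparse.indptr.nbytes
)

print("dense bytes: ", dense_bytes)
print("sparse bytes:", sparse_bytes)
```

The dense figure grows with the number of rows times the number of categories, while the sparse figure grows only with the number of rows, since each row holds a single nonzero value.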
Let's see how it is done.
First, we will store the categorical feature in a new variable called housing_cat:
<<your code goes here>> = housing[["ocean_proximity"]]
Let's see what it looks like using the head method:
housing_cat.<<your code goes here>>(10)
Now let's import OneHotEncoder from sklearn:
from sklearn.preprocessing import <<your code goes here>>
Now we will call fit_transform on our categorical data:
cat_encoder = OneHotEncoder()
housing_cat_1hot = cat_encoder.<<your code goes here>>(housing_cat)
housing_cat_1hot
Finally, we will convert it to a dense NumPy array using the toarray method:
housing_cat_1hot.<<your code goes here>>()
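For reference, the same sequence of calls can be tried on a tiny made-up input. This is only a sketch: toy_cat is a hypothetical list of single-column rows standing in for the housing_cat DataFrame, and the names toy_encoder, toy_1hot, and toy_dense are illustrative.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for housing_cat: one column, four rows.
toy_cat = [["INLAND"], ["NEAR BAY"], ["<1H OCEAN"], ["INLAND"]]

# Learn the categories and encode the data in one step.
toy_encoder = OneHotEncoder()
toy_1hot = toy_encoder.fit_transform(toy_cat)  # SciPy sparse matrix by default

# Convert the sparse result to a dense NumPy array for inspection.
toy_dense = toy_1hot.toarray()

# The learned categories give the column order of the encoded matrix.
print(toy_encoder.categories_)
print(toy_dense)
```

The categories_ attribute of the fitted encoder lists the categories in the order of the output columns, which helps when reading the dense array.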