
So far we have only dealt with numerical attributes, but now let's look at text attributes. In this dataset, there is just one: the `ocean_proximity` attribute. Most Machine Learning models cannot work with categorical values directly, so we will turn this attribute into numerical values using one-hot encoding.

One-hot encoding creates one binary attribute per category: one attribute equal to `1` when the category is `<1H OCEAN` (and `0` otherwise), another attribute equal to `1` when the category is `INLAND` (and `0` otherwise), and so on.
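To make the idea concrete, here is a minimal hand-rolled sketch in plain Python. It is only an illustration of the concept, not how scikit-learn implements it; the helper name `one_hot_encode` and the four sample values are hypothetical.

```python
def one_hot_encode(values):
    """Return (categories, rows): one binary column per distinct category."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)   # start with all zeros
        row[column_of[value]] = 1     # set the single matching column to 1
        rows.append(row)
    return categories, rows


# Toy input (hypothetical sample, not the real housing data)
categories, encoded = one_hot_encode(
    ["<1H OCEAN", "INLAND", "INLAND", "<1H OCEAN"]
)
print(categories)
for row in encoded:
    print(row)
```

Each row contains exactly one `1`, in the column that corresponds to its category.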

Notice that the output of the encoder is a SciPy sparse matrix instead of a NumPy array. This is very useful when you have categorical attributes with thousands of categories. After one-hot encoding, we get a matrix with thousands of columns, and the matrix is full of 0s except for a single 1 per row. Using up tons of memory mostly to store zeros would be very wasteful, so a sparse matrix instead stores only the nonzero elements and their locations.
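The memory saving can be checked with a rough sketch like the one below. The sizes (1000 rows, 500 categories), the random category assignment, and the choice of the CSR format are assumptions made purely for illustration; the byte count for the sparse matrix adds up its three internal arrays and is only an approximation of its real footprint.

```python
import numpy as np
from scipy import sparse

n_rows, n_categories = 1000, 500          # illustrative sizes (assumed)
rng = np.random.default_rng(0)
category_ids = rng.integers(0, n_categories, size=n_rows)

# Dense one-hot matrix: every cell is stored, including all the zeros.
dense = np.zeros((n_rows, n_categories), dtype=np.float64)
dense[np.arange(n_rows), category_ids] = 1.0

# Sparse (CSR) version: only the nonzero entries and their positions are kept.
sparse_matrix = sparse.csr_matrix(dense)

dense_bytes = dense.nbytes
sparse_bytes = (
    sparse_matrix.data.nbytes
    + sparse_matrix.indices.nbytes
    + sparse_matrix.indptr.nbytes
)

print("stored nonzero entries:", sparse_matrix.nnz)
print("dense bytes: ", dense_bytes)
print("sparse bytes:", sparse_bytes)
```

With one nonzero entry per row, the sparse representation grows with the number of rows rather than with rows times categories.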

Let's see how it is done.

First, we will store the categorical feature in a new variable called `housing_cat`:

`<<your code goes here>> = housing[["ocean_proximity"]]`

Let's see what it looks like using the `head` method:

`housing_cat.<<your code goes here>>(10)`

Now let's import `OneHotEncoder` from `sklearn`:

`from sklearn.preprocessing import <<your code goes here>>`

Now we will `fit_transform` our categorical data:

`cat_encoder = OneHotEncoder()`

`housing_cat_1hot = cat_encoder.<<your code goes here>>(housing_cat)`

`housing_cat_1hot`

Finally, we will convert it to a dense NumPy array using the `toarray` method:

`housing_cat_1hot.<<your code goes here>>()`
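To see what these steps produce without the full dataset, the sketch below runs the same kind of workflow on a tiny stand-in DataFrame. The name `toy_cat` and its four rows are hypothetical; only the column name `ocean_proximity` and the methods named above come from the exercise.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the categorical column (hypothetical rows).
toy_cat = pd.DataFrame(
    {"ocean_proximity": ["INLAND", "<1H OCEAN", "NEAR BAY", "INLAND"]}
)

encoder = OneHotEncoder()
toy_1hot = encoder.fit_transform(toy_cat)   # returns a SciPy sparse matrix by default

print(sparse.issparse(toy_1hot))            # confirms the result is sparse
print(list(encoder.categories_[0]))         # learned categories, one column each

toy_dense = toy_1hot.toarray()              # convert to a dense NumPy array
print(toy_dense)
```

The `categories_` attribute of the fitted encoder lists the categories in the same order as the columns of the encoded matrix, which helps when reading the dense output.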
