
Some research shows that *median_income* is an important attribute for predicting *median_house_value*, so it is a good characteristic for creating strata. We want to ensure that the test set is representative of the various income categories in the whole dataset. Since *median_income* is a continuous numerical attribute, we first need to convert it into a categorical attribute.

We can use the `cut()` function from the `pandas` library to convert *median_income* to a categorical attribute. Its syntax is:

```
pd.cut(x, bins, labels=None)
```

where `x` is the input array to be binned or categorized.

`cut()` has 2 important parameters:

- **bins**: The criteria to bin by. If we provide **bins** as [1,4,7,10], the values of `x` from 1 to 4 fall in *category 1*, from 4 to 7 in *category 2*, and from 7 to 10 in *category 3*. By default each bin includes its right edge but not its left edge, so 4 falls in *category 1* and a value of exactly 1 falls in no bin.
- **labels**: The labels for the returned bins. For the bins above, if we provide **labels** as [1,2,3], all instances in category 1 are valued `1`, those in category 2 are valued `2`, and those in category 3 are valued `3`. If we instead provide **labels** as ['one', 'two', 'three'], the instances are valued `one`, `two`, and `three` respectively. We can name the values anything, but the number of labels must equal the number of resulting bins.
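The behaviour of **bins** and **labels** described above can be tried on a few values. This is a minimal sketch: the sample values in `x` are made up for illustration and are not taken from the housing dataset.

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate binning
x = pd.Series([2, 4, 5, 7, 9, 10])

# Four edges define three bins: (1, 4], (4, 7], (7, 10]
bins = [1, 4, 7, 10]

# Same bins, two different labelings (three labels for three bins)
numeric = pd.cut(x, bins=bins, labels=[1, 2, 3])
named = pd.cut(x, bins=bins, labels=["one", "two", "three"])

print(numeric.tolist())  # [1, 1, 2, 2, 3, 3]
print(named.tolist())    # ['one', 'one', 'two', 'two', 'three', 'three']
```

Note that 4 and 7 land in the lower of their two neighbouring bins, because each bin is closed on the right by default.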

Refer to the `pd.cut()` documentation for more details about the method.

Categorize the *median_income* attribute of our dataset into 5 categories and store the result in a variable named `income_cat`, such that:

- All values from `0 to 1.5` are valued at `1`.
- All values from `1.5 to 3` are valued at `2`.
- All values from `3 to 4.5` are valued at `3`.
- All values from `4.5 to 6` are valued at `4`.
- All values from `6 to 16` are valued at `5`.

Display the first five rows of `income_cat` using the `head()` method.

**Note-** We took the last bin edge as 16 because the maximum value of *median_income* is 15.000100, so an upper edge of 16 is sure to cover every instance. Any number larger than 15.000100 works, and the result will always be the same.
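The claim in the note can be checked on a small sample. This is only a sketch: the `median_income` values below are hypothetical stand-ins (apart from the maximum of 15.0001 quoted above), and the alternative upper edge of 100 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical median_income values; 15.0001 is the maximum quoted in the note
median_income = pd.Series([0.4999, 1.5, 2.7, 3.1, 5.2, 8.0, 15.0001])

labels = [1, 2, 3, 4, 5]

# Same lower edges, two different upper edges for the last bin
cat_16 = pd.cut(median_income, bins=[0, 1.5, 3, 4.5, 6, 16], labels=labels)
cat_100 = pd.cut(median_income, bins=[0, 1.5, 3, 4.5, 6, 100], labels=labels)

print(cat_16.tolist())                      # [1, 1, 2, 3, 4, 5, 5]
print(cat_16.tolist() == cat_100.tolist())  # True
```

Both calls assign the same categories, since every value above 6 falls in the last bin regardless of where that bin ends.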
