Some research shows that median_income is an important attribute for predicting median_house_value, so it is a good characteristic on which to create strata. We want to ensure that the test set is representative of the various income categories in the whole dataset. Since median_income is a continuous numerical attribute, we first need to convert it into a categorical attribute.
We can use the cut() function from the pandas library to convert median_income to a categorical attribute. Its basic syntax is:
pd.cut(x, bins)
where x is the input array to be binned or categorized, and bins defines the bin edges.
cut() has 2 important parameters:
bins: The bin edges to cut by. If we provide bins as [1,4,7,10], values of x greater than 1 and up to 4 fall in category 1, greater than 4 and up to 7 in category 2, and greater than 7 and up to 10 in category 3. By default each bin includes its right edge but not its left, so a value of exactly 1 falls outside every bin.
labels: The labels for the returned bins. For the bins above, if we provide labels as [1,2,3], all instances in category 1 will be valued 1, those in category 2 will be valued 2, and those in category 3 will be valued 3. If we instead provide labels as ['one', 'two', 'three'], the three categories will be valued one, two, and three. We can name the values anything, but the number of labels must equal the number of resulting bins, which is one less than the number of bin edges.
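As a minimal sketch of the two parameters described above, the snippet below bins a small series with the edges [1,4,7,10] and both label sets. The sample values in x are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
x = pd.Series([2, 4, 5, 7, 9, 10])

# Edges [1, 4, 7, 10] give three bins: (1, 4], (4, 7], (7, 10].
# With the default right=True, each bin includes its right edge only.
by_number = pd.cut(x, bins=[1, 4, 7, 10], labels=[1, 2, 3])
by_name = pd.cut(x, bins=[1, 4, 7, 10], labels=["one", "two", "three"])

print(by_number.tolist())  # [1, 1, 2, 2, 3, 3]
print(by_name.tolist())    # ['one', 'one', 'two', 'two', 'three', 'three']
```

Note that 4 and 7 land in the lower of their two neighbouring bins because the right edge is inclusive.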
Refer to pd.cut() documentation for more details about the method.
Categorize the median_income attribute of our dataset into 5 categories and store the result in a variable named income_cat, such that:
0 to 1.5 are valued at 1.
1.5 to 3 are valued at 2.
3 to 4.5 are valued at 3.
4.5 to 6 are valued at 4.
6 to 16 are valued at 5.
Display the first five rows of income_cat using the head() method.
Note: We took the last value as 16 because the max value of median_income is 15.000100, so a last bin edge of 16 is sure to cover all the instances. Any number larger than 15.000100 gives the same result.
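One way the binning described above could look is sketched here. The DataFrame named housing and its values are hypothetical stand-ins for the real dataset, and only the median_income column is assumed to exist.

```python
import pandas as pd

# Hypothetical stand-in for the housing dataset (values are illustrative).
housing = pd.DataFrame(
    {"median_income": [0.4999, 1.5, 2.8, 3.0001, 4.6, 6.0, 8.3, 15.0001]}
)

# Five bins with edges 0, 1.5, 3, 4.5, 6, 16, labelled 1 to 5.
income_cat = pd.cut(
    housing["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, 16.0],
    labels=[1, 2, 3, 4, 5],
)

# Any last edge above the maximum income yields the same categories.
income_cat_alt = pd.cut(
    housing["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, 20.0],
    labels=[1, 2, 3, 4, 5],
)

print(income_cat.head())
print(income_cat.tolist() == income_cat_alt.tolist())  # True
```

In this sample, the values 1.5 and 6.0 sit exactly on a bin edge and are assigned to the lower category (1 and 4 respectively), since each bin includes its right edge by default.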