In this step, we will split the dataset into train and test sets. We will use the StratifiedShuffleSplit class from the sklearn library, a cross-validator that provides train/test indices for splitting data into train and test sets.
Import StratifiedShuffleSplit from sklearn:
from sklearn.model_selection import <<your code goes here>>
Now let's divide the dataset using an 80-20 split. To do this, set test_size to 0.2:
split = StratifiedShuffleSplit(n_splits=1, test_size=<<your code goes here>>, random_state=42)
for train_index, test_index in split.split(housing, housing["income_cat"]):
    strat_train_set = housing.loc[train_index]
    strat_test_set = housing.loc[test_index]
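To see what stratification buys you, here is a minimal sketch on synthetic data. The `toy` DataFrame, its value range, and the bin edges are hypothetical and chosen only for illustration; they are not the real housing data. The sketch compares the share of each income category in the full data with its share in the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy data standing in for the housing DataFrame.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"median_income": rng.uniform(0.5, 15.0, size=1000)})

# Hypothetical strata: five income bins, stored as integer codes 0..4.
toy["income_cat"] = pd.cut(
    toy["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=False,
)

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(toy, toy["income_cat"]):
    # split() yields positional indices, so iloc is the safe accessor here.
    toy_train = toy.iloc[train_index]
    toy_test = toy.iloc[test_index]

# Compare category proportions in the full data and in the test set.
overall = toy["income_cat"].value_counts(normalize=True).sort_index()
in_test = toy_test["income_cat"].value_counts(normalize=True).sort_index()
max_gap = float((overall - in_test).abs().max())

print(len(toy_train), len(toy_test))  # 800 200
print(max_gap)  # largest difference in category share; should be small
```

Because split() returns positions rather than index labels, using loc as in the exercise gives the same rows only when the DataFrame has its default 0..n-1 index.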
Finally, we will drop the income_cat column from both the train and test sets, since it was needed only to stratify the split. For this we will use the drop method:
for set_ in (strat_train_set, strat_test_set):
    set_.<<your code goes here>>("income_cat", axis=1, inplace=True)
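As an alternative to modifying the sets in place, the column can also be removed by reassigning the result of drop, which returns a new DataFrame and leaves the original untouched. The sketch below uses two small hypothetical DataFrames, `train` and `test`, as stand-ins for strat_train_set and strat_test_set.

```python
import pandas as pd

# Hypothetical stand-ins for strat_train_set and strat_test_set.
train = pd.DataFrame({"median_income": [2.1, 5.3], "income_cat": [2, 4]})
test = pd.DataFrame({"median_income": [3.7], "income_cat": [3]})

# Without inplace=True, drop() returns a new DataFrame.
train_clean = train.drop(columns="income_cat")
test_clean = test.drop(columns="income_cat")

print(list(train_clean.columns))      # ['median_income']
print("income_cat" in train.columns)  # True: the original is unchanged
```

Which form to use is mostly a matter of style. The in-place loop from the exercise is shorter, while reassignment makes it explicit which objects change.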