In this step, we will split the dataset into train and test sets. We will use the StratifiedShuffleSplit class from sklearn's model_selection module. It is a cross-validator that provides train/test indices for splitting the data, while preserving the percentage of samples from each class in both sets.
Import StratifiedShuffleSplit from sklearn.model_selection:
from sklearn.model_selection import <<your code goes here>>
Now let's divide the dataset in an 80-20 split. For this, you need to set test_size to 0.2:
split = StratifiedShuffleSplit(n_splits=1, test_size=<<your code goes here>>, random_state=42)
for train_index, test_index in split.split(housing, housing["income_cat"]):
    strat_train_set = housing.loc[train_index]
    strat_test_set = housing.loc[test_index]
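To see what stratification buys us, here is a minimal sketch on a synthetic stand-in for the housing data. The housing_demo frame, its random values, and the demo_train/demo_test names are assumptions made for illustration only; they are not the exercise's dataset. The sketch compares the share of each income_cat value in the full data with its share in the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical stand-in for the housing DataFrame (synthetic values, not the real data).
rng = np.random.default_rng(0)
housing_demo = pd.DataFrame({
    "median_income": rng.uniform(0.5, 15.0, size=1000),
    "income_cat": rng.choice([1, 2, 3, 4, 5], size=1000,
                             p=[0.05, 0.30, 0.35, 0.20, 0.10]),
})

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(housing_demo, housing_demo["income_cat"]):
    # split() yields positional indices, so iloc is used here; loc gives the
    # same rows only when the DataFrame has the default 0..n-1 index.
    demo_train = housing_demo.iloc[train_index]
    demo_test = housing_demo.iloc[test_index]

# Share of each income category in the full data versus the test set.
overall = housing_demo["income_cat"].value_counts(normalize=True).sort_index()
in_test = demo_test["income_cat"].value_counts(normalize=True).sort_index()
comparison = pd.DataFrame({"overall": overall, "test": in_test})
print(comparison)
```

The two columns of the printed table should be nearly identical, which is the point of stratifying on income_cat rather than sampling purely at random.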
Finally, we will drop the income_cat column from both the train and test sets, since it was needed only to stratify the split and should not remain as a feature. For this we will use the drop method:
for set_ in (strat_train_set, strat_test_set):
    set_.<<your code goes here>>("income_cat", axis=1, inplace=True)
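As an alternative to modifying the sets in place, the column can also be removed by reassignment, which avoids the SettingWithCopyWarning that pandas may emit when an in-place operation is applied to a slice of another DataFrame. The sketch below uses two tiny hypothetical frames (demo_train and demo_test, with made-up values) rather than the exercise's actual sets.

```python
import pandas as pd

# Hypothetical miniature train/test sets (made-up values, for illustration only).
demo_train = pd.DataFrame({"median_income": [2.5, 4.1, 8.3], "income_cat": [2, 3, 5]})
demo_test = pd.DataFrame({"median_income": [3.0], "income_cat": [2]})

# drop() returns a new DataFrame without the column; the original is left untouched
# until the result is assigned back to the same name.
demo_train = demo_train.drop(columns="income_cat")
demo_test = demo_test.drop(columns="income_cat")

print(list(demo_train.columns))  # ['median_income']
```

Both forms remove the same column; drop(columns="income_cat") is equivalent to drop("income_cat", axis=1).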