End-to-End ML Project - California Housing

End-to-End ML Project - Split the dataset

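In this step, we will split the dataset into train and test sets. We will use the StratifiedShuffleSplit class from scikit-learn's sklearn.model_selection module. It is a cross-validator that yields train/test indices while preserving the percentage of samples in each class. Here the classes are the income categories stored in income_cat, so both sets end up with the same income distribution as the full dataset.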
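
To see what stratification does, here is a minimal sketch on a toy array rather than the housing data. The names X and y are illustrative assumptions: 100 samples with an imbalanced label (80 of class 0, 20 of class 1). With test_size=0.2, StratifiedShuffleSplit keeps that 80/20 class ratio in both subsets.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data (illustrative only): 100 samples, imbalanced binary label
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

# One stratified 80/20 split; random_state makes it reproducible
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(split.split(X, y))

print(len(train_idx), len(test_idx))   # 80 training rows, 20 test rows
print(np.bincount(y[test_idx]))        # 16 of class 0, 4 of class 1
print(np.bincount(y[train_idx]))       # 64 of class 0, 16 of class 1
```

A plain random split could easily put, say, 2 or 7 minority samples in the test set; stratifying guarantees the proportions match.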

  • Import StratifiedShuffleSplit from sklearn

    from sklearn.model_selection import StratifiedShuffleSplit
  • Now let's divide the dataset into an 80/20 split. To do this, set test_size to 0.2.

    split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
    # split() yields positional indices; .loc selects the same rows
    # as long as housing keeps its default RangeIndex (0, 1, 2, ...)
    for train_index, test_index in split.split(housing, housing["income_cat"]):
        strat_train_set = housing.loc[train_index]
        strat_test_set = housing.loc[test_index]
  • Finally, we will drop the income_cat column from both the train and test sets. It was only a helper attribute used to stratify the split; it is neither a feature we want to keep nor the value our model will predict. For this we will use the drop method.

    for set_ in (strat_train_set, strat_test_set):
        set_.drop("income_cat", axis=1, inplace=True)
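
The steps above can be sketched end to end on synthetic data. This is a hedged illustration, not the project's data: the random housing DataFrame below is a stand-in, and the income_cat bin edges passed to pd.cut are an assumption, since this page does not show how income_cat was created. The sketch also compares the income-category proportions of the stratified test set with those of the full dataset, which is the point of stratifying, before dropping the helper column.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in for the housing DataFrame (assumption: the real one
# is loaded earlier in the project and has a "median_income" column)
rng = np.random.default_rng(0)
housing = pd.DataFrame({
    "median_income": rng.gamma(shape=4.0, scale=1.0, size=1000),
    "median_house_value": rng.uniform(50_000, 500_000, size=1000),
})

# Hypothetical income categories; these bin edges are an assumption
housing["income_cat"] = pd.cut(
    housing["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=[1, 2, 3, 4, 5],
)

# Stratified 80/20 split on the income categories
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(housing, housing["income_cat"]):
    strat_train_set = housing.iloc[train_index].copy()
    strat_test_set = housing.iloc[test_index].copy()

# Category proportions: full dataset vs. stratified test set
full_props = housing["income_cat"].value_counts(normalize=True).sort_index()
test_props = strat_test_set["income_cat"].value_counts(normalize=True).sort_index()
print(pd.DataFrame({"overall": full_props, "stratified test": test_props}))

# income_cat was only needed for stratification, so remove it from both sets
for set_ in (strat_train_set, strat_test_set):
    set_.drop("income_cat", axis=1, inplace=True)
```

Using .iloc with .copy() is a choice made for this sketch: split() returns positional indices, and explicit copies let the in-place drop operate on independent DataFrames. The .loc version in the exercise selects the same rows as long as housing keeps its default RangeIndex.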