After analyzing the dataset, we divide it into a training set and a test set in the ratio 70:30 using train_test_split. The function samples rows at random, so the resulting train_set and test_set are not in order; we therefore sort each of them by dayCount.
Task: Correct the train_test_split call so that the dataset is split into training and test sets in the ratio 70:30.
Hint
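from sklearn.model_selection import train_test_split
# Hold out 30% of the rows for testing; random_state makes the shuffle reproducible
train_set, test_set = train_test_split(bikesData, test_size=0.3, random_state=42)
# Sort both sets by dayCount; reassigning avoids pandas' SettingWithCopyWarning from in-place sorts
train_set = train_set.sort_values('dayCount', axis=0)
test_set = test_set.sort_values('dayCount', axis=0)
print(len(train_set), "train +", len(test_set), "test")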
Notes:
Randomly splitting data that is ordered by dayCount is counter-intuitive: test rows end up interleaved in time with training rows, which can introduce the problem of snooping as discussed.
The split is performed with the train_test_split() function from sklearn.model_selection. This may not be the best way to divide the data. Can you think of a better way of sampling the dataset?
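One possible answer, offered only as a sketch: because the rows are ordered by dayCount, the split can be made chronologically, so that every test row comes after every training row. The helper name chronological_split and the small demo DataFrame below are hypothetical stand-ins, not part of the exercise; with the real data you would pass bikesData instead.

```python
import pandas as pd

def chronological_split(df, time_col="dayCount", test_fraction=0.3):
    """Return (train, test) where test holds the latest rows by time_col.

    Hypothetical helper for illustration; not part of sklearn or the exercise.
    """
    ordered = df.sort_values(time_col).reset_index(drop=True)
    # Index of the first test row; round() guards against float error in 1 - 0.3
    cutoff = int(round(len(ordered) * (1 - test_fraction)))
    return ordered.iloc[:cutoff].copy(), ordered.iloc[cutoff:].copy()

# Made-up stand-in for bikesData, deliberately out of order
demo = pd.DataFrame({
    "dayCount": [5, 1, 9, 3, 7, 2, 10, 4, 8, 6],
    "cnt": [50, 10, 90, 30, 70, 20, 100, 40, 80, 60],
})

train_demo, test_demo = chronological_split(demo)
print(len(train_demo), "train +", len(test_demo), "test")
```

With this kind of split the model never sees rows from days later than the ones it is tested on. Whether that is preferable to random sampling depends on how the model will be used, so treat it as one option to weigh rather than the expected answer.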