End to End Project - Bikes Assessment - Basic - Divide into training/test datasets

Now that we have cleaned the bikesData data set, let us split it into training and test data sets in a 70:30 ratio using scikit-learn's train_test_split() function.

Note that train_test_split() uses random sampling, which shuffles the rows, so the resulting train_set and test_set data sets have to be sorted by dayCount to restore their chronological order. Random sampling may not be the best way to split this data. What other sampling methods can you think of that might work better?
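One possible alternative, offered only as a sketch and not as the assessment's expected answer, is a chronological split: the earliest 70% of rows go to training and the latest 30% to test, so the test set is never earlier in time than the training set. The small bikesData frame and its cnt column below are hypothetical stand-ins for the real data set.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned bikesData: 10 days of hourly rows.
bikesData = pd.DataFrame({"cnt": range(240)})
bikesData["dayCount"] = pd.Series(range(bikesData.shape[0])) / 24

# Chronological split: first 70% of rows for training, last 30% for test.
# Integer arithmetic avoids floating-point rounding at the boundary.
split_point = len(bikesData) * 7 // 10
train_set = bikesData.iloc[:split_point]
test_set = bikesData.iloc[split_point:]

print(len(train_set), len(test_set))
```

Because the rows are never shuffled, both sets are already ordered by dayCount and no sorting step is needed.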

INSTRUCTIONS
  • Import the train_test_split() function from scikit-learn (it lives in the sklearn.model_selection module).

  • Add a new feature (column) dayCount to the bikesData data set using the code below:

    bikesData['dayCount'] = pd.Series(range(bikesData.shape[0]))/24
    
  • Split the bikesData data set into training and test data sets (train_set and test_set) in a 70:30 ratio using scikit-learn's train_test_split() function.

  • Sort the train_set and test_set values by dayCount using the code below:

    train_set.sort_values('dayCount', axis=0, inplace=True)
    test_set.sort_values('dayCount', axis=0, inplace=True)

  • Now print the 'number of instances' for train_set and test_set data sets.
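
The steps above can be sketched end to end as follows. This is a minimal illustration and not the official solution: the small bikesData frame and its cnt column are hypothetical stand-ins for the cleaned data set, and random_state=42 is an assumption added only to make the shuffle reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the cleaned bikesData: 10 days of hourly rows.
bikesData = pd.DataFrame({"cnt": range(240)})

# Fractional day index: row number divided by 24 hourly rows per day.
bikesData["dayCount"] = pd.Series(range(bikesData.shape[0])) / 24

# 70:30 random split; random_state is an assumption, not part of the task.
train_set, test_set = train_test_split(bikesData, test_size=0.3, random_state=42)

# Restore chronological order after the random shuffle.
train_set = train_set.sort_values("dayCount", axis=0)
test_set = test_set.sort_values("dayCount", axis=0)

# Number of instances in each set.
print(len(train_set), "train +", len(test_set), "test")
```

The sketch reassigns the sorted frames instead of passing inplace=True. Both approaches give the same ordering; reassignment is used here only because in-place sorting of a frame returned by train_test_split() can trigger a pandas SettingWithCopyWarning in some versions.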



