Now that we have cleaned the bikesData data set, let us split it into training and test data sets in a 70:30 ratio using scikit-learn's train_test_split() function.
Because train_test_split() uses random sampling, the rows of the resulting train_set and test_set data sets come back shuffled, so both have to be sorted by dayCount afterwards.
Random sampling may not be the best way to split this data. What other sampling methods can you think of?
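One possible answer, offered only as a sketch, is a chronological split: the earliest rows go to the training set and the latest rows to the test set, so the model is never evaluated on hours that precede its training data. The helper chronological_split and the toy data frame below are hypothetical and are not part of the exercise; the sketch assumes the data has a dayCount column that increases with time.

```python
import numpy as np
import pandas as pd

# Toy stand-in for bikesData: 10 days of hourly rows (placeholder values, not real data).
n_rows = 10 * 24
toy = pd.DataFrame({
    "dayCount": np.arange(n_rows) / 24,
    "cnt": np.arange(n_rows) % 7,
})

def chronological_split(df, test_fraction=0.3, time_col="dayCount"):
    """Hypothetical helper: earliest rows become training data, latest rows test data."""
    ordered = df.sort_values(time_col)
    cutoff = int(round(len(ordered) * (1 - test_fraction)))
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]

chrono_train, chrono_test = chronological_split(toy)
print(len(chrono_train), len(chrono_test))
```

Whether this is preferable to random sampling depends on how the model will be used. Stratified sampling is another option worth considering.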
We will also define a utility function named display_scores. This function prints basic statistics (the scores themselves, their mean, and their standard deviation) for the scores observed from cross-validation of models. Please copy this function into your code; we will use it often in this project.
Set NumPy's random seed to 42 using the code below so that the results of the exercise are repeatable.
np.random.seed(42)
Import the train_test_split function from scikit-learn's model_selection module.
Add a new feature (column) named dayCount to the bikesData data set using the code below:
bikesData['dayCount'] = pd.Series(range(bikesData.shape[0]))/24
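To see what this expression produces, here is a small sketch on a toy data frame. It assumes, as the division by 24 suggests, that each row is one hourly record and that the rows are in chronological order; the toy frame and its hr column are placeholders for illustration.

```python
import pandas as pd

# Toy stand-in for bikesData with 48 hourly rows (two days); values are placeholders.
toy = pd.DataFrame({"hr": list(range(24)) * 2})

# Same expression as in the exercise: the row position divided by 24.
toy["dayCount"] = pd.Series(range(toy.shape[0])) / 24

# Show the value at the first row, at midday of day one, at the start of day two,
# and at the last row.
print(toy["dayCount"].iloc[[0, 12, 24, 47]].tolist())
```

Under these assumptions, dayCount grows by one for every 24 rows. Note that pandas aligns the new Series on the index, so the expression relies on the data frame having the default 0..n-1 index; if rows were dropped during cleaning without resetting the index, some values would come out as NaN.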
Split the bikesData data set into a training set, train_set, and a test set, test_set, in a 70:30 ratio using scikit-learn's train_test_split() function.
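A minimal sketch of such a split, run on a toy data frame instead of the real bikesData, might look like this. The test_size=0.3 argument gives the 70:30 ratio; passing random_state=42 is an assumption made here for repeatability and is an alternative to relying on the global NumPy seed.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for bikesData: 100 hourly rows with placeholder values.
toy = pd.DataFrame({
    "dayCount": np.arange(100) / 24,
    "cnt": np.arange(100) % 5,
})

# 30% of the rows go to the test set, the remaining 70% to the training set.
train_set, test_set = train_test_split(toy, test_size=0.3, random_state=42)

# A quick sanity check of the resulting sizes.
print(len(train_set), len(test_set))
```

On the real data, toy would simply be replaced by bikesData.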
Sort the train_set and test_set values by dayCount using the code below:
train_set.sort_values('dayCount', axis=0, inplace=True)
test_set.sort_values('dayCount', axis=0, inplace=True)
Now print the number of instances in the train_set and test_set data sets.
Finally, create the display_scores function as shown below:
def display_scores(scores):
    """Print the cross-validation scores, their mean, and their standard deviation."""
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())
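As a quick usage sketch, the function can be tried on a hand-written array before any model exists. The values in example_scores below are made up purely for illustration; in the project, the scores would come from cross-validation.

```python
import numpy as np

def display_scores(scores):
    """Print the cross-validation scores, their mean, and their standard deviation."""
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

# Made-up values for illustration only; these are not results from any model.
example_scores = np.array([40.0, 42.0, 44.0])
display_scores(example_scores)
```

The function calls .mean() and .std() on its argument, so it expects a NumPy array (which is what scikit-learn's cross_val_score returns); a plain Python list would raise an AttributeError. NumPy's std() computes the population standard deviation by default.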