
Project - Bike Rental Forecasting


End to End Project - Bikes Assessment - Train models

Training and analyzing models

Models to be trained and analyzed:

  1. DecisionTreeRegressor

  2. LinearRegression

  3. RandomForestRegressor
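The three models can be compared fairly by running the same 10-fold cross-validation over each one. The sketch below does this on a small synthetic dataset so it runs on its own. The data is an assumption and is not the bike data. In the project you would pass the feature columns and 'cntDeTrended' from train_set instead.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical synthetic stand-in for the bike data: 3 features, nonlinear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 10 * np.sin(3 * X[:, 0]) + 5 * X[:, 1] + rng.normal(0, 0.5, size=300)

models = {
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=42),
}

results = {}
for name, model in models.items():
    # Scorers follow "greater is better", so the error comes back negated; flip it
    mae = -cross_val_score(model, X, y, cv=10, scoring="neg_mean_absolute_error")
    results[name] = mae.mean()
    print(f"{name}: mean MAE over 10 folds = {results[name]:.3f}")
```

Because the target here is nonlinear in the first feature, the tree-based models should report a lower MAE than the linear model. On the real data, the ranking has to be read off the actual scores.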

Metrics calculated with cross-validation: mean absolute error (MAE, from the neg_mean_absolute_error scorer) and root mean squared error (RMSE, from the neg_mean_squared_error scorer).
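Scikit-learn scorers follow a "greater is better" convention, so error metrics come back negated. neg_mean_absolute_error returns -MAE for each fold, and neg_mean_squared_error returns -MSE. Flipping the sign gives MAE, and taking the square root of the flipped MSE gives RMSE. The minimal sketch below uses synthetic data. Its display_scores is a hypothetical stand-in for the helper defined earlier in the project, and the real helper may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def display_scores(scores):
    # Hypothetical helper: print per-fold scores with their mean and spread
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

# Synthetic regression data (assumption, not the bike data)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=200)

model = LinearRegression()
neg_mae = cross_val_score(model, X, y, cv=10, scoring="neg_mean_absolute_error")
neg_mse = cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")

mae = -neg_mae            # flip the sign to get a positive MAE per fold
rmse = np.sqrt(-neg_mse)  # flip the sign, then take the square root to get RMSE per fold

display_scores(mae)
display_scores(rmse)
```

Within each fold, RMSE is always at least as large as MAE, because squaring weights large errors more heavily.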

Task 1: Complete the statement to define forest_reg as a RandomForestRegressor with random_state=42.

Task 2: Store the values predicted by the regressor using cross_val_predict. As identified in the earlier analysis, use 'xformWorkHr', 'temp', and 'dayCount' as the training features, with 10 folds.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score, cross_val_predict

    # train_set and display_scores() are assumed to be defined in earlier steps of this project
    features = ['xformWorkHr', 'temp', 'dayCount']
    X, y = train_set[features], train_set['cntDeTrended']

    # Task 1: random forest with a fixed seed so results are reproducible
    forest_reg = RandomForestRegressor(max_depth=32, min_samples_split=128, min_samples_leaf=10, random_state=42)

    # Task 2: 10-fold CV scores; the scorers return negated errors, so flip the sign (MAE, then RMSE)
    display_scores(-cross_val_score(forest_reg, X, y, cv=10, scoring="neg_mean_absolute_error"))
    display_scores(np.sqrt(-cross_val_score(forest_reg, X, y, cv=10, scoring="neg_mean_squared_error")))
    # Out-of-fold predictions and residuals (predicted minus observed)
    train_set_freg = train_set.copy()
    train_set_freg['predictedCounts'] = cross_val_predict(forest_reg, X, y, cv=10)
    train_set_freg['resids'] = train_set_freg['predictedCounts'] - train_set_freg['cntDeTrended']
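cross_val_predict returns out-of-fold predictions. With cv=10, each row is predicted by a model trained on the other nine folds, so the residuals are not flattered by the model having already seen that row. The sketch below contrasts out-of-fold and in-sample error. It uses a hypothetical stand-in for train_set that reuses the project's column names with made-up values, and its generated target is an assumption, not the real de-trended counts.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Hypothetical stand-in for train_set, with the project's column names but synthetic values
rng = np.random.default_rng(2)
n = 400
train_set = pd.DataFrame({
    "xformWorkHr": rng.integers(0, 24, n),
    "temp": rng.uniform(0, 1, n),
    "dayCount": np.arange(n),
})
train_set["cntDeTrended"] = (50 * np.sin(train_set["xformWorkHr"] / 24 * 2 * np.pi)
                             + 100 * train_set["temp"] + rng.normal(0, 10, n))

features = ["xformWorkHr", "temp", "dayCount"]
X, y = train_set[features], train_set["cntDeTrended"]
forest_reg = RandomForestRegressor(n_estimators=50, min_samples_leaf=10, random_state=42)

# Out-of-fold: each row is predicted by a model that never saw it during training
oof_pred = cross_val_predict(forest_reg, X, y, cv=10)
# In-sample: fit and predict on the same rows (an optimistic estimate)
in_sample_pred = forest_reg.fit(X, y).predict(X)

oof_mae = np.mean(np.abs(oof_pred - y))
in_sample_mae = np.mean(np.abs(in_sample_pred - y))
print(f"out-of-fold MAE: {oof_mae:.2f}, in-sample MAE: {in_sample_mae:.2f}")
```

The in-sample error comes out lower than the out-of-fold error. For this reason, residuals computed from cross_val_predict give a more honest picture of how the model will behave on unseen data.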

Features used:

  1. xformWorkHr

  2. temp

  3. dayCount

Note:

  • The features were chosen based on observations made earlier while exploring the dataset. A different feature selection may produce a better model. Can you think of one, and justify your choice?
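One way to justify a feature selection is permutation importance. You shuffle one feature at a time on held-out data and measure how much the error grows. Features whose shuffling barely changes the error contribute little. This is one option among several, not the approach the course prescribes. The sketch uses synthetic data with hypothetical feature names: two informative features and one pure-noise feature. On the project you would run it with candidate bike features such as 'xformWorkHr', 'temp', and 'dayCount'.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical data: the first two features drive the target, the third is pure noise
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(500, 3))
y = 20 * X[:, 0] + 10 * X[:, 1] ** 2 + rng.normal(0, 1, size=500)
feature_names = ["f_signal_a", "f_signal_b", "f_noise"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_train, y_train)

# Shuffle each column on the validation set and record the increase in MAE
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=42,
                                scoring="neg_mean_absolute_error")
importances = dict(zip(feature_names, result.importances_mean))
for name, imp in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {imp:.3f}")
```

A feature whose importance is close to zero, like f_noise here, is a candidate for removal. Check any such removal by comparing cross-validated scores with and without that feature.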