End-to-End ML Project- Beginner friendly


Evaluating RandomForestRegressor

These are the cross-validation scores I got for DecisionTreeRegressor:

array([70072.56333434, 64669.76437454, 70664.61751498, 68361.78369658,
   70788.86300501, 74769.9362526, 69933.57686858, 69833.39043083,
   76381.61262044, 68969.41090616])

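Note- You may get different scores. With an integer cv, cross_val_score() splits the data the same way on every run, so the variation comes from the randomness inside DecisionTreeRegressor, which changes from run to run unless random_state is fixed.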

Each score is the RMSE of the model on one held-out validation fold. Because we set cv to 10, the array contains 10 evaluation scores. The mean and standard deviation of the scores are (70444.55190040627, 3078.3070579465134).
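As a quick check, the mean and standard deviation quoted above can be reproduced from the array of scores with NumPy. This is a minimal sketch; the variable names mean_rmse and std_rmse are only illustrative.

```python
import numpy as np

# RMSE scores of DecisionTreeRegressor on the 10 validation folds (copied from above)
scores = np.array([70072.56333434, 64669.76437454, 70664.61751498, 68361.78369658,
                   70788.86300501, 74769.9362526, 69933.57686858, 69833.39043083,
                   76381.61262044, 68969.41090616])

mean_rmse = scores.mean()  # average RMSE across the folds
std_rmse = scores.std()    # spread of the RMSE across the folds (population std, ddof=0)

print(mean_rmse, std_rmse)
```

The standard deviation gives a rough idea of how much the estimate varies from fold to fold.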

So, the mean RMSE on the validation folds is about 70444. The DecisionTreeRegressor no longer looks like a good fit: it is overfitting so badly that it performs worse than the LinearRegression model, which had a lower RMSE. (You can cross-validate the LinearRegression model in the same way as we did for DecisionTreeRegressor. Its mean RMSE will most probably be lower than that of the DecisionTreeRegressor.)
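Cross-validating LinearRegression in the same way could look roughly like the sketch below. Note the assumption: the data here is synthetic, generated with make_regression as a stand-in, because the real housing_prepared and housing_labels come from earlier steps of the project. In the project you would pass those two variables instead of X and y. The names lin_reg, lin_scores and lin_rmse_scores are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Assumption: synthetic stand-in data, NOT the housing dataset.
# In the project, use housing_prepared and housing_labels instead of X and y.
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=42)

lin_reg = LinearRegression()

# 10-fold cross-validation; the scorer returns negated RMSE values
lin_scores = cross_val_score(lin_reg, X, y,
                             scoring="neg_root_mean_squared_error", cv=10)

# Flip the sign to get ordinary (positive) RMSE values
lin_rmse_scores = -lin_scores

print(lin_rmse_scores.mean(), lin_rmse_scores.std())
```

The numbers printed for this synthetic data say nothing about the housing data; only the pattern of calls carries over.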

When a Decision Tree model overfits, one common remedy is a Random Forest. A Random Forest trains many decision trees, each on a random sample of the training data (and, optionally, on random subsets of the features), and averages their predictions. Averaging many different trees usually reduces overfitting considerably compared with a single tree.
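To make the averaging idea concrete, the sketch below fits a small forest and checks that its prediction matches the mean of the predictions of its individual trees. The assumptions are that the data is synthetic (make_regression), not the housing data, and that n_estimators=20 is an arbitrary small value chosen to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Assumption: synthetic stand-in data, NOT the housing dataset
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

# A small forest; 20 trees is an arbitrary choice for this illustration
forest = RandomForestRegressor(n_estimators=20, random_state=42)
forest.fit(X, y)

# Predict with every individual tree, then average across trees
tree_predictions = np.stack([tree.predict(X) for tree in forest.estimators_])
averaged = tree_predictions.mean(axis=0)

# The forest's own prediction is the average of its trees' predictions
print(np.allclose(averaged, forest.predict(X)))
```

The fitted trees are available through the estimators_ attribute, which is what makes this comparison possible.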

Refer to RandomForestRegressor documentation for further details about the estimator.

  1. Import RandomForestRegressor from sklearn.ensemble.

  2. Create an instance of the estimator with the name forest_reg.

  3. Fit the model on our training data i.e. (housing_prepared, housing_labels).

  4. Predict the output from the model for our training predictors i.e. (housing_prepared) and store the output in a variable named predictions.

  5. Calculate the RMSE of the RandomForestRegressor model between the actual values (housing_labels) and the predicted values (predictions), and store it in a variable named forest_rmse.

  6. Call the cross_val_score function with forest_reg as the estimator, housing_prepared as predictors_data, housing_labels as target_variable, neg_root_mean_squared_error as the scoring metric, and cv as 10, since we want to perform 10-fold cross-validation. Store the output in a variable named scores.

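  7. The scores will be negative, because scikit-learn scorers follow a greater-is-better convention and therefore return the RMSE negated. Pass them through the abs() function to make them positive: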

    scores = abs(scores)

    Note- It may take some time to cross validate the Random Forest model.
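The steps above can be sketched end to end as follows. This is only an outline under one loud assumption: housing_prepared and housing_labels are replaced here by synthetic data from make_regression so that the snippet runs on its own. In the project, skip that part and use the variables you prepared earlier. Computing the RMSE as the square root of mean_squared_error is one of several equivalent ways to do step 5.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor   # step 1
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

# Assumption: synthetic stand-ins for the project's real
# housing_prepared and housing_labels, used only to make this runnable.
housing_prepared, housing_labels = make_regression(
    n_samples=300, n_features=8, noise=15.0, random_state=42)

# Step 2: create the estimator (random_state is optional, for repeatability)
forest_reg = RandomForestRegressor(random_state=42)

# Step 3: fit on the training data
forest_reg.fit(housing_prepared, housing_labels)

# Step 4: predict on the training predictors
predictions = forest_reg.predict(housing_prepared)

# Step 5: RMSE on the training set
forest_rmse = np.sqrt(mean_squared_error(housing_labels, predictions))

# Step 6: 10-fold cross-validation with negated RMSE as the score
scores = cross_val_score(forest_reg, housing_prepared, housing_labels,
                         scoring="neg_root_mean_squared_error", cv=10)

# Step 7: convert the negative scores to positive RMSE values
scores = abs(scores)

print(forest_rmse)
print(scores.mean(), scores.std())
```

Comparing forest_rmse (measured on the training set) with the mean of scores (measured on held-out folds) is a useful check: a training error that is much lower than the validation error suggests the model is still overfitting to some degree.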
