So, I got the following scores for DecisionTreeRegressor:
array([70072.56333434, 64669.76437454, 70664.61751498, 68361.78369658,
70788.86300501, 74769.9362526, 69933.57686858, 69833.39043083,
76381.61262044, 68969.41090616])
Note: You may get different scores, since the decision tree involves some randomness unless its random_state is fixed.
The scores are the RMSE values of the model on the validation fold of each run. As we chose cv as 10, there are 10 evaluation scores. The mean and standard deviation of the scores come out as (70444.55190040627, 3078.3070579465134).
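As a quick check, the mean and standard deviation quoted above can be recomputed from the score array with NumPy. This is only a sketch: it assumes the scores are held in a NumPy array named scores, and it uses NumPy's default (population) standard deviation.

```python
import numpy as np

# The 10 cross-validation RMSE scores listed above.
scores = np.array([70072.56333434, 64669.76437454, 70664.61751498, 68361.78369658,
                   70788.86300501, 74769.9362526, 69933.57686858, 69833.39043083,
                   76381.61262044, 68969.41090616])

mean_rmse = scores.mean()   # average RMSE across the 10 folds
std_rmse = scores.std()     # spread of the RMSE across the folds (ddof=0)

print(mean_rmse, std_rmse)
```

The standard deviation gives a rough idea of how much the estimate varies from fold to fold, which a single train/validation split cannot show.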
So, this is the mean RMSE on the validation data. The DecisionTreeRegressor doesn't look like a good fit: it is overfitting so badly that it performs even worse than the LinearRegression model, which had a lower RMSE. (You can try cross-validating the LinearRegression model in the same way as we did for DecisionTreeRegressor. Its mean RMSE will most probably be lower than that of the DecisionTreeRegressor.)
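The comparison suggested above could be sketched as follows. Since housing_prepared and housing_labels are not reproduced here, this example substitutes synthetic data from make_regression; the names X, y, and cv_rmse, along with the random_state values, are assumptions made for illustration only, and the numbers it produces say nothing about the housing data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for (housing_prepared, housing_labels).
X, y = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=42)

def cv_rmse(estimator, X, y):
    """Return the 10-fold cross-validated RMSE scores as positive numbers."""
    neg_scores = cross_val_score(estimator, X, y,
                                 scoring="neg_root_mean_squared_error", cv=10)
    return -neg_scores

lin_rmse = cv_rmse(LinearRegression(), X, y)
tree_rmse = cv_rmse(DecisionTreeRegressor(random_state=42), X, y)

print("LinearRegression mean RMSE:     ", lin_rmse.mean())
print("DecisionTreeRegressor mean RMSE:", tree_rmse.mean())
```

With the real data, you would pass housing_prepared and housing_labels in place of X and y.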
So, when our Decision Tree model overfits, we can try the Random Forest model. A Random Forest trains several decision trees, each on a random sample of the training data and, typically, on random subsets of the features, and averages their predictions, which reduces overfitting considerably.
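The averaging step can be seen directly in scikit-learn. The sketch below fits a small RandomForestRegressor on synthetic data (an assumption, standing in for the housing data) and checks that the forest's prediction matches the mean of the predictions of its individual trees, which are exposed through the estimators_ attribute. The values of n_estimators and random_state are arbitrary choices for the illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data, for illustration only.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=20, random_state=0)
forest.fit(X, y)

# Predictions of every individual tree: shape (n_trees, n_samples).
tree_preds = np.stack([tree.predict(X) for tree in forest.estimators_])

# Averaging the trees by hand should reproduce the forest's own prediction.
manual_average = tree_preds.mean(axis=0)
matches = np.allclose(manual_average, forest.predict(X))
print(matches)
```

Each tree on its own may overfit its sample, but their errors partly cancel when averaged, which is the intuition behind the reduced overfitting.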
Refer to RandomForestRegressor documentation for further details about the estimator.
Import RandomForestRegressor from sklearn.ensemble.
Create an instance of the estimator with the name forest_reg.
Fit the model on our training data, i.e. (housing_prepared, housing_labels).
Predict the output from the model for our training predictors, i.e. (housing_prepared), and store the output in a variable named predictions.
Calculate the RMSE for our RandomForestRegressor model between the actual values (housing_labels) and the predicted values (predictions), and store its value in a variable named forest_rmse.
Use the cross_val_score function and provide forest_reg as the estimator, housing_prepared and housing_labels as predictors_data and target_variable, neg_root_mean_squared_error as the scoring metric, and cv as 10, as we want to perform 10-fold cross-validation. Store the output in a variable named scores.
The scores will be negative. Pass them through the abs() function to convert them to positive values:
scores = abs(scores)
Note: It may take some time to cross-validate the Random Forest model.
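The steps above can be sketched end to end as follows. This is one possible way to write them, not necessarily the exact solution the exercise expects. Because the prepared housing data is not reproduced here, the sketch fills housing_prepared and housing_labels with synthetic data from make_regression; the n_estimators and random_state values are likewise assumptions, chosen only to keep the run short and repeatable.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real prepared housing data (assumption).
housing_prepared, housing_labels = make_regression(
    n_samples=300, n_features=8, noise=15.0, random_state=42)

# Create and fit the estimator.
forest_reg = RandomForestRegressor(n_estimators=50, random_state=42)
forest_reg.fit(housing_prepared, housing_labels)

# RMSE on the training data itself.
predictions = forest_reg.predict(housing_prepared)
forest_rmse = np.sqrt(mean_squared_error(housing_labels, predictions))

# 10-fold cross-validation; the scorer returns negated RMSE values.
scores = cross_val_score(forest_reg, housing_prepared, housing_labels,
                         scoring="neg_root_mean_squared_error", cv=10)
scores = abs(scores)

print("Training RMSE:", forest_rmse)
print("CV RMSE mean and std:", scores.mean(), scores.std())
```

Comparing forest_rmse with the mean of scores is a useful check: a training RMSE far below the cross-validated RMSE suggests the model is still overfitting the training set.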