End-to-End ML Project- Beginner friendly

Performing Cross-Validation

You can use sklearn's cross_val_score() as follows:

cross_val_score(estimator, predictors_data, target_variable, scoring = None, cv = None)


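estimator is our ML model,

predictors_data and target_variable are the features and the labels the model is fitted on,

scoring is the evaluation metric that we specify (refer to metrics for a list of the metrics sklearn accepts for the scoring parameter), and

cv is the number of folds, that is, the value of k in k-fold cross-validation.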

We use neg_root_mean_squared_error as the scoring metric for our task. It is the negative RMSE. Sklearn's cross-validation features expect a utility function rather than a cost function. With a cost function, a lower value means a better model, while with a utility function, a higher value means a better model.

Because of this convention, we use negative RMSE as the scoring function. Negative RMSE is simply the RMSE with its sign flipped. So, if the RMSE on one fold comes out as 3, the negative RMSE for that fold is -3.
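To make the cost versus utility distinction concrete, here is a minimal sketch that computes RMSE by hand for two made-up sets of predictions and then negates it. The helper rmse and all of the numbers are hypothetical and chosen only for illustration; they are not part of the project's data.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: a cost, so lower is better."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical targets and predictions from two made-up models.
y_true = [10.0, 20.0, 30.0]
good_pred = [11.0, 19.0, 31.0]  # every prediction is off by 1
bad_pred = [13.0, 17.0, 33.0]   # every prediction is off by 3

rmse_good = rmse(y_true, good_pred)
rmse_bad = rmse(y_true, bad_pred)

# Negating the cost turns it into a utility: higher is better.
neg_rmse_good = -rmse_good
neg_rmse_bad = -rmse_bad

print("RMSE:", rmse_good, rmse_bad)
print("Negative RMSE:", neg_rmse_good, neg_rmse_bad)
```

The better model has the smaller RMSE but the larger negative RMSE, which is the ordering a utility function needs.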

Refer to cross_val_score documentation for further details about the method.

  1. Import function cross_val_score from sklearn.model_selection.

  2. Call cross_val_score with tree_reg as the estimator, housing_prepared and housing_labels as predictors_data and target_variable, "neg_root_mean_squared_error" (passed as a string) as the scoring metric, and cv set to 10, since we want to perform 10-fold cross-validation. Store the output in a variable named scores.

  3. The scores will be negative. Pass them through the abs() function to make them positive:

    scores = abs(scores)

    Note - Scores may differ between runs because of randomness in the estimator (for example, a decision tree created without a fixed random_state).
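The three steps above can be sketched as follows. This is only an illustration under stated assumptions: housing_prepared and housing_labels are replaced by small synthetic arrays so the snippet runs on its own, and tree_reg is assumed to be a DecisionTreeRegressor with a fixed random_state. In the project, use the objects built in the earlier steps instead.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic stand-ins for the project's prepared features and labels.
rng = np.random.default_rng(42)
housing_prepared = rng.normal(size=(200, 5))
true_weights = np.array([3.0, -2.0, 1.0, 0.5, 4.0])
housing_labels = housing_prepared @ true_weights + rng.normal(scale=0.5, size=200)

# Assumption: tree_reg is a decision tree; random_state is fixed for repeatable scores.
tree_reg = DecisionTreeRegressor(random_state=42)

# Step 2: 10-fold cross-validation scored with negative RMSE.
scores = cross_val_score(
    tree_reg,
    housing_prepared,
    housing_labels,
    scoring="neg_root_mean_squared_error",
    cv=10,
)

# Step 3: flip the sign so the values read as ordinary RMSE.
scores = abs(scores)

print("Scores:", scores)
print("Mean:", scores.mean())
print("Standard deviation:", scores.std())
```

The result is an array with one RMSE value per fold, so its mean and standard deviation give a rough picture of both the model's typical error and how much that error varies across folds.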
