End-to-End ML Project- Beginner friendly


GridSearchCV

GridSearchCV takes the hyperparameters we want to tune, each with a list of candidate values, and evaluates every possible combination of those values using cross-validation.

Suppose a model has three hyperparameters a, b, and c. We want to try the values a = [8, 16, 32, 64], b = [True, False], and c = [1, 2, 3], and we set cv to 5.

GridSearchCV will then try out 4 * 2 * 3 = 24 combinations. Some of them are {a: 8, b: True, c: 1}, {a: 16, b: True, c: 1}, {a: 32, b: True, c: 1}, {a: 8, b: True, c: 2}, and {a: 8, b: False, c: 1}. Because cv is 5, each combination is trained and evaluated on 5 different folds, so the search runs 24 * 5 = 120 rounds of training (plus, by default, one final fit of the best combination on the full training set). This can take a long time, and it limits how many combinations we can afford to try, so the grid should be chosen carefully.
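The arithmetic above can be checked with a short sketch. It uses only the Python standard library to enumerate the grid; the variable names (grid, combinations, n_fits) are illustrative and not part of scikit-learn.

```python
from itertools import product

# Hyperparameter values from the example above.
grid = {
    "a": [8, 16, 32, 64],
    "b": [True, False],
    "c": [1, 2, 3],
}
cv = 5  # number of cross-validation folds

# Each combination picks one value per hyperparameter (a Cartesian product).
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

n_combinations = len(combinations)
n_fits = n_combinations * cv  # one training round per combination per fold

print(n_combinations)  # 24
print(n_fits)          # 120
```

Adding one more hyperparameter with, say, four candidate values would multiply both numbers by four, which is why grids grow expensive quickly.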

The syntax of GridSearchCV is:

GridSearchCV(estimator, param_grid, cv=None, scoring=None)

where

estimator is our model instance,

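param_grid is a dictionary whose keys are hyperparameter names and whose values are the lists of values we want to try for each hyperparameter,

cv is the number of cross-validation folds (k in k-fold); if left as None, 5-fold cross-validation is used, and

scoring is the evaluation metric, for example the string 'neg_root_mean_squared_error'; if left as None, the estimator's own score method is used.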

Calling the fit() method on the GridSearchCV instance then runs the search.

Refer to the GridSearchCV documentation for further details.
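As a minimal sketch of the syntax described above, the following runs a small grid search end to end. The Ridge estimator, its alpha and fit_intercept grid, and the synthetic data from make_regression are assumptions chosen only to keep the example self-contained; they are not part of this course's project.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, not the course dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Keys are the estimator's hyperparameter names; values are the lists to try.
param_grid = {
    "alpha": [0.1, 1.0, 10.0],
    "fit_intercept": [True, False],
}

search = GridSearchCV(
    estimator=Ridge(),
    param_grid=param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)  # trains and scores every combination on every fold

# cv_results_ holds one entry per combination: 3 * 2 = 6 here.
print(len(search.cv_results_["params"]))
print(search.best_params_)
```

Passing the arguments by keyword, as here, keeps the call readable and does not depend on the order in which the parameters are declared.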

INSTRUCTIONS

Let's find out the best combination of hyperparameters for our RandomForestRegressor.

  1. Import sklearn.model_selection.GridSearchCV.

  2. Create a new instance of RandomForestRegressor with the name reg_forest. We create a new instance so that training starts from scratch, since the old one, forest_reg, has already been trained.

  3. Create a Python dictionary with the name param_grid and specify the key-value pairs as {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]}

  4. Create an instance of GridSearchCV with the name grid_search. Specify estimator as reg_forest, param_grid as our dictionary param_grid, cv as 5, and scoring as 'neg_root_mean_squared_error'.

  5. Fit grid_search on our training dataset i.e. (housing_prepared, housing_labels).

  6. Use the best_params_ attribute of grid_search to find the best combination and store it in a variable named best_param. The syntax for accessing an attribute of an object is:

    object_name.attribute_name
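The steps above can be sketched as follows. Because housing_prepared and housing_labels come from earlier parts of the project, this sketch substitutes synthetic data (X_demo, y_demo) generated with make_regression; those names, the dataset shape, and random_state=42 are assumptions made only so the example runs on its own. In the actual exercise, fit on (housing_prepared, housing_labels) instead.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Stand-in for (housing_prepared, housing_labels); synthetic, illustration only.
# At least 8 features are needed because max_features goes up to 8.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=8, noise=5.0, random_state=42
)

# Step 2: a fresh, untrained model (random_state is an assumption, for repeatability).
reg_forest = RandomForestRegressor(random_state=42)

# Step 3: 3 * 4 = 12 combinations.
param_grid = {"n_estimators": [3, 10, 30], "max_features": [2, 4, 6, 8]}

# Step 4: 12 combinations * 5 folds = 60 rounds of training.
grid_search = GridSearchCV(
    reg_forest, param_grid, cv=5, scoring="neg_root_mean_squared_error"
)

# Step 5: run the search.
grid_search.fit(X_demo, y_demo)

# Step 6: read the best combination.
best_param = grid_search.best_params_
print(best_param)

# The score is the negated RMSE (higher is better), so flip the sign to get RMSE.
best_rmse = -grid_search.best_score_
print(best_rmse)
```

The sign flip at the end matters when reporting results: scikit-learn scorers follow a "greater is better" convention, so error metrics such as RMSE are exposed in negated form.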
    
