Now that we have prepared the data, we will train a Decision Tree model on that data and see how it performs. Since this is a regression problem, we will use the DecisionTreeRegressor class from Scikit-learn.
Import the DecisionTreeRegressor class from Scikit-learn:
from sklearn.tree import <<your code goes here>>
Now let's train the DecisionTreeRegressor
tree_reg = DecisionTreeRegressor(random_state=42)
tree_reg.fit(housing_prepared, housing_labels)
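For reference, a completed, runnable version of this step might look like the following sketch; it assumes housing_prepared and housing_labels were produced by the earlier data-preparation steps.

# Import the regressor and fit it on the prepared training data
from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor(random_state=42)  # fixed seed for reproducibility
tree_reg.fit(housing_prepared, housing_labels)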
To evaluate the performance of our model, we will import the mean_squared_error function from Scikit-learn:
from sklearn.metrics import <<your code goes here>>
Now let's make predictions with our model using the predict method:
housing_predictions = tree_reg.<<your code goes here>>(housing_prepared)
Finally, let's evaluate our model
tree_mse = mean_squared_error(housing_labels, housing_predictions)
tree_rmse = np.sqrt(tree_mse)
tree_rmse
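Putting the evaluation step together, a completed sketch might look like this; it assumes the fitted tree_reg and the prepared data from the steps above.

import numpy as np
from sklearn.metrics import mean_squared_error

# Predict on the same data the model was trained on
housing_predictions = tree_reg.predict(housing_prepared)

# RMSE is the square root of the MSE, expressed in the same units as the labels
tree_mse = mean_squared_error(housing_labels, housing_predictions)
tree_rmse = np.sqrt(tree_mse)
print(tree_rmse)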
If you trained your model correctly, the RMSE would come out to 0.0. This means that our model is most likely overfitting. How do we check and resolve this issue? We will come to that in a bit, but before that we will train a Random Forest model.
Note: If a model performs significantly better on the training data than on the test data, it may be overfitting the training data. But we can't be sure that overfitting is the cause, because this scenario can also arise from other problems, such as a data mismatch between the training and test sets, meaning the test set contains a different type of data (with a different distribution) that was not present in the training data. It can also happen because of the stochastic nature of the algorithm. So we have to check whether the model is really overfitting or whether it is suffering from some other problem.
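One common way to make this check, shown here only as a sketch (the exercise may use a different approach later), is K-fold cross-validation on the training set with scikit-learn's cross_val_score: if the cross-validation RMSE is much higher than the training RMSE of 0.0, the gap comes from the model itself overfitting rather than from a mismatched test set.

import numpy as np
from sklearn.model_selection import cross_val_score

# Scikit-learn scoring follows a "higher is better" convention, so it returns
# negative MSE; flip the sign before taking the square root to get RMSE per fold
scores = cross_val_score(tree_reg, housing_prepared, housing_labels,
                         scoring="neg_mean_squared_error", cv=10)
tree_cv_rmse = np.sqrt(-scores)
print(tree_cv_rmse.mean(), tree_cv_rmse.std())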