Hi,
In the videos, the teacher first shuffles the data and then selects the training and test data. My question is whether train_test_split shuffles the data before selecting, or selects based on the data order.
Thank you
Hi,
train_test_split shuffles the data by default. You can control the shuffling with the function's shuffle and random_state parameters.
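For illustration, here is a minimal sketch of both behaviours (the toy arrays are made up for the example):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # toy feature matrix
y = np.arange(10)                  # toy labels

# Default behaviour: rows are shuffled before the split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# shuffle=False keeps the original row order: the first 70% of rows become
# the training set and the last 30% become the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)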
Hi Team,
Please clarify my doubt on this. What is the difference between the oob_score and the accuracy score that we saw at the 1:47 mark? Correct me if I am wrong:
1) oob_score is the score calculated by the predictors making predictions on the non-sampled data. This happens during fitting of the model.
2) accuracy is the score obtained by comparing the expected output against the model output.
If my understanding is correct, why is the accuracy score higher than the oob_score? The oob_score is calculated in a similar fashion to the accuracy score, if we think of the non-sampled data as test data.
Regards,
Birendra Singh
Hi,
The oob_score is computed on the rows that are not part of a given tree's bootstrap sample. Basically, the OOB prediction for an instance is made using only the subset of decision trees that did not contain that instance in their bootstrap training set, whereas the training accuracy is based on the votes of all the decision trees in the ensemble. If the accuracy on the training data is much higher than the accuracy on the test (or OOB) data, it might be a case of overfitting.
Thanks.
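To make the comparison concrete, here is a minimal sketch (the make_moons dataset and all hyperparameter values are just assumptions for the example):

from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# oob_score=True asks the ensemble to score each training instance using
# only the trees that did NOT see it in their bootstrap sample.
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            bootstrap=True, oob_score=True, n_jobs=-1,
                            random_state=42)
bag_clf.fit(X_train, y_train)

print("OOB score:    ", bag_clf.oob_score_)   # computed during fit
print("Test accuracy:", accuracy_score(y_test, bag_clf.predict(X_test)))

The two numbers are usually close, since both measure performance on data the trees did not train on; small differences come from the OOB estimate using only a subset of trees per instance.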
I found a question regarding Random Forest. Can someone explain the case below?
You were assigned to a project where you built a random forest model with 10,000 trees. You were on cloud nine after getting a training error of 0.00, but the validation error is 46.89. What went wrong? Does that mean you trained your model wrong?
I came to the conclusion that the trees in the random forest aren't diverse. If someone has any idea about this, please explain.
Thanks.
Hi,
This might be due to: (1) the validation data may be significantly different from the training data; (2) your model might have overfit the training data; (3) there might have been too little training data, so the model was not able to generalize. We can try to get more data, tune the hyperparameters, and shuffle the data well before splitting to make the model perform better.
Thanks.
What does "Predict the class that gets the most votes" mean? Which class are we talking about, and how does it get votes?
Hi,
In classification using ensemble learning, we train a bunch of classifiers and decide the class of an input sample based on the class predicted by the majority of the classifiers. For example, a random forest is an ensemble classifier because it consists of a bunch of decision trees. If we have 100 trees and 80 of them predict that an input sample belongs to class A, then we output class A, since the majority of the classifiers in the ensemble voted for it. Hope this helps.
Thanks.
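As a tiny sketch of that counting logic (the votes list is hypothetical):

from collections import Counter

# Hypothetical predictions from 5 classifiers for one input sample.
votes = ["A", "A", "B", "A", "B"]

# The ensemble predicts the class with the most votes.
predicted_class, n_votes = Counter(votes).most_common(1)[0]
print(predicted_class, n_votes)   # -> A 3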
I did not understand the part at 1:45:59 about requesting an automatic OOB evaluation. What does it mean? What does it have to do with bagging? And what will the result be? Please give the answers to all 3 questions.
Hi,
The entire out-of-bag evaluation is explained in the video. I would suggest you watch it again and check the slides. You can also check the link below for more explanation:
https://stackoverflow.com/questions/18541923/what-is-out-of-bag-error-in-random-forests
Thanks.
In the bagging classifier you are fitting X_train and y_train, but in the line below you are predicting on X_test:
y_pred = bag_clf.predict(X_test)
Why? I did not understand. Please tell me why you are predicting this.
Hi,
We train a model with X_train and y_train. However, we also need to evaluate the model, check whether it is underfitting or overfitting, and measure its accuracy. For that, we use X_test and y_test.
Thanks.
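A minimal sketch of that check, assuming a toy make_moons dataset (an unconstrained decision tree is used here only because it overfits easily):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An unconstrained tree can memorize the training set.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

train_acc = accuracy_score(y_train, tree.predict(X_train))
test_acc = accuracy_score(y_test, tree.predict(X_test))

# A large gap (train >> test) suggests overfitting; both low suggests underfitting.
print(f"train={train_acc:.3f}  test={test_acc:.3f}")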
Random Forest and the Voting Classifier are both ensemble methods, so why are the results different for the two?
Hi,
This is because a Random Forest produces its result from its own internal ensemble of decision trees. The Voting Classifier, however, collects the results from other models, puts them to a vote (hard/soft voting), and deduces the final result from that.
Thanks.
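A minimal sketch comparing the two side by side (dataset and model choices are just assumptions for the example):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Random Forest: an ensemble of decision trees built internally.
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Voting Classifier: combines the votes of *different* kinds of models.
voting_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC())],
    voting="hard")

for clf in (rnd_clf, voting_clf):
    clf.fit(X_train, y_train)
    print(clf.__class__.__name__, accuracy_score(y_test, clf.predict(X_test)))

Since the two ensembles aggregate different base models, their predictions (and accuracies) will generally differ.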
At 53:52, where are we fitting X_train and y_train? What have you done in accuracy_score? Which are you fitting, X_test or X_train? And which output are we getting in log_clf, rnd_clf and the accuracy score: X_train or X_test?
Hi,
In general, first we fit the training data to a model, and then we predict on the test data using that model. In this problem we need to test a number of models on this data, so instead of trying them one at a time, we have created this function/pipeline which does the fitting and predicting for as many models as you pass to it.
Thanks.
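A minimal sketch of such a helper (the function name and the commented usage are hypothetical, not from the lesson's notebook):

from sklearn.metrics import accuracy_score

def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and report its test accuracy."""
    for model in models:
        model.fit(X_train, y_train)      # fit on training data
        y_pred = model.predict(X_test)   # predict on test data
        print(model.__class__.__name__, accuracy_score(y_test, y_pred))

# Usage, with classifiers and splits defined as in the lesson:
# fit_and_score([log_clf, rnd_clf, svm_clf], X_train, X_test, y_train, y_test)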
At 34:32, I did not understand the probability program. Please explain.
Hi,
Here we are calculating the probability of a biased coin giving heads when tossed.
Thanks.
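A minimal simulation sketch in the same spirit (the 51% heads probability and the toss counts are assumptions, not necessarily the exact numbers used in the video):

import numpy as np

rng = np.random.default_rng(42)
heads_proba = 0.51   # a slightly biased coin

# Simulate 10 independent series of 10,000 tosses each; 1 = heads, 0 = tails.
tosses = (rng.random((10000, 10)) < heads_proba).astype(int)

# Running ratio of heads after each toss; it converges towards 0.51.
cumulative_ratio = tosses.cumsum(axis=0) / np.arange(1, 10001).reshape(-1, 1)
print(cumulative_ratio[-1])   # final heads ratio for each series

The ensemble intuition is the same: many weak, slightly-better-than-random votes add up to a majority that is right far more often than any single vote.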
Dear Cloudx,
In the slide, it's mentioned that bagging is better than pasting, but on comparing the accuracy scores of both methods the following results were found:
-Bagging = 0.904
-Pasting = 0.928
Are the above results proper?
Hi,
As seen on slide 47, we mention that bagging "often" results in better models. That does not mean it will "always" yield better results; there is no one-size-fits-all solution when it comes to Machine Learning or Deep Learning.
Thanks.
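You can verify this yourself: the only difference between the two is the bootstrap flag. A minimal sketch (dataset and hyperparameters are assumptions for the example):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# bootstrap=True samples WITH replacement (bagging);
# bootstrap=False samples WITHOUT replacement (pasting).
for bootstrap, name in [(True, "bagging"), (False, "pasting")]:
    clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            max_samples=300, bootstrap=bootstrap,
                            n_jobs=-1, random_state=42)
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))

Which one wins depends on the dataset and the random seed, which is exactly the "often, not always" point.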
How can we say ML models are biased? Can you please explain this a bit more?
Hi,
Are you referring to the bias-variance tradeoff? Here, bias is the difference between the average prediction of the model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the problem; it always leads to high error on both training and test data.
Thanks.
The way we passed DecisionTreeClassifier to BaggingClassifier, can we do that with other class combinations too, like RandomForest? Is there any BoostingClassifier?
Hi,
Good question!
Please go through the below link for a detailed discussion on the same:
https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/
Thanks.
Sir,
1. The classifier and regressor names of the classes are slightly confusing. Are all the classes discussed here not LogisticRegression? It appears there is a joint application of classification and regression too, as with decision trees and random forests; in that context, are they doing logistic regression?
2. Can the ensemble techniques be used for linear regression too?
Hi,
1. Classification and Regression are two different tasks. Please go through the training material to understand the difference.
2. Yes. For classification we use voting, for regression we use averaging.
Thanks.
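A minimal sketch of ensembles on a regression task (the make_regression data and the model choices are assumptions for the example):

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor, VotingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging for regression: the base regressors' predictions are averaged.
bag_reg = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                           random_state=42).fit(X_train, y_train)

# Voting for regression: averages the predictions of different model types.
vote_reg = VotingRegressor([("lin", LinearRegression()),
                            ("tree", DecisionTreeRegressor(random_state=42))])
vote_reg.fit(X_train, y_train)

print(bag_reg.score(X_test, y_test), vote_reg.score(X_test, y_test))  # R^2 scores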
Hi Team,
I have a question about the table below. max_samples < 1 vs. max_samples = 1 is confusing. We saw a max_samples value of 300 in the bagging classifier example, but here the tutor explains that a value less than 1 means not all features or instances are used for max_features or max_samples.
So, is there any difference between max_samples=1.0 and max_samples=1? Is it that the first one takes all the samples, since it is a fractional value, while the second is an exact count, so it will take just 1 instance?
Regards,
Birendra Singh
Hi,
Here we are discussing max_features and not max_samples.
Thanks.
Hi Team,
In the code below, max_samples=300 means that out of 375 training instances, the classifier will pick 300 samples. Similar to bag 1 and bag 2 that we saw on slides 36 and 38 for bagging and pasting, we will have a randomly selected sample of 300 out of the 375 training instances for each estimator. So it is a structure of 500 bags, each with 300 random samples, right?
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=500,    # 500 base estimators ("bags")
    max_samples=300,     # 300 instances drawn for each estimator
    bootstrap=False,     # sampling without replacement, i.e. pasting
    n_jobs=-1,
)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
accuracy_score(y_test, y_pred)   # signature is accuracy_score(y_true, y_pred)
And as m grows, the ratio of instances that get sampled approaches 63%. So can we think of it like this: if there are 1 lakh (100,000) training instances, then about 37k will never be picked for sampling, and we can use those for testing? These 37k will vary from estimator to estimator: estimator 1 will have its own ~37k never-sampled instances, and so will estimator 2, but they will not necessarily be identical, because each estimator's ~63k are picked at random from the 1 lakh training instances.
Regards,
Birendra Singh
Hi,
Please find a detailed explanation of how max_samples for BaggingClassifier affects the number of samples:
https://stackoverflow.com/questions/38772035/how-does-max-samples-keyword-for-a-bagging-classifier-effect-the-number-of-sam
Thanks.
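On the 63% figure: a bootstrap of m draws with replacement misses a given instance with probability (1 - 1/m)^m, which tends to 1/e ~ 0.368 as m grows, so about 63.2% of instances are sampled at least once. A quick simulation sketch (the m = 100,000 "1 lakh" size mirrors the question above):

import numpy as np

rng = np.random.default_rng(42)
m = 100_000   # training set size (1 lakh)

# One bootstrap sample: m draws WITH replacement from m instances.
sample = rng.integers(0, m, size=m)
unique_ratio = np.unique(sample).size / m

# Theory: 1 - (1 - 1/m)^m  ->  1 - 1/e  ~  0.632 for large m.
print(unique_ratio)   # ~0.632: ~63.2k sampled at least once, ~36.8k out-of-bag

And yes, each estimator draws its own bootstrap, so the ~37k out-of-bag instances differ from estimator to estimator.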
Suppose there are 4 classifiers, 2 of them classify a sample as class 1 and the other 2 classify it as class 2. How will the voting classifier work in that case?
Hi,
The Voting Classifier supports two types of voting.
Hard Voting: the predicted output class is the one that receives the majority of the votes, i.e. the class predicted by the most classifiers. Suppose three classifiers predict the output classes (A, A, B); the majority predicted A, so A will be the final prediction. In your 2-2 example, scikit-learn's VotingClassifier breaks such ties by selecting the class that comes first in ascending sort order of the class labels.
Soft Voting: the output class is based on the average of the probabilities each classifier assigns to the classes. Suppose, for some input, three models predict the probability of class A as (0.90, 0.45, 0.45); with only two classes, class B gets the complements (0.10, 0.55, 0.55). The average for class A is 0.60 and for class B is 0.40, so soft voting picks class A, even though hard voting over the same predictions (A, B, B) would have picked B.
Thanks.
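A minimal sketch of the two rules using the numbers above (plain NumPy, not the VotingClassifier API):

import numpy as np

# Per-classifier probability of class A for one sample (two-class problem,
# so class B gets 1 - p). Numbers match the example above.
p_A = np.array([0.90, 0.45, 0.45])

# Hard voting: each classifier votes for its more probable class.
hard_votes = ["A" if p > 0.5 else "B" for p in p_A]            # ['A', 'B', 'B']
print("hard:", max(set(hard_votes), key=hard_votes.count))     # B wins (2 votes)

# Soft voting: average the class probabilities, then take the larger mean.
print("soft:", "A" if p_A.mean() > (1 - p_A).mean() else "B")  # A wins (0.60 vs 0.40)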
If we are feeding the same data to a number of classifiers, why does soft voting have higher accuracy than hard voting?
Hi,
Soft voting can improve on hard voting because it takes into account more information; it uses each classifier's uncertainty in the final decision.
Thanks.
Slide 58 - The statement "Here, the oob evaluation estimates that the second training instance has a 60.6% probability of belonging to the positive class and 39.4% of belonging to the positive class." seems to be wrong somewhere. Both prediction percentages are for the positive class? How come? One should be for the negative class, right? Which one is that, and why? Please validate.
In the binomial term nCr · a^(n-r) · b^r, what is the r? I got the other terms, but I am not getting what r is.
Hi,
Should we always feed scaled data to these ensemble methods when building machine learning models?
Thanks
Hi,
There are times when normalizing/scaling is good, and times when it's not. You can read more about when to use feature scaling and when not to in the following link:
https://stats.stackexchange...
Thanks.
-- Rajtilak Bhattacharjee
Hi,
It's not very clear, but I think normalization won't hurt even if it's not needed. Am I right?
Hi,
Yes, you are right.
Thanks.
-- Rajtilak Bhattacharjee
Ok, thanks.
Hi,
To increase the accuracy on unseen data, should we change the value of max_samples in the bagging classifier?
Hi,
I want to ask about the significance of the OOB score in the case of a large dataset (say of shape 80000 × 9) where we are setting bootstrap=False.
Thanks,
Prachi
Hi,
Can we use any classifier within the bagging classifier, or can we only use the decision tree classifier as mentioned in the notebook?
Thanks,
Prachi
We can use any classifier in the bagging classifier as bagging is concerned with how the classifier performs on the different samples of the dataset, and not on what classifier you are using.
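A minimal sketch with a non-tree base estimator (k-nearest neighbours here is an arbitrary choice, as is the dataset):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Any estimator works as the base model -- here, k-nearest neighbours.
bag_knn = BaggingClassifier(KNeighborsClassifier(), n_estimators=100,
                            max_samples=0.8, random_state=42)
bag_knn.fit(X_train, y_train)
print(accuracy_score(y_test, bag_knn.predict(X_test)))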
Hi,
Should we use stratified sampling before modelling with ensemble techniques for better results, or simply do a train_test_split?
Thanks
In an ensemble, can the max_features and max_samples values only be 1 or 0?
Hi,
If you are talking about the BaggingClassifier, then max_features is the number of features to draw from X to train each base estimator, whereas max_samples is the number of samples to draw from X to train each base estimator. You can find more about them in the link below:
https://scikit-learn.org/st...
Thanks.
-- Rajtilak Bhattacharjee
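To connect this with the earlier 1.0-vs-1 question: in scikit-learn's BaggingClassifier, an int means an absolute count and a float means a fraction of the training set. A quick sketch:

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# int -> absolute count; float -> fraction of the training set.
BaggingClassifier(DecisionTreeClassifier(), max_samples=300)   # 300 instances per estimator
BaggingClassifier(DecisionTreeClassifier(), max_samples=0.8)   # 80% of the instances
BaggingClassifier(DecisionTreeClassifier(), max_samples=1.0)   # all instances (a fraction)
BaggingClassifier(DecisionTreeClassifier(), max_samples=1)     # exactly ONE instance (a count)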
Please explain why we used logistic regression in the voting classifier example. Logistic regression is for regression problems, and the example was a classification problem.
Hi,
Logistic regression is a supervised learning classification algorithm. Linear regression is used for regression problems.
Thanks.
-- Rajtilak Bhattacharjee
Thank you.
I have one more question: can we use gradient descent, SGD, etc. as one of the predictors in an AdaBoost classifier?
Hi,
AdaBoost is short for Adaptive Boosting. It was the first really successful boosting algorithm developed for binary classification, and it is the best starting point for understanding boosting; modern boosting methods, most notably stochastic gradient boosting machines, build on AdaBoost.
Generally, AdaBoost is used with short decision trees. After the first tree is created, its performance on each training instance is used to weight how much attention the next tree should pay to each training instance: training data that is hard to predict is given more weight, whereas instances that are easy to predict are given less weight.
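On the original question: as far as I know, the base estimator just needs to support sample weights (which SGDClassifier does), so it can serve as a predictor, while gradient descent by itself is an optimization method rather than a model. A minimal sketch of the classic setup, boosting decision stumps (dataset and hyperparameters are assumptions for the example):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost over decision stumps (depth-1 trees): each successive stump
# focuses more on the instances its predecessors got wrong.
ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=200, learning_rate=0.5,
                             random_state=42)
ada_clf.fit(X_train, y_train)
print(accuracy_score(y_test, ada_clf.predict(X_test)))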
You can learn more about AdaBoostClassifier here:
https://scikit-learn.org/st...
Thanks.
-- Rajtilak Bhattacharjee