Ensemble Learning and XGBoost


Ensemble Learning Part -1





56 Comments

Hi
In the videos, the teacher first shuffles the data and then selects the training and test data.
My question is whether train_test_split shuffles the data and then selects, or selects based on the data order.

thank you


Hi,

By default, train_test_split shuffles the data before splitting. You can control this behaviour with the shuffle and random_state parameters of the function.
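
As a small illustration of the two parameters, here is a minimal sketch. The ten-element toy arrays are made up for this example and are not from the course notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: ten ordered samples, so any shuffling is easy to spot.
X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# Default behaviour (shuffle=True); random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# shuffle=False keeps the original order: the last 30% becomes the test set.
X_train_ns, X_test_ns, y_train_ns, y_test_ns = train_test_split(
    X, y, test_size=0.3, shuffle=False
)

print("shuffled test labels:  ", y_test)
print("unshuffled test labels:", y_test_ns)
```

Calling the function again with the same random_state should give the same shuffled split, which is useful when comparing models.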


Hi Team,

 

Please clarify my doubt on this. What is the difference between the oob_score and the accuracy score that we saw at 1:47 min? Correct me if I am wrong:

1) oob_score is the score calculated from the predictions the predictors make on non-sampled data. This happens during fitting of the model.

2) Accuracy is the score obtained by comparing the expected output against the model output.

If my understanding is correct, how are these 2 things different, and why is the accuracy score higher than the oob_score? The oob_score is calculated in a similar fashion to the accuracy score, if we think of the non-sampled data as test data.

Regards,

Birendra Singh


Hi,

The oob_score is computed on the rows that were left out of the decision trees' bootstrap samples. For each training instance, only the subset of trees that did not have that instance in their bootstrap training set makes the prediction. The accuracy on the training data, in contrast, is based on the votes of all the decision trees in the ensemble, including those that were trained on the instance being predicted. If the accuracy on the training data is higher than the accuracy on the test data, the model may be overfitting.
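
To see the three numbers side by side, here is a minimal sketch. The moons dataset, the split, and all hyperparameter values are assumptions chosen for illustration, not necessarily those used in the video.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed toy dataset for illustration only.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=200,
    bootstrap=True,   # OOB evaluation needs sampling with replacement
    oob_score=True,   # ask for the OOB score to be computed during fit
    random_state=42,
)
bag_clf.fit(X_train, y_train)

# OOB score: each training row is scored only by trees that never saw it.
oob_acc = bag_clf.oob_score_
# Training accuracy: every tree votes, including trees trained on that row.
train_acc = accuracy_score(y_train, bag_clf.predict(X_train))
# Test accuracy: rows that no tree has seen.
test_acc = accuracy_score(y_test, bag_clf.predict(X_test))

print(f"oob={oob_acc:.3f} train={train_acc:.3f} test={test_acc:.3f}")
```

The OOB score and the test accuracy are both estimates on unseen rows, so they tend to be close to each other, while the training accuracy is usually the most optimistic of the three.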

Thanks.


I found a question about random forests. Can someone explain the case below?

You were assigned to a project where you built a random forest model with 10000 trees. You were in cloud-nine after getting a training error of 0.00. But the validation error is 46.89. What went wrong? Does that mean that you trained your model wrong ?

 

I myself came to the conclusion that the trees in the random forest aren't diverse. If anyone has any ideas about this, please explain.

 

-Thanks.


Hi,

This might be because (1) the validation data is significantly different from the training data, (2) your model has overfit the training data, or (3) there was too little training data, so the model was not able to generalize. To make the model perform better, we can try to get more data, tune the hyperparameters, and shuffle the data well before splitting.
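
A minimal sketch of points (2) and the hyperparameter tuning, on an assumed noisy toy dataset (not the data from the question): a forest of fully grown trees scores almost perfectly on the training set, and constraining the trees narrows the gap to the validation score.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed noisy toy dataset, so that memorising the training set is possible.
X, y = make_moons(n_samples=1000, noise=0.40, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)


def train_val_accuracy(**forest_params):
    """Fit a forest and return (training accuracy, validation accuracy)."""
    forest = RandomForestClassifier(
        n_estimators=100, random_state=42, **forest_params
    )
    forest.fit(X_train, y_train)
    return forest.score(X_train, y_train), forest.score(X_val, y_val)


# Fully grown trees: very low training error, noticeably higher validation error.
full_train, full_val = train_val_accuracy()
# Regularised trees: at most 8 leaves each (a hypothetical tuning choice).
small_train, small_val = train_val_accuracy(max_leaf_nodes=8)

print(f"full trees:  train={full_train:.3f} val={full_val:.3f}")
print(f"small trees: train={small_train:.3f} val={small_val:.3f}")
```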

Thanks.


What does "Predict the class that gets the most votes" mean? Which class are we talking about here, and how is it getting votes?


Hi,

In classification using ensemble learning, we train a bunch of classifiers and decide the class of an input sample based on the class predicted by the majority of the classifiers. For example, a random forest is an ensemble classifier because it consists of a bunch of decision trees. If we have 100 trees and if 80 of them predict that the class of an input data sample is class A, then we output that the class of that data sample is class A, since majority of the classifiers in the ensemble voted for class A. Hope this helps.
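
The counting in that example can be sketched in a few lines. The vote list below is hypothetical and simply mirrors the 80/20 split described above.

```python
from collections import Counter

# Hypothetical predictions from 100 trees for one input sample.
votes = ["A"] * 80 + ["B"] * 20

# Count the votes per class and pick the class with the most votes.
counts = Counter(votes)
predicted_class, n_votes = counts.most_common(1)[0]

print(predicted_class, n_votes)  # A 80
```

Note that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than by counting one vote per tree, so the counting above is the textbook picture of majority voting rather than that library's exact procedure.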

Thanks.


I did not understand.

At 1:45:59:

What does "request an automatic oob evaluation" mean?

What does it have to do with bagging?

What will be the result?

Please give the answers to all 3 questions.


Hi,

The entire out-of-bag evaluation is explained in the video. I would suggest you watch it again and check the slides. Also, you can check the link below for more explanation:

https://stackoverflow.com/questions/18541923/what-is-out-of-bag-error-in-random-forests

Thanks.


In the bagging classifier you are fitting X_train and y_train, but in the line below you are predicting on X_test.

y_pred = bag_clf.predict(X_test)

Why? I did not understand.

Please tell me why you are predicting this.


Hi,

We train a model with X_train and y_train. However, we also need to evaluate the model: check whether it is underfitting or overfitting, and measure its accuracy. For that, we use X_test and y_test.

Thanks.


Random forest and Voting classifier are both ensemble methods, so why are the results different?

 


I mean, why are the results different for the two?


Hi,

This is because a Random Forest produces the result itself, from its own decision trees. A Voting Classifier, however, collects the results from other models and puts them to a vote (hard or soft voting), and deduces the final result from that.

Thanks.


At 53:52

Where are we fitting X_train and y_train?

What have you done in accuracy_score?

Which are you fitting, X_test or X_train?

Which output are we getting in log, clf, rnd and acc_score: X_train or X_test?

 


Hi,

In general, first we fit the training data in a model, and then we predict the test data using that model. In this problem we need to test a number of models on this data. So instead of trying them one at a time, we have created this function/pipeline which does the fitting and predicting in as many models as you pass to it.
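
A minimal sketch of such a loop is shown here. The dataset, the variable names (log_clf, rnd_clf, svm_clf, voting_clf) and the hyperparameters are assumptions for illustration and may differ from the notebook in the video.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumed toy dataset for illustration only.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

log_clf = LogisticRegression(random_state=42)
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
svm_clf = SVC(random_state=42)
voting_clf = VotingClassifier(
    estimators=[("lr", log_clf), ("rf", rnd_clf), ("svc", svm_clf)],
    voting="hard",
)

scores = {}
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)       # fitting always uses the training set
    y_pred = clf.predict(X_test)    # predictions are made on the test set
    # accuracy_score compares the true test labels with the predictions
    scores[clf.__class__.__name__] = accuracy_score(y_test, y_pred)

for name, acc in scores.items():
    print(name, round(acc, 3))
```

So the printed accuracy for every model is a test-set accuracy: X_train and y_train appear only in fit, and X_test and y_test only in predict and accuracy_score.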

Thanks.


At 34:32

I did not understand the probability program.

Please explain.

 


Hi,

Here we are calculating the probability of a biased coin giving heads when tossed.
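
The program from the video is not reproduced here, but the idea can be sketched under an assumption: a coin that lands heads with probability 0.51 (a value chosen only for illustration). The probability of getting exactly r heads in n tosses is nCr * p^r * (1 - p)^(n - r), where r is the number of heads, and summing over every r above n/2 gives the probability that heads are in the majority.

```python
from math import comb


def prob_majority_heads(n_tosses: int, p_heads: float) -> float:
    """Probability of strictly more heads than tails in n_tosses tosses."""
    first_majority = n_tosses // 2 + 1  # smallest head count that is a majority
    return sum(
        # Binomial term: ways to place r heads, times the chance of each way.
        comb(n_tosses, r) * p_heads**r * (1 - p_heads) ** (n_tosses - r)
        for r in range(first_majority, n_tosses + 1)
    )


# The slightly biased coin wins more reliably as the number of tosses grows.
for n in (1, 10, 100, 1000):
    print(n, round(prob_majority_heads(n, 0.51), 3))
```

This is the same reasoning that motivates ensembles: many weak predictors that are each only slightly better than chance can, if they are sufficiently independent, form a majority that is right much more often. The function keeps n_tosses small because the plain float arithmetic would overflow for very large n.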

Thanks.


Dear Cloudx,

In the slide, it's mentioned that bagging is better than pasting, but on comparing the accuracy scores of both methods, the following results were found:

-Bagging =  0.904

-Pasting = 0.928

Are the above results proper?


Hi,

As seen in slide #47, we have mentioned that bagging "often" results in better models. That does not mean it will "always" yield better results. There is no one-size-fits-all solution when it comes to Machine Learning or Deep Learning.
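
If you want to compare the two on your own data, a minimal sketch could look like this. The dataset and hyperparameters are assumptions for illustration; the only difference between the two runs is the bootstrap flag.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed toy dataset for illustration only.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)


def fit_and_score(bootstrap: bool) -> float:
    """Train an ensemble of trees and return its test accuracy."""
    clf = BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=200,
        max_samples=100,
        bootstrap=bootstrap,  # True = bagging, False = pasting
        random_state=42,
    )
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


bagging_acc = fit_and_score(bootstrap=True)
pasting_acc = fit_and_score(bootstrap=False)
print(f"bagging={bagging_acc:.3f} pasting={pasting_acc:.3f}")
```

Which of the two comes out ahead can change with the dataset, the random seed and the split, so a single comparison like the one in the question is not evidence against the slide.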

Thanks.


How can we say ML models are biased? Can you please explain this a bit more?


Hi,

Are you referring to the bias-variance tradeoff? Here, bias is the difference between the average prediction of the model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the problem. It leads to high error on both the training data and the test data.
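
As an illustration of high bias, here is a minimal sketch on made-up quadratic data (the data-generating formula is an assumption for this example). A straight line cannot follow the curve, so its error is high on both the training and the test set, while the same algorithm with a squared feature fits much better.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Made-up data: a quadratic trend plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(300, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(0, 0.5, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# High-bias model: a straight line cannot represent the curvature.
line = LinearRegression().fit(X_train, y_train)
line_train_mse = mean_squared_error(y_train, line.predict(X_train))
line_test_mse = mean_squared_error(y_test, line.predict(X_test))

# Lower-bias model: the same algorithm, also given the squared feature.
X_train_sq = np.hstack([X_train, X_train**2])
X_test_sq = np.hstack([X_test, X_test**2])
curve = LinearRegression().fit(X_train_sq, y_train)
curve_train_mse = mean_squared_error(y_train, curve.predict(X_train_sq))
curve_test_mse = mean_squared_error(y_test, curve.predict(X_test_sq))

print(f"line:  train MSE={line_train_mse:.2f} test MSE={line_test_mse:.2f}")
print(f"curve: train MSE={curve_train_mse:.2f} test MSE={curve_test_mse:.2f}")
```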

Thanks.


The way we pass DecisionTreeClassifier to BaggingClassifier, can we do the same with other class combinations too, like RandomForest?

Is there any BoostingClassifier?


Hi,

Good question!

Please go through the below link for a detailed discussion on the same:

https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/

Thanks.


Sir,

1. The classifier and regressor names of the classes are slightly confusing.

Are all the classes discussed here not LogisticRegression?

It appears there is a joint application of classification and regression too, as with decision tree and RandomForest. In that context, are they doing logistic regression?

2. Can the ensemble techniques be used for LinearRegression too?


Hi,

1. Classification and Regression are two different tasks. Please go through the training material to understand the difference.

2. Yes. For classification we use voting, for regression we use averaging.
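
For point 2, here is a minimal sketch of averaging in a regression ensemble, using scikit-learn's VotingRegressor on a synthetic dataset (the data and the choice of member models are assumptions for illustration).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic regression data for illustration only.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=42)

voting_reg = VotingRegressor(
    estimators=[
        ("lin", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=42)),
    ]
)
voting_reg.fit(X, y)

# Predictions of each fitted member for the first five rows, one column each.
member_preds = np.column_stack(
    [est.predict(X[:5]) for est in voting_reg.estimators_]
)
# The ensemble prediction is the mean of the members' predictions.
ensemble_pred = voting_reg.predict(X[:5])

print(np.allclose(ensemble_pred, member_preds.mean(axis=1)))
```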

Thanks.


Hi Team,

I have a question about the table below:

max_samples < 1 and max_samples = 1 is confusing. We saw a max_samples value of 300 in the bagging classifier example. But here the tutor is explaining that a value less than 1 for max_features or max_samples means not all features or instances are used.

So, is there any difference between max_samples = 1.0 and max_samples = 1? Is it that the first one takes all samples, because it is a fraction, and the second one is an exact number since it is not a decimal, so it takes just 1 instance?

Regards,

Birendra Singh


Hi,

Here we are discussing max_features and not max_samples.

Thanks.


Hi Team,

In the code below, max_samples=300, which means that out of 375 training instances, the classifier will pick 300 samples. Similar to bag 1 or bag 2 that we saw in slides 36 and 38 for bagging and pasting, each estimator will get a sample of 300 randomly selected instances out of the 375 in the training dataset. That means a structure of 500 bags with 300 random samples each, right?

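from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# bootstrap=False means pasting: each of the 500 trees is trained on
# 300 instances drawn without replacement from the training set
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=500,
    max_samples=300,
    bootstrap=False,
    n_jobs=-1
)

bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
accuracy_score(y_test, y_pred)  # argument order is (y_true, y_pred)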

And as m grows, the ratio of instances that are sampled approaches 63%. So can we think of it like this: if there are 1 lakh training instances, then about 37k will never be picked for a given estimator, and we can use them for testing? These 37k will vary from estimator to estimator: estimator 1 and estimator 2 will each have about 37k never-sampled instances, but not necessarily the same ones, because the 63k are picked at random from the 1 lakh training instances.

 

Regards,

Birendra Singh

 


Hi,

Please find a detailed explanation of how max_samples for BaggingClassifier affects the number of samples:

https://stackoverflow.com/questions/38772035/how-does-max-samples-keyword-for-a-bagging-classifier-effect-the-number-of-sam
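
You can also check this directly on a fitted model. The sketch below assumes a 500-row toy dataset so that the default split leaves 375 training rows, as in the question; it then inspects the estimators_samples_ attribute, which lists the row indices each estimator was trained on.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed toy dataset: 500 rows, so the default split keeps 375 for training.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

paste_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=20,      # fewer estimators than in the question, to keep it quick
    max_samples=300,
    bootstrap=False,      # pasting: no row is drawn twice for one estimator
    random_state=42,
)
paste_clf.fit(X_train, y_train)

# Number of rows drawn, and number of distinct rows, for each estimator.
sizes = [len(idx) for idx in paste_clf.estimators_samples_]
distinct = [len(set(idx)) for idx in paste_clf.estimators_samples_]
print(len(X_train), sizes[:3], distinct[:3])
```

With bootstrap=False and max_samples=300, every estimator leaves out exactly 75 of the 375 rows. The roughly 63% figure applies to the other setting: bootstrap=True with as many draws as there are training instances.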

Thanks.


Suppose there are 4 classifiers, and 2 of them predict class 1 while the other 2 predict class 2. How will the voting classifier work in that case?


Hi,

Voting Classifier supports two types of voting.

Hard Voting: In hard voting, the predicted output class is the class with the most votes, i.e. the class predicted by the largest number of classifiers. Suppose three classifiers predict the output classes (A, A, B); the majority predicted A, so A will be the final prediction.

Soft Voting: In soft voting, the output class is chosen based on the average of the probabilities given to each class. Suppose that, for some input, three models give prediction probabilities of (0.30, 0.47, 0.53) for class A and (0.20, 0.32, 0.40) for class B. The average for class A is 0.4333 and for class B is 0.3067, so the winner is clearly class A because it has the highest average probability across the classifiers.
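
The arithmetic of both examples can be checked with a few lines of plain Python. This is a sketch of the voting rules only, not of scikit-learn's internal implementation.

```python
from collections import Counter
from statistics import mean

# Hard voting: each classifier casts one vote for a class label.
hard_votes = ["A", "A", "B"]
hard_winner = Counter(hard_votes).most_common(1)[0][0]

# Soft voting: average each class's predicted probability over the classifiers.
proba = {"A": [0.30, 0.47, 0.53], "B": [0.20, 0.32, 0.40]}
avg_proba = {label: mean(values) for label, values in proba.items()}
soft_winner = max(avg_proba, key=avg_proba.get)

print(hard_winner, soft_winner, {k: round(v, 4) for k, v in avg_proba.items()})
# A A {'A': 0.4333, 'B': 0.3067}
```

For the tie in the question (two votes each for class 1 and class 2), the scikit-learn documentation states that its VotingClassifier with hard voting picks the class that comes first in ascending sort order of the class labels, which would be class 1 here.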

Thanks.


If we are feeding the same data to a number of classifiers, then why does soft voting have higher accuracy than hard voting?


Hi,

Soft voting can improve on hard voting because it takes into account more information; it uses each classifier's uncertainty in the final decision.
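
A small made-up example shows how the two rules can disagree. The three probabilities below are hypothetical: two classifiers lean slightly towards class B, while one is very confident in class A.

```python
# Hypothetical P(class A) from three classifiers for one sample (binary problem).
p_a = [0.45, 0.45, 0.90]

# Hard voting: each classifier votes for whichever class it finds more likely.
votes_for_a = sum(p > 0.5 for p in p_a)
hard_winner = "A" if votes_for_a > len(p_a) / 2 else "B"

# Soft voting: average the probabilities first, then pick the likelier class.
avg_p_a = sum(p_a) / len(p_a)
soft_winner = "A" if avg_p_a > 0.5 else "B"

print(hard_winner, soft_winner, round(avg_p_a, 2))  # B A 0.6
```

Hard voting ignores how confident each classifier is, so the two hesitant votes outweigh the confident one. Whether soft voting actually scores higher on a given dataset still depends on how well calibrated the classifiers' probabilities are.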

Thanks.


Slide 58 - The statement "Here, the oob evaluation estimates that the second training instance has a 60.6% probability of belonging to the positive class and 39.4% of belonging to the positive class." seems to be wrong somewhere. Both prediction percentages are for the positive class? How come? One should be for the negative class, right? Which one is that, and why? Please validate.


nCr a^(n-r) b^r

What is the r in this? I got the other terms, but I am not getting what r is.


Hi
Should we always feed scaled data to these ensemble methods for modelling of machine learning models?
Thanks


Hi,

There are times when normalizing/scaling is good, sometimes it's not. You can read more about when to use feature scaling and when not to use it in the following link:
https://stats.stackexchange...
Thanks.

-- Rajtilak Bhattacharjee


Hi
It's not very clear, but I think normalization won't hurt even if it is not needed. Am I right?


Hi,

Yes you are right.

Thanks.

-- Rajtilak Bhattacharjee


Ok Thanks


Hi
To increase accuracy level on unseen data should we change the value of max_samples in bagging classifier?


Hi
I want to ask about the significance of the OOB score in the case of a large dataset (say, of shape (80000, 9)) where we set bootstrap=False.
Thanks
Prachi


Hi
Can we use any classifier within bagging classifier or we can only use decision tree classifier as mentioned in notebook?
Thanks
Prachi


We can use any classifier in the bagging classifier as bagging is concerned with how the classifier performs on the different samples of the dataset, and not on what classifier you are using.
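
For example, here is a minimal sketch that bags two non-tree classifiers. The dataset and hyperparameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed toy dataset for illustration only.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scores = {}
for base in (LogisticRegression(), KNeighborsClassifier()):
    # The base classifier is passed in exactly like the decision tree was.
    bag_clf = BaggingClassifier(
        base, n_estimators=50, max_samples=0.8, random_state=42
    )
    bag_clf.fit(X_train, y_train)
    scores[type(base).__name__] = bag_clf.score(X_test, y_test)

print(scores)
```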


Hi
Should we use stratified sampling before modelling with ensemble techniques for better results, or simply use train_test_split?
Thanks


In ensembles, can max_features and max_samples only be either 1 or 0?


Hi,

If you are talking about the BaggingClassifier, then max_features is the number of features to draw from X to train each base estimator, whereas max_samples is the number of samples to draw from X to train each base estimator. Each can be given as an int (an absolute count) or as a float (a fraction of the total), so they are not limited to 0 or 1. You can find more about them in the link below:
https://scikit-learn.org/st...
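
A minimal sketch of the difference between float and int values is shown here, on a synthetic dataset with 200 rows and 10 features (assumed for illustration). The helper drawn_counts is hypothetical and only reads back how many rows and features the first estimator received.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration: 200 rows, 10 features.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)


def drawn_counts(max_samples, max_features):
    """Return (rows drawn, features drawn) for the first fitted estimator."""
    clf = BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=5,
        max_samples=max_samples,
        max_features=max_features,
        random_state=42,
    )
    clf.fit(X, y)
    return len(clf.estimators_samples_[0]), len(clf.estimators_features_[0])


print(drawn_counts(1.0, 1.0))  # floats are fractions: all rows, all features
print(drawn_counts(0.5, 0.5))  # half of the rows, half of the features
print(drawn_counts(150, 4))    # ints are absolute counts
```

By the same rule, max_samples=1.0 draws as many rows as there are in the training set, whereas the integer max_samples=1 would draw a single row.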
Thanks.

-- Rajtilak Bhattacharjee


Please explain why we used logistic regression in the voting classifier example.
Logistic regression is for regression problems, and the example was for a classification problem.


Hi,

Logistic regression is a supervised learning classification algorithm. Linear regression is used for regression problems.
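
You can see this in scikit-learn, where LogisticRegression predicts discrete class labels and class probabilities rather than continuous target values. The sketch below uses an assumed toy dataset.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

# Assumed toy binary classification dataset.
X, y = make_moons(n_samples=200, noise=0.30, random_state=42)

log_clf = LogisticRegression().fit(X, y)

labels = log_clf.predict(X[:3])        # discrete class labels (0 or 1)
probas = log_clf.predict_proba(X[:3])  # one probability per class, per row

print(labels)
print(probas.round(3))
```

The "regression" in the name refers to the model fitting a linear function whose output is passed through the logistic (sigmoid) function to obtain a class probability.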

Thanks.

-- Rajtilak Bhattacharjee


Thank you.
I have one more question:
Can we use gradient descent, SGD etc. as one of the predictors in AdaBoost classifier?


Hi,

AdaBoost is short for Adaptive Boosting. It was the first really successful boosting algorithm developed for binary classification, and it is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.

Generally, AdaBoost is used with short decision trees. After the first tree is created, its performance on each training instance is used to weight how much attention the next tree should pay to each training instance. Training data that is hard to predict is given more weight, whereas easy-to-predict instances are given less weight.
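
A minimal sketch of AdaBoost with short decision trees in scikit-learn is shown here. The dataset and the hyperparameter values are assumptions for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed toy dataset for illustration only.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # a one-split tree ("decision stump")
    n_estimators=100,    # upper limit on the number of stumps trained in sequence
    learning_rate=0.5,   # shrinks the contribution of each stump
    random_state=42,
)
ada_clf.fit(X_train, y_train)

test_acc = ada_clf.score(X_test, y_test)
print(len(ada_clf.estimators_), round(test_acc, 3))
```

The base estimator is passed positionally here because the name of that keyword argument has changed between scikit-learn versions.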

You can learn more about AdaBoostClassifier here:
https://scikit-learn.org/st...
Thanks.

-- Rajtilak Bhattacharjee
