Ensemble Learning and XGBoost


Ensemble Learning - Part 3







42 Comments

I somehow do not see the ml folder anywhere with the gradient boosting code. The only folder I see says "cloudx jupyter notebooks" with various end-to-end projects in it. Am I missing anything here?


You'll need to clone the "ml" repo. The repo is present at https://github.com/cloudxlab/ml


Dear CloudXLab Team,

While going through the presentation, I noticed what appears to be a minor error in slides 108 and 109.

a) The discussion on slide 108 mentions a low learning rate, implying 0.1, but the corresponding diagram shows learning_rate=1.0.

b) Similarly, the discussion on slide 109 mentions a learning rate of 1, but the corresponding diagram shows learning_rate=0.1.

Either the slides contain this error, or I am mistaken.

Kindly clarify.


Hi,

Good catch!

Actually, these diagrams are not meant to illustrate the learning rate but the number of predictors. The diagram on slide 108 shows too few predictors, while the one on slide 109 shows too many (about 200). The learning rate is the same in both cases, 0.1. Thank you for pointing this out; we will update the slides when we next revise our courseware.
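If you want to reproduce the effect yourself, here is a minimal sketch (the toy dataset and exact parameter values below are my own assumptions, not taken from the slides) that keeps the learning rate fixed at 0.1 and varies only the number of predictors:

```python
# Keep learning_rate fixed and vary n_estimators to see underfitting vs. overfitting.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)   # noisy quadratic data

gbrt_few = GradientBoostingRegressor(n_estimators=3, learning_rate=0.1, random_state=42)
gbrt_many = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, random_state=42)
gbrt_few.fit(X, y)    # too few predictors -> underfits
gbrt_many.fit(X, y)   # many predictors -> may start fitting the noise
```

Plotting the predictions of the two models over X should give you figures similar to the ones on slides 108 and 109.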

Thanks.


Thank you, Rajtilak, for letting me know. It really helps in understanding the core ML concepts and the fundamentals of the theory behind them!


Always happy to help!


Hi CloudXLab Support Team,

Greetings! I have a doubt about slide 84.

It is mentioned that when the learning rate is halved from the default of 1 to 0.5 (learning_rate = 0.5), the weights of the misclassified instances/rows/observations get boosted half as much at every iteration. This learning rate concept comes from the Stochastic Gradient Descent (SGD) algorithm.

My doubts are: a) Why should the learning rate be halved?

b) What is the rationale behind halving the learning rate?

Could you explain this concept and share some more insight on it?


Hi,

Here, the misclassified instance weights are boosted half as much at every iteration. If you look at the notebook, you will find the code for this. Try reducing the learning rate gradually and see how the output differs.
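If it helps, here is a rough sketch for experimenting with this (the moons dataset and the exact parameter values are my own assumptions, not necessarily what the notebook uses):

```python
# Train AdaBoost with different learning rates and compare the results.
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

for lr in (1.0, 0.5, 0.1):
    ada = AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1),  # weak learner (decision stump)
        n_estimators=200,
        learning_rate=lr,  # smaller value -> misclassified weights boosted less per iteration
        random_state=42,
    )
    ada.fit(X, y)
    print(f"learning_rate={lr}: training accuracy = {ada.score(X, y):.3f}")
```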

Thanks.


Regarding the learning rate, let me take this opportunity to explain briefly:

1) The learning rate is a hyperparameter of the Stochastic Gradient Descent (SGD) algorithm. It controls the size of the steps taken while minimizing the loss during training.

2) It controls how much the model is adjusted at each training step, and thus how the model ends up making predictions.

3) By default, the value of the learning rate is 1. This is true for the AdaBoost ensemble ML algorithm.

4) Learning rate values typically lie between 0 and 1.

5) Common starting values are 0.1, 0.01, or 0.001.

6) However, with smaller learning rates such as 0.1 or 0.01, training slows down considerably. In other words, it takes longer for the outcome to converge.

7) Even such lower values can lead to overfitting if training runs for too many iterations.

8) With a higher learning rate (1 or close to it), there is always a risk of overshooting the minimum. As a result, the values tend to diverge (high variance in the outcomes) rather than converge.

9) Training can be stopped once an optimal level is reached (or when the loss hovers around a particular level for at least 3 to 5 iterations).
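To illustrate points 6 and 8 concretely, here is a toy sketch (the quadratic function and the step sizes are purely illustrative assumptions) of plain gradient descent with different learning rates:

```python
# Gradient descent on f(x) = x**2, whose minimum is at x = 0.
def gradient_descent(lr, steps=20, x=5.0):
    for _ in range(steps):
        grad = 2 * x       # derivative of x**2
        x = x - lr * grad  # update scaled by the learning rate
    return x

print(gradient_descent(0.01))  # small lr: moves towards 0 but very slowly
print(gradient_descent(0.1))   # moderate lr: converges close to 0
print(gradient_descent(1.1))   # too large: overshoots the minimum and diverges
```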


Hi CloudXLab Team,

A doubt about the accuracy of the predictors:

When we discuss the accuracy of predictors, do these values lie between 0 and 1?

For higher accuracy, the values are closer to 1.

For lower accuracy, the values are closer to 0.

For misclassified predictors, the values are negative.

Kindly correct me if my understanding is wrong.


Hi,

Yes, the accuracy of a predictor lies between 0 and 1. The predictor's weight, however, can be negative when its accuracy drops below 50%, i.e. when it performs worse than random guessing.
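To make the negative-value part concrete, here is a small sketch of how the AdaBoost predictor weight follows the error rate (the function and variable names are my own, for illustration only):

```python
# AdaBoost-style predictor weight: alpha_j = eta * log((1 - r_j) / r_j),
# where r_j is the predictor's weighted error rate.
import numpy as np

def predictor_weight(error_rate, learning_rate=1.0):
    return learning_rate * np.log((1 - error_rate) / error_rate)

print(predictor_weight(0.10))  # accurate predictor -> large positive weight
print(predictor_weight(0.50))  # random guessing    -> weight of 0
print(predictor_weight(0.80))  # mostly wrong       -> negative weight
```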

Thanks.


Thanks for the confirmation...!!!


Hi CloudXLab Team,

Some more doubts:

a) As per slide 83, "the first classifier may get many instances wrong".

So, is this first classifier also known as a base classifier in the ensemble ML algorithm and/or in a decision tree ML algorithm?

b) If instances/rows/observations are wrongly classified (slide 83), can the classifiers that misclassify them also be known as "weak learners" (the terminology used in slide 76)?

c) Are these predictors also the independent (or x) variables?

I would like to know whether my understanding is correct or off the mark. Kindly correct me if any of these three statements is wrong.

 

 


Hi,

a. Yes, it is also known as the base classifier.

b. Yes, they can also be known as weak learners.

c. You can go through the link below (which is given a few slides down) for a broader explanation of the workings of AdaBoost:

http://mccormickml.com/2013/12/13/adaboost-tutorial/

Also, there is this article from one of the original authors of AdaBoost:

http://rob.schapire.net/papers/explaining-adaboost.pdf

Hope these will answer your queries. If not, please let me know.

Thanks.


Thank you, Rajtilak, for your clarifications in response to my queries.

Moreover, the links shared as additional references for the AdaBoost ensemble ML algorithm are indeed helpful!



Dear Team,

a) Bagging - sampling with replacement, which is akin to the bootstrapping technique; it is also called bootstrap aggregating.

b) Pasting - sampling without replacement.

Both techniques are applied to the original dataset, i.e. the sample. It is also possible to repeat such random sampling multiple times, which is called simulation and forms part of simulation statistics.

From what I understand, since bagging and pasting draw instances from the original sample for ensemble methods, wouldn't it be apt to call both of these techniques resampling techniques?

Kindly correct me if my understanding is wrong.


Hi,

Yes, bagging and pasting are resampling techniques.
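In scikit-learn the difference between the two comes down to a single flag; here is a minimal sketch (the dataset and parameter values are assumptions):

```python
# bootstrap=True  -> bagging (sampling WITH replacement)
# bootstrap=False -> pasting (sampling WITHOUT replacement)
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            max_samples=100, bootstrap=True, random_state=42)
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            max_samples=100, bootstrap=False, random_state=42)
bagging.fit(X, y)
pasting.fit(X, y)
```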

Thanks.


Thank you for the clarification, it indeed helps!



Hi CloudXLab Team,

There is an error on slide 38 (Pasting).

Since we are sampling without replacement, bag blue cannot contain blue or red balls, as all those balls have already been used.

In this case, the phrase "bag blue" should be replaced with "bag 2".

Kindly rectify this error. Hope this helps!


Hi,

That is correct! Thank you for pointing this out.

Thanks.


With pleasure, Rajtilak!


Hi,

Using the moons dataset, I tried a VotingClassifier with voting='hard', and it worked perfectly with an accuracy score of 0.88. But if I specify voting='soft', it throws the error "predict_proba is not available when probability=False".

Please help.

[Screenshot: soft voting throws an error]

 


Hi,

Please share the screenshot of the error traceback.

Thanks.


Hi,

Could you please set probability=True in the SVC and try again?
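For reference, here is a minimal sketch along those lines (the exact estimators and dataset are assumptions; your notebook may differ):

```python
# Soft voting averages class probabilities, so every estimator must expose
# predict_proba; SVC only does so when created with probability=True.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # enables predict_proba
    ],
    voting="soft",
)
voting_clf.fit(X, y)
```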

Thanks.



Thank you Mr. Bhattacharjee. It worked for me.


Great! Happy learning.


Can the blending technique be implemented after the bagging/pasting technique? Or is it applicable only to gradient boosting?


Hi,

Could you please tell me which blending technique you are referring to?

Thanks.


My question was general: can we apply the stacking technique to any bagging/pasting technique, rather than to one specific technique?


Hi,

You can try going through the following paper, which discusses combining stacking with bagging:

https://www.researchgate.net/publication/2695910_Combining_Stacking_With_Bagging_To_Improve_A_Learning_Algorithm

Next, you can go through the below discussion to know more about stacking, bagging, and boosting:

https://stats.stackexchange.com/questions/18891/bagging-boosting-and-stacking-in-machine-learning
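Additionally, here is a rough sketch (the estimator choices and dataset are my own assumptions) of how stacking can sit on top of bagging/pasting ensembles in scikit-learn:

```python
# Bagging/pasting ensembles serve as base estimators; a blender (the final
# estimator) is trained on their out-of-fold predictions.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("bagging", BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                      bootstrap=True, random_state=42)),   # bagging
        ("pasting", BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                      bootstrap=False, random_state=42)),  # pasting
    ],
    final_estimator=LogisticRegression(),  # the blender
    cv=5,
)
stack.fit(X, y)
```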

Thanks.


At 47:10 it's a generator expression, without the square brackets. Generator expressions create generator objects, which is quite useful, especially when passing them into functions like sum(), since unlike list comprehensions they do not have to create and store the entire list in memory.
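For example (the expression below is just an illustration, not the one from the video):

```python
nums = range(1_000_000)

list_total = sum([n * n for n in nums])  # list comprehension: builds the whole list first
gen_total = sum(n * n for n in nums)     # generator expression: yields one value at a time

assert list_total == gen_total
```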


Thank you for sharing this.


Can I get the link to the XGBoost slides by Tianqi Chen that the instructor used?


If probability ranges from 0 to 1, how at 2:06:39 can the values be 2.9 and -1.9?


Hi,

Those values are predictions, not probabilities: each tree outputs a score, and the final prediction is the sum of the scores from all the trees, so it can be any real number.
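If you want to see how such a raw score relates to a probability, for binary classification the score is typically passed through the logistic (sigmoid) function. A small sketch (the 2.9 and -1.9 values are the ones mentioned in your question):

```python
import math

def sigmoid(score):
    # Maps any real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-score))

print(sigmoid(2.9))   # ~0.95 -> very likely the positive class
print(sigmoid(-1.9))  # ~0.13 -> unlikely the positive class
```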

Thanks.


Got it! Thanks
