Ensemble Learning and XGBoost


Ensemble Learning - Part 3







42 Comments

I somehow do not see the ml folder anywhere with the gradient boosting code. The only folder I see says "cloudx jupyter notebooks" with various end-to-end projects in it. Am I missing anything here?


You'll need to clone the "ml" repo. The repo is present at https://github.com/cloudxlab/ml


Dear CloudXLab Team,

While going through the presentation, I noticed what appears to be a minor error in slides 108 and 109.

a) The discussion on slide 108 mentions a low learning rate, implying 0.1, but the corresponding diagram shows learning_rate=1.0.

b) Similarly, the discussion on slide 109 mentions a learning rate of 1, but the corresponding diagram shows learning_rate=0.1.

Either the slides contain this error, or I am mistaken.

Kindly clarify.


Hi,

Good catch!

Actually, these diagrams are not meant to illustrate the learning rate but the number of predictors. The diagram on slide 108 shows too few predictors, while the one on slide 109 shows too many (about 200). The learning rate is the same in both cases, 0.1. Thank you for pointing this out; we will update the slides when we next revise our courseware.
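If you want to reproduce the effect yourself, here is a minimal sketch (the toy dataset and exact parameter values below are my own assumptions, not taken from the slides) that keeps the learning rate fixed at 0.1 and varies only the number of predictors:

```python
# Keep learning_rate fixed and vary n_estimators to see underfitting vs. overfitting.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)   # noisy quadratic data

gbrt_few = GradientBoostingRegressor(n_estimators=3, learning_rate=0.1, random_state=42)
gbrt_many = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, random_state=42)
gbrt_few.fit(X, y)    # too few predictors -> underfits
gbrt_many.fit(X, y)   # many predictors -> may start fitting the noise
```

Plotting the predictions of the two models over X should give you figures similar to the ones on slides 108 and 109.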

Thanks.


Thank you, Rajtilak, for letting me know. It really helps in understanding the core ML concepts and the fundamentals of the theory behind them!


Always happy to help!


Hi CloudXLab Support Team,

Greetings! I have a doubt about slide 84.

It is mentioned that when the learning rate is halved from the default of 1 to 0.5 (learning_rate = 0.5), the weights of the misclassified instances/rows/observations get boosted half as much at every iteration. This learning rate concept comes from the Stochastic Gradient Descent (SGD) algorithm.

My doubts are: a) Why should the learning rate be halved?

b) What is the rationale behind halving the learning rate?

Could you explain this concept and share some more insight on it?


Hi,

Here, the misclassified instance weights are boosted half as much at every iteration. If you look at the notebook, you will find the code for this. Try reducing the learning rate gradually and see how the output differs.
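If it helps, here is a rough sketch for experimenting with this (the moons dataset and the exact parameter values are my own assumptions, not necessarily what the notebook uses):

```python
# Train AdaBoost with different learning rates and compare the results.
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

for lr in (1.0, 0.5, 0.1):
    ada = AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1),  # weak learner (decision stump)
        n_estimators=200,
        learning_rate=lr,  # smaller value -> misclassified weights boosted less per iteration
        random_state=42,
    )
    ada.fit(X, y)
    print(f"learning_rate={lr}: training accuracy = {ada.score(X, y):.3f}")
```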

Thanks.


Regarding the learning rate, let me take this opportunity to explain briefly:

1) The learning rate is a hyperparameter of the Stochastic Gradient Descent (SGD) algorithm. It controls the size of the steps taken while minimizing the loss during training.

2) It controls how much the model is adjusted at each training step, and thus how the model ends up making predictions.

3) By default, the value of the learning rate is 1. This is true for the AdaBoost ensemble ML algorithm.

4) Learning rate values typically lie between 0 and 1.

5) Common starting values are 0.1, 0.01, or 0.001.

6) However, with smaller learning rates such as 0.1 or 0.01, training slows down considerably. In other words, it takes longer for the outcome to converge.

7) Even such lower values can lead to overfitting if training runs for too many iterations.

8) With a higher learning rate (1 or close to it), there is always a risk of overshooting the minimum. As a result, the values tend to diverge (high variance in the outcomes) rather than converge.

9) Training can be stopped once an optimal level is reached (or when the loss hovers around a particular level for at least 3 to 5 iterations).
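To illustrate points 6 and 8 concretely, here is a toy sketch (the quadratic function and the step sizes are purely illustrative assumptions) of plain gradient descent with different learning rates:

```python
# Gradient descent on f(x) = x**2, whose minimum is at x = 0.
def gradient_descent(lr, steps=20, x=5.0):
    for _ in range(steps):
        grad = 2 * x       # derivative of x**2
        x = x - lr * grad  # update scaled by the learning rate
    return x

print(gradient_descent(0.01))  # small lr: moves towards 0 but very slowly
print(gradient_descent(0.1))   # moderate lr: converges close to 0
print(gradient_descent(1.1))   # too large: overshoots the minimum and diverges
```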


Hi CloudXLab Team,

A doubt about the accuracy of the predictors:

When we discuss the accuracy of predictors, do these values lie between 0 and 1?

For higher accuracy, the values are closer to 1.

For lower accuracy, the values are closer to 0.

For misclassified predictors, the values are negative.

Kindly correct me if my understanding is wrong.


Hi,

Yes, the accuracy of a predictor lies between 0 and 1. The predictor's weight, however, can be negative when its accuracy drops below 50%, i.e. when it performs worse than random guessing.
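To make the negative-value part concrete, here is a small sketch of how the AdaBoost predictor weight follows the error rate (the function and variable names are my own, for illustration only):

```python
# AdaBoost-style predictor weight: alpha_j = eta * log((1 - r_j) / r_j),
# where r_j is the predictor's weighted error rate.
import numpy as np

def predictor_weight(error_rate, learning_rate=1.0):
    return learning_rate * np.log((1 - error_rate) / error_rate)

print(predictor_weight(0.10))  # accurate predictor -> large positive weight
print(predictor_weight(0.50))  # random guessing    -> weight of 0
print(predictor_weight(0.80))  # mostly wrong       -> negative weight
```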

Thanks.


Thanks for the confirmation...!!!


Hi CloudXLab Team,

Some more doubts:

a) As per slide 83, "the first classifier may get many instances wrong".

So, is this first classifier also known as a base classifier in the ensemble ML algorithm and/or in a decision tree ML algorithm?

b) If instances/rows/observations are wrongly classified (slide 83), can the classifiers that misclassify them also be known as "weak learners" (the terminology used in slide 76)?

c) Are these predictors also the independent (or x) variables?

I would like to know whether my understanding is correct or off the mark. Kindly correct me if any of these three statements is wrong.

 

 


Hi,

a. Yes, it is also known as the base classifier.

b. Yes, they can also be known as weak learners.

c. You can go through the link below (which is given a few slides down) for a broader explanation of the workings of AdaBoost:

http://mccormickml.com/2013/12/13/adaboost-tutorial/

Also, there is this article from one of the original authors of AdaBoost:

http://rob.schapire.net/papers/explaining-adaboost.pdf

Hope these will answer your queries. If not, please let me know.

Thanks.


Thank you, Rajtilak, for your clarifications in response to my queries.

Moreover, the links shared as additional references for the AdaBoost ensemble ML algorithm are indeed helpful!



Dear Team,

a) Bagging - sampling with replacement, which is akin to the bootstrapping technique; it is also called bootstrap aggregating.

b) Pasting - sampling without replacement.

Both techniques are applied to the original dataset, i.e. the sample. It is also possible to repeat such random sampling multiple times, which is called simulation and forms part of simulation statistics.

From what I understand, since bagging and pasting draw instances from the original sample for ensemble methods, wouldn't it be apt to call both of these techniques resampling techniques?

Kindly correct me if my understanding is wrong.


Hi,

Yes, bagging and pasting are resampling techniques.
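In scikit-learn the difference between the two comes down to a single flag; here is a minimal sketch (the dataset and parameter values are assumptions):

```python
# bootstrap=True  -> bagging (sampling WITH replacement)
# bootstrap=False -> pasting (sampling WITHOUT replacement)
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            max_samples=100, bootstrap=True, random_state=42)
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            max_samples=100, bootstrap=False, random_state=42)
bagging.fit(X, y)
pasting.fit(X, y)
```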

Thanks.


Thank you for the clarification, it indeed helps!



Hi CloudXLab Team,

There is an error on slide 38 (Pasting).

Since we are sampling without replacement, bag blue cannot contain blue or red balls, as all those balls have already been used.

In this case, the phrase "bag blue" should be replaced with "bag 2".

Kindly rectify this error. Hope this helps!


Hi,

That is correct! Thank you for pointing this out.

Thanks.


With pleasure, Rajtilak!


Hi,

Using the moons dataset, I tried a VotingClassifier with voting='hard', and it worked perfectly with an accuracy score of 0.88. But if I specify voting='soft', it throws the error "predict_proba is not available when probability=False".

Please help.

[Screenshot: soft voting throws an error]

 


Hi,

Please share the screenshot of the error traceback.

Thanks.


Hi,

Could you please set probability=True in the SVC and try again?
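For reference, here is a minimal sketch along those lines (the exact estimators and dataset are assumptions; your notebook may differ):

```python
# Soft voting averages class probabilities, so every estimator must expose
# predict_proba; SVC only does so when created with probability=True.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # enables predict_proba
    ],
    voting="soft",
)
voting_clf.fit(X, y)
```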

Thanks.



Thank you Mr. Bhattacharjee. It worked for me.


Great! Happy learning.


Can the blending technique be implemented after the bagging/pasting technique? Or is it applicable only to gradient boosting?


Hi,

Could you please tell me which blending technique you are referring to?

Thanks.


My question was general: can we apply the stacking technique to any bagging/pasting technique, rather than to one specific technique?


Hi,

You can try going through the following paper, which discusses combining stacking with bagging:

https://www.researchgate.net/publication/2695910_Combining_Stacking_With_Bagging_To_Improve_A_Learning_Algorithm

Next, you can go through the below discussion to know more about stacking, bagging, and boosting:

https://stats.stackexchange.com/questions/18891/bagging-boosting-and-stacking-in-machine-learning
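Additionally, here is a rough sketch (the estimator choices and dataset are my own assumptions) of how stacking can sit on top of bagging/pasting ensembles in scikit-learn:

```python
# Bagging/pasting ensembles serve as base estimators; a blender (the final
# estimator) is trained on their out-of-fold predictions.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("bagging", BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                      bootstrap=True, random_state=42)),   # bagging
        ("pasting", BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                      bootstrap=False, random_state=42)),  # pasting
    ],
    final_estimator=LogisticRegression(),  # the blender
    cv=5,
)
stack.fit(X, y)
```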

Thanks.


At 47:10 it's a generator expression, without the square brackets. Generator expressions create generator objects, which is quite useful, especially when passing them into functions like sum(), since unlike list comprehensions they do not have to create and store the entire list in memory.
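For example (the expression below is just an illustration, not the one from the video):

```python
nums = range(1_000_000)

list_total = sum([n * n for n in nums])  # list comprehension: builds the whole list first
gen_total = sum(n * n for n in nums)     # generator expression: yields one value at a time

assert list_total == gen_total
```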


Thank you for sharing this.


Can I get the link to the XGBoost slides by Tianqi Chen that the instructor used?


If probability ranges from 0 to 1, how at 2:06:39 can the values be 2.9 and -1.9?


Hi,

Those values are predictions, not probabilities: each tree outputs a score, and the final prediction is the sum of the scores from all the trees, so it can be any real number.
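If you want to see how such a raw score relates to a probability, for binary classification the score is typically passed through the logistic (sigmoid) function. A small sketch (the 2.9 and -1.9 values are the ones mentioned in your question):

```python
import math

def sigmoid(score):
    # Maps any real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-score))

print(sigmoid(2.9))   # ~0.95 -> very likely the positive class
print(sigmoid(-1.9))  # ~0.13 -> unlikely the positive class
```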

Thanks.


Got it! Thanks
