Decision Trees



Slides

Download the slides



41 Comments

Could you please elaborate on the line below?

"However, Gini impurity tends to isolate the most frequent class in its own branch of the tree"


Hi Veena,

The statement "Gini impurity tends to isolate the most frequent class in its own branch of the tree" means that when constructing a decision tree using Gini impurity as the splitting criterion, the algorithm tends to create splits that quickly separate the dominant class (the one that occurs most frequently in the dataset) into its own branch. This happens because Gini impurity measures how often a randomly chosen element would be incorrectly classified, and it prefers splits that reduce this impurity as much as possible. When there is a class that occurs frequently, the algorithm finds it more effective to separate that class early, which reduces the overall Gini impurity for subsequent splits. This behavior can lead to shorter, simpler branches for the dominant class while the less frequent classes may require more splits to isolate accurately.
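To make this concrete, here is a minimal sketch of the Gini impurity computation for a single node (the class counts are hypothetical):

import numpy as np

def gini(class_counts):
    """Gini impurity of a node: 1 - sum_k p_k^2."""
    p = np.asarray(class_counts) / np.sum(class_counts)
    return 1.0 - np.sum(p ** 2)

# A node dominated by one class has impurity close to 0, which is why
# splits that isolate the most frequent class look very attractive:
print(gini([90, 5, 5]))    # 0.185 (nearly pure)
print(gini([34, 33, 33]))  # ~0.667 (highly mixed)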



Hi,
I tried the same code on a dataset that I made myself, but it gives an error; I hope someone can help me.

I do not know why it is a pandas error, what the data should look like, or how to change it.
Thanks a lot


Hi, can you share some instances of your target variable y?


Thank you for your guidance.
My data has similar properties to Iris, except that the target values are continuous.
I assigned the target values to points within certain limits so that the decision tree could predict a class, but again it gave an error.
Thank you


Hi, Can you display some instances of the target variable?



I found it, thanks.


!dot -Tpng iris_tree.dot -o iris_tree.png

is working only in CloudxLab and not in my local Jupyter program. What is the alternate code for that?


Hi,

You need to install GraphViz on your local computer first before you can use it. Here are the details on how you can install it:

https://graphviz.org/download/
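Alternatively, if installing GraphViz is inconvenient, recent versions of scikit-learn (0.21+) can render the tree directly with matplotlib; a minimal sketch:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

plt.figure(figsize=(10, 6))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.savefig("iris_tree.png")  # no external `dot` binary needed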

Thanks.


In real life, what does the rotation of data mean? What operation on the data might lead to it being rotated when plotted?


Hi,

Good question! You will notice that the decision boundaries made by a decision tree are parallel to the axes. The reason for this is that a decision tree splits the data based on a single feature value, and this value remains constant along one decision boundary. It is because of this that trees are sensitive to data rotation: data that could easily be split by a single diagonal boundary is instead split by multiple axis-parallel boundaries. A simpler model that generalizes better would be preferable for this particular problem; the sketch below shows the axis-parallel rules a tree learns.
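For illustration, on the Iris data every learned rule has the form "feature <= threshold", i.e. each boundary is parallel to a feature axis:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)
# Every printed rule is "feature <= threshold", i.e. an axis-parallel split:
print(export_text(clf, feature_names=iris.feature_names))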

Thanks.


Sir, I've understood what you've said, but what I don't understand is what may cause the data to rotate. What is the real-life implication of data being rotated? Also, how and why does one rotate the data?


Hi,

The real-life implication of data rotation is that the model becomes unstable. To solve this problem, one can use a dimensionality reduction technique like PCA, which orients the data better. You can also use a Random Forest model instead of a single Decision Tree. Rotation basically means that the data changes orientation with respect to the feature axes; you would be able to understand this better if you visualize the data as points in feature space. A sketch of the PCA approach follows below.
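A minimal sketch (the data is synthetic and purely illustrative):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(42)
X0 = rng.randn(200, 2) * [3.0, 0.5]  # elongated cloud, separable along its long axis
y = (X0[:, 0] > 0).astype(int)
theta = np.pi / 4                    # rotate the data by 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = X0 @ R.T                         # the boundary is now diagonal in feature space

# PCA re-aligns the axes with the directions of maximum variance,
# often turning the diagonal boundary back into an axis-parallel one:
model = make_pipeline(PCA(n_components=2), DecisionTreeClassifier(max_depth=2))
model.fit(X, y)
print(model.score(X, y))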

Thanks.


When we talk about the target average values in a DT, what measure of average is used by default, mean or median? If mean, then why not median (as it'll handle noise better than the mean)?


Hi,

Good question!

The mean is used since the default splitting criterion is the mean squared error, and the mean is the value that minimizes the MSE within a leaf; computing the median also requires sorting the values, which is more process intensive. Also, the cases where the median is a better measure of center are when the data has skewed data points, and in a decision tree the datapoints that end up in a leaf are unlikely to include an outlier, because the outlier would land in another leaf. A sketch follows below.
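For reference, in scikit-learn (assuming a recent version) the leaf prediction follows from the chosen criterion:

from sklearn.tree import DecisionTreeRegressor

mean_tree   = DecisionTreeRegressor(criterion="squared_error")   # leaves predict the mean
median_tree = DecisionTreeRegressor(criterion="absolute_error")  # leaves predict the median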

Hope this clarifies your doubts.

Thanks.


On calculating the entropy for the node at depth 2, the formula requires p_{i,k} not equal to zero, but my class counts are [0, 49, 5],

so according to the formula p_{2,1} = 0.

Please explain.


Hi,

Please check our notebook from our GitHub repository for the complete code:

https://github.com/cloudxlab/ml
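In the meantime, a minimal sketch of the entropy computation for the class counts [0, 49, 5] you mention: classes with zero probability are simply skipped, since p * log2(p) tends to 0 as p tends to 0.

import numpy as np

counts = np.array([0, 49, 5])
p = counts / counts.sum()
p = p[p > 0]                      # exclude zero-probability classes
entropy = -np.sum(p * np.log2(p))
print(entropy)                    # ~0.445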

Thanks.


I did not understand, at 2:13: let's say we are taking the value x = 0.9; then how is the value of y 0.6 in the graph for max_depth=0?

Please reply.

Hi,

Could you please tell me which part of the video you are referring to, since I could not find any such reference at the 2:13 timestamp.

Thanks.


1. On splitting of a branch, is one feature considered or are multiple features considered? Is that what max_features signifies?

2. Suppose the algorithm splits based on feature 1 and its threshold; then in further nodes, will the algorithm exclude feature 1, as it was already considered and a branch was formed based on it?


Hi,

1. max_features is the number of features to consider when looking for the best split.

2. Please refer to slide 15 for a better understanding of this concept. In slide 15 it is shown that at first we are considering the feature petal length, and in the second node we are considering another feature, petal width. Note that a feature is not excluded after being used: the same feature can be split on again at a deeper node with a different threshold, as the sketch below shows.
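A minimal check (a hypothetical depth-3 tree on Iris) that features can repeat across nodes:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(iris.data, iris.target)
# tree_.feature holds the split feature per node (-2 marks a leaf):
used = [iris.feature_names[i] for i in clf.tree_.feature if i >= 0]
print(used)  # the same feature name can appear more than once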

Thanks.



I cannot find the decision tree Jupyter notebook.


Hi,

You can find the notebooks for Decision Trees in our GitHub repository:

https://github.com/cloudxlab/ml/tree/master/machine_learning

Thanks.


How can we decide what max_depth to pass as the hyperparameter without knowing how many nodes it will create?
Is it possible through GridSearchCV or RandomizedSearchCV?


Hi,

In general, the deeper you allow your tree to grow, the more complex your model becomes, because you will have more splits and the tree captures more information about the data. This is one of the root causes of overfitting in decision trees: your model will fit the training data perfectly but will not be able to generalize well to the test set.

It is also bad to have a very low depth, because your model will underfit. So how do you find the best value? Experiment, because overfitting and underfitting are very subjective to a dataset; there is no one-value-fits-all solution.
What you can do is let the model decide the max_depth first, then, by comparing the train and test scores, look for overfitting or underfitting, and depending on the degree, decrease or increase the max_depth. And yes, GridSearchCV or RandomizedSearchCV can automate this search, as in the sketch below.
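A minimal sketch of tuning max_depth with GridSearchCV (Iris used as a stand-in dataset):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
params = {"max_depth": [2, 3, 4, 5, None]}  # None lets the tree grow fully
search = GridSearchCV(DecisionTreeClassifier(random_state=42), params, cv=5)
search.fit(X, y)
print(search.best_params_)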

Thanks.

-- Rajtilak Bhattacharjee


How can we plot the graph for a Decision Tree's decision boundaries as given in slides 22-24? Please share the function used to plot the same.
Thank You.


Hi,

Here's an article on how to plot the DT's decision boundaries.

https://scikit-learn.org/st...
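And a hedged sketch using scikit-learn's DecisionBoundaryDisplay (available in scikit-learn 1.1+; the petal features are assumed to match the lecture example):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width
clf = DecisionTreeClassifier(max_depth=2).fit(X, iris.target)

disp = DecisionBoundaryDisplay.from_estimator(
    clf, X, response_method="predict", alpha=0.4,
    xlabel=iris.feature_names[2], ylabel=iris.feature_names[3])
disp.ax_.scatter(X[:, 0], X[:, 1], c=iris.target, edgecolor="k")
plt.show()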

Thanks.

-- Rajtilak Bhattacharjee


You didn't really explain the benefits of the decision tree over a linear or nonlinear regression model.


Basically, a decision tree is a non-parametric algorithm, while linear or non-linear regression is parametric. What this actually means is that a parametric model assumes a shape for the function that maps the input variables to the target variables (e.g. if you plot the function, the plot is linear, exponential, etc.), so there is a possibility of underfitting for data that is non-linear. A non-parametric model, since it does not assume any shape for the mapping function, can fit the dataset very closely; nonetheless, the risk of overfitting cannot be denied for such models (which can be dealt with by some hyperparameter tuning).

Having said that, if there is a small number of instances and a large number of features with very little noise, linear models could outperform tree models. But in general, decision trees tend to perform better.
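A minimal sketch of the underfitting point on synthetic non-linear data (purely illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.rand(200, 1) * 6, axis=0)
y = np.sin(X).ravel() + rng.randn(200) * 0.1  # non-linear target with noise

lin  = LinearRegression().fit(X, y)           # assumes a linear shape: underfits
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)  # no shape assumption
print(lin.score(X, y), tree.score(X, y))      # the tree's R^2 is much higher here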

I hope that helps.


Here is a problem: you said Gini is computationally faster than the entropy criterion, but you also said that entropy creates a more balanced tree. But if the more balanced tree is less computationally intensive, how can it be slower than Gini? This is contradictory...


A more balanced tree does prediction faster, but entropy is more computationally intensive to train with (it involves computing logarithms). Gini is faster to train with, but can be a little slower in prediction than entropy, because the tree depth tends to be greater with Gini.
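A quick, unscientific sketch for comparing the training cost of the two criteria yourself:

import time
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for criterion in ("gini", "entropy"):
    start = time.perf_counter()
    DecisionTreeClassifier(criterion=criterion, random_state=42).fit(X, y)
    print(criterion, time.perf_counter() - start)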


When should we use the ROC AUC curve?
When should we use the Precision-Recall curve? Which one is mostly preferable?

When do we need high precision and low recall?
When do we need low precision and high recall?

Please explain all of the above with real-world scenarios.

Thanks in advance


Hi, Vinod.
Good questions!

AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example.
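A minimal sketch of computing both, with hypothetical labels and scores:

from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                    # hypothetical binary labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6]   # hypothetical model scores

print(roc_auc_score(y_true, y_score))
fpr, tpr, _ = roc_curve(y_true, y_score)                        # points on the ROC curve
precision, recall, _ = precision_recall_curve(y_true, y_score)  # points on the PR curve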

Please refer to this link for more details :- https://machinelearningmast...

All the best!


In brief:

1. The ROC AUC curve is useful when we are dealing with a fairly balanced dataset, i.e. the number of instances for each class is approximately equal.

2. The Precision-Recall curve should be used when the dataset is imbalanced.

3. Precision = True Positives / (True Positives + False Positives)

    i.e. Precision indicates, out of all the instances that have been predicted as positive, how many are actually positive.

    Recall = True Positives / (True Positives + False Negatives)

    i.e. Recall indicates, out of all the instances that were actually positive, how many were predicted as positive.

So, the use depends upon the problem statement.

For example, if you are predicting cancer for a sample of the population, it matters more that a person is not wrongly diagnosed as not having cancer, but it can be tolerated if he is misdiagnosed as having cancer; further tests will take care of it. In this case, you need to have low false negatives, so higher Recall is important here.

Another example: suppose you are trying to identify dogs among other animals. It is important to you that only dogs are identified as dogs and not any other animal, while it would be okay if a dog is identified as a cat in some instance. In this case you want to reduce the false positives, so higher Precision is important here. A small numeric example follows below.

Referring to the lecture videos on classification will clear your concept, as it has been explained there with good examples, and will also supplement the above two points.
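A worked instance of the formulas above, with hypothetical confusion-matrix counts:

tp, fp, fn = 80, 20, 10     # hypothetical counts
precision = tp / (tp + fp)  # 80 / 100 = 0.8
recall    = tp / (tp + fn)  # 80 / 90  = ~0.889
print(precision, recall)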


I'll add a couple of things to my previous answer.

Recall is also called Sensitivity.

And there is Specificity corresponding to it:

Specificity = True Negatives / (True Negatives + False Positives)

For example, when you are predicting the presence of heart disease for people.

If the presence of heart disease is important to you, you'll try to increase Sensitivity as the goal will be to reduce the False negatives.

But if the absence of heart disease is more important to you, you'll try to increase the Specificity, as the goal will be to reduce the False positives. A numeric example follows below.
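The same kind of worked instance for Specificity, with hypothetical counts:

tn, fp = 90, 10               # hypothetical counts
specificity = tn / (tn + fp)  # 90 / 100 = 0.9
print(specificity)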
