"Increase of minimum leaf size in Decision Tree"
b00122599
New Altair Community Member
Hey folks,
I have increased the minimum leaf size in my decision tree. This has resulted in a smaller, more readable tree, but also a small decrease in accuracy. I'm being asked what this says about my dataset. I'm presuming I'm overfitting the data, but I'm not sure. Would anyone have any idea?
Thanks in advance,
Neil.
Best Answer
Hi,
Not necessarily. Increasing the minimum leaf size is just another way of pruning the tree. The goal is to find a good balance: generalize from your training data without missing the underlying patterns.
I am assuming that you are referring to a properly validated test accuracy on independent data (e.g. from cross validation). If so, the reduction in accuracy is not a sign that you were overfitting before you made the change, but rather that you are now starting to miss some of the valid patterns in your data.
Please also note that changes in accuracy may not be significant at all, and that there are other criteria for good models (like understandability), so you may even want to go with a less accurate but more understandable model.
Hope those thoughts help a bit,
Ingo
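To make the point about significance concrete, here is a minimal Python sketch of one way to check whether a small accuracy drop stands out from fold-to-fold noise. It assumes both trees were scored on the same cross-validation folds. The function name `paired_fold_comparison` and the per-fold accuracies are made up for illustration, not taken from this thread, and the paired t-statistic is only a rough guide because cross-validation folds are not fully independent.

```python
import math
import statistics


def paired_fold_comparison(acc_a, acc_b):
    """Compare two models scored on the same cross-validation folds.

    Returns (mean difference, standard error of the difference, t-statistic).
    Hypothetical helper for illustration only.
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 folds")

    # Per-fold differences: positive means model A was better on that fold.
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))

    if std_err == 0:
        # All folds differ by exactly the same amount.
        t_stat = 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    else:
        t_stat = mean_diff / std_err
    return mean_diff, std_err, t_stat


# Made-up per-fold accuracies (5-fold cross validation), NOT real results.
small_leaf_acc = [0.84, 0.81, 0.86, 0.80, 0.85]  # original, larger tree
large_leaf_acc = [0.83, 0.82, 0.84, 0.80, 0.83]  # after raising min leaf size

mean_diff, std_err, t_stat = paired_fold_comparison(small_leaf_acc, large_leaf_acc)
print(f"mean accuracy difference: {mean_diff:.4f}")
print(f"standard error:           {std_err:.4f}")
print(f"paired t-statistic:       {t_stat:.2f}")
```

With 5 folds there are 4 degrees of freedom, so the two-sided 5% threshold for a paired t-test is about 2.78. A t-statistic well below that, as with these illustrative numbers, suggests the drop is within the noise and the smaller tree may be the better choice on readability alone.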
Answers
Thank you very much for your reply, you're very kind.