Decision Tree #entropy #criterion #kappa #accuracy
CelineS
New Altair Community Member
Hi guys,
Could anyone explain how to define and detect entropy in a DT? (What do the blue and red labels under the leaf stand for?)
Are an accuracy of 70% and a kappa of about 0.30 good enough for prediction?
Which criterion should I choose for the DT, "gain_ratio" or "information_gain", to maximise my accuracy and kappa?
Regards,
Answers
Hi there, you have a few questions embedded in your post, so I'll try to comment on most of them.
The blue/red labels under each node indicate the number of examples that fell into each category in that node. The ratio of these forms the basis of the confidence score generated by the DT.
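To connect those per-class counts to entropy, here is a minimal Python sketch (not the tool's own code) that takes a leaf's class counts and computes the class shares the confidence is based on, plus the Shannon entropy in bits. The labels "blue"/"red" and the 30/10 split are hypothetical, chosen only for illustration.

```python
import math

def class_shares(counts):
    """Share of each class among the examples in a node (basis of the confidence)."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def entropy(counts):
    """Shannon entropy in bits of a node's class counts: sum of p * log2(1/p)."""
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values() if n > 0)

# Hypothetical leaf: 30 examples of one class (blue) and 10 of the other (red)
leaf = {"blue": 30, "red": 10}
print(class_shares(leaf))                 # {'blue': 0.75, 'red': 0.25}
print(round(entropy(leaf), 3))            # 0.811
print(entropy({"blue": 20, "red": 20}))   # 1.0 (maximally mixed leaf)
```

A pure leaf (all examples in one class) has entropy 0, and an even two-class split has entropy 1, so the closer the blue/red counts are to each other, the more "uncertain" that leaf is.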
If you want to optimize your tree for accuracy, you can select accuracy directly as the main criterion for tree growth. But it is not possible to say in the abstract whether an accuracy of 70% is "good enough" for prediction. In some fields that would be considered great and used with no problem, while in other fields it would be horrible. This question is very domain- and dataset-specific.
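Since the question also mentions kappa, a small sketch may help show how accuracy and Cohen's kappa relate. Kappa compares the observed agreement (which equals accuracy) with the agreement expected by chance from the row and column totals of the confusion matrix, so the same 70% accuracy can go with very different kappa values. The confusion matrices below are hypothetical, not taken from the original poster's data.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of examples whose true class is i
    and whose predicted class is j.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of examples on the diagonal (= accuracy)
    observed = sum(matrix[i][i] for i in range(k)) / total
    # Chance agreement from the true-class and predicted-class totals
    true_totals = [sum(matrix[i]) for i in range(k)]
    pred_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(t * p for t, p in zip(true_totals, pred_totals)) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical 2-class results: 70 examples of class A, 30 of class B
model = [[55, 15],
         [15, 15]]
always_majority = [[70, 0],
                   [30, 0]]

acc, kappa = accuracy_and_kappa(model)
print(f"model: accuracy={acc:.2f}, kappa={kappa:.3f}")        # accuracy=0.70, kappa=0.286
acc, kappa = accuracy_and_kappa(always_majority)
print(f"majority: accuracy={acc:.2f}, kappa={kappa:.3f}")     # accuracy=0.70, kappa=0.000
```

With a 70/30 class split, a "model" that always predicts the majority class also reaches 70% accuracy, yet its kappa is 0. This is one reason an accuracy figure is hard to judge without knowing the class balance.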
Information gain tends to favor attributes with many categories (distinct values), because it is not adjusted for the number of possible distinct values. Information gain ratio adjusts for this, so all else being equal, information gain ratio is probably the more robust criterion of the two (which is why it is the default). If you want to understand how to calculate information gain, the Wikipedia article has a good summary: https://en.wikipedia.org/wiki/Information_gain_in_decision_trees
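To illustrate that bias, here is a hedged sketch of the standard textbook definitions: information gain is the parent's entropy minus the size-weighted entropy of the branches, and gain ratio divides that by the split information (the entropy of the attribute's own values). It is not the tool's implementation, and the tiny dataset is invented: an ID-like attribute with a unique value per example versus a hypothetical two-valued attribute.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of the distribution of items in a list."""
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in Counter(values).values())

def information_gain(attribute, labels):
    """Parent entropy minus the size-weighted entropy of each branch."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute):
        branch = [lab for a, lab in zip(attribute, labels) if a == value]
        remainder += len(branch) / total * entropy(branch)
    return entropy(labels) - remainder

def gain_ratio(attribute, labels):
    """Information gain normalised by the split information of the attribute."""
    split_info = entropy(attribute)
    return information_gain(attribute, labels) / split_info if split_info > 0 else 0.0

# Hypothetical toy data: 8 examples, 4 "yes" and 4 "no"
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
row_id = list(range(8))                            # unique per example, like an ID column
binary = ["a", "a", "a", "a", "a", "b", "b", "b"]  # two-valued attribute

for name, attr in [("row_id", row_id), ("binary", binary)]:
    print(f"{name}: gain={information_gain(attr, labels):.3f}, "
          f"ratio={gain_ratio(attr, labels):.3f}")
```

Information gain prefers the ID-like attribute (1.000 vs. about 0.549), because splitting into single-example branches always looks perfectly pure. Gain ratio divides the ID's gain by its split information of log2(8) = 3 bits and instead prefers the two-valued attribute (about 0.575 vs. 0.333).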