How do you reduce variables before doing a decision tree?
Matt_Pilk
Hi!
Just wanted some help.
1) Do you need to reduce the number of variables before running a decision tree analysis? Currently I have 19, which makes the decision tree hard to read because I need to go 12 layers deep to get the accuracy up.
2) If I use Select Attributes to keep only the ones I believe are important after doing some EDA, does this dilute the results, or can you pass the decision tree the original set of data?
Any insights from the community would be great.
Thanks,
Matthew
Tagged: AI Studio, Decision Tree
All comments
BalazsBaranyRM
Hi!
Decision-tree-based methods are all about selecting relevant attributes. If you remove attributes beforehand and the tree changes, those attributes were relevant and your tree probably got worse. If the tree doesn't change, the removal did find irrelevant attributes, but the decision tree would have ignored them anyway.
It is a good idea to assess your attributes and check them for harmful things: "future" knowledge leaking into the model, data that are hard to get, or attributes with many missing values. You could remove those manually. But you shouldn't remove attributes on the basis of "I don't think these are relevant" before using any method that selects or weights attributes itself. That would be "part human, part machine learning", and it is hard to get better results from that process than from an algorithm written for the task.
If your decision tree is too hard to interpret AND interpretability is a more important goal than accuracy, it's better to change the pruning parameters to stricter values. That will give you a smaller, easier-to-understand tree without sacrificing relevant attributes before it is applied.
Regards,
Balázs
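The two points above can be illustrated outside AI Studio. The following is a minimal scikit-learn sketch (an assumption for illustration, not the operators discussed in the thread): a tree weights attributes itself via its importance scores, and stricter pruning parameters (here `max_depth` and `min_samples_leaf`) shrink the tree without dropping attributes beforehand.

```python
# Sketch in scikit-learn (assumed stand-in for the AI Studio Decision Tree
# operator): pruning shrinks the tree; the tree itself ranks the attributes.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 19 attributes, only a few truly informative,
# mirroring the situation described in the question.
X, y = make_classification(n_samples=500, n_features=19, n_informative=4,
                           n_redundant=2, random_state=42)

# Unpruned tree: grows as deep as it needs to fit the training data.
deep = DecisionTreeClassifier(random_state=42).fit(X, y)

# Stricter pruning parameters: a smaller, easier-to-read tree.
pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=20,
                                random_state=42).fit(X, y)

print("deep tree depth:  ", deep.get_depth())
print("pruned tree depth:", pruned.get_depth())

# The tree's own attribute weighting: irrelevant attributes end up
# with importance near zero, so removing them by hand is unnecessary.
print("importances:", pruned.feature_importances_.round(2))
```

The point of the sketch: you never told the tree which of the 19 attributes matter, yet the pruned tree stays shallow and the importance vector exposes the irrelevant ones.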
Matt_Pilk
Thanks Balázs. When building the decision tree, do you keep extending it until you reach an accuracy you feel is acceptable and explainable, or do you go for the highest accuracy even if the tree is 15-20 levels deep?
BalazsBaranyRM
Hi!
Whether interpretability of the decision tree is the most important factor depends on the use case.
Usually it isn't, and I use parameter optimization to get the best decision tree, or a model from a different learning algorithm. (A decision tree often isn't the best model.)
There's an example building block in the Community Samples repository.
Regards,
Balázs
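The parameter-optimization idea above can be sketched in scikit-learn terms (an illustrative assumption, not the Community Samples building block itself): instead of hand-tuning tree depth against accuracy, let a cross-validated search pick the pruning parameters.

```python
# Sketch of parameter optimization for a decision tree (scikit-learn,
# assumed as a stand-in for AI Studio's Optimize Parameters workflow).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=19, n_informative=4,
                           random_state=42)

# Search over pruning parameters; cross-validation scores each candidate,
# so accuracy is estimated honestly rather than on the training data.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [3, 5, 8, None],
                "min_samples_leaf": [1, 10, 25]},
    cv=5, scoring="accuracy",
)
grid.fit(X, y)

print("best parameters:", grid.best_params_)
print("cross-validated accuracy:", round(grid.best_score_, 3))
```

This answers the depth question directly: the search decides how deep the tree should be, and a 15-20 level tree only wins if it actually generalizes better under cross-validation.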