Modelling a decision tree with very large data?

User: "eldenoso"
New Altair Community Member
Updated by Jocelyn

Hello all,

I am currently trying to build decision tree models on a large data set. The problem is that the tree either grows too large (wide) or stays too small, so accuracy is low and relationships in the data can't be identified. I have already tried several things, such as discretizing numerical attributes, but the results are still poor. Most of the attributes are nominal; only one is numerical. Unlike the Titanic example, my label is not a binary "yes/no". Could that be causing the problem?

Thank you for your help! :)

Philipp

    User: "Telcontar120"
    New Altair Community Member
    Accepted Answer

    A few additional thoughts:

    1. Minimal gain for a split is a crucial pre-pruning parameter in my experience, so you may want to try a wider range of values for it and see how that affects your tree.
    2. If you have nominal attributes with a lot of distinct values, you should consider consolidating or aggregating them, since too many individual values can lead to low counts for any particular value.
    3. If a single decision tree isn't working well, you might consider an ensemble model built on trees, such as Random Forest or Gradient Boosted Trees.
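
To make points 1 and 2 concrete, here is a minimal sketch in plain Python (no RapidMiner involved). It merges rare nominal values into a single "Other" bucket and compares the information gain of a split before and after merging against a minimal-gain threshold. The helper names, the toy data, the `min_count` cutoff and the `MIN_GAIN` value are all assumptions made for illustration; the criterion and thresholds RapidMiner actually uses may differ.

```python
from collections import Counter
from math import log2


def consolidate_rare(values, min_count):
    """Replace nominal values seen fewer than min_count times with 'Other'."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else "Other" for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting the labels on a nominal attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one nominal attribute and a three-class label.
city = ["A", "A", "A", "B", "B", "B", "C", "D", "E", "F"]
label = ["x", "x", "x", "y", "y", "y", "z", "z", "x", "y"]

merged = consolidate_rare(city, min_count=3)
gain_raw = information_gain(city, label)
gain_merged = information_gain(merged, label)

MIN_GAIN = 0.1  # illustrative threshold, not a recommended value

print("values after merging:", sorted(set(merged)))
print("gain before merging:", round(gain_raw, 3))
print("gain after merging:", round(gain_merged, 3))
print("split allowed:", gain_merged >= MIN_GAIN)
```

In this toy data the unmerged attribute shows the higher gain, but only because four of its values occur exactly once, which is the low-count situation described in point 2. After merging, the gain is lower but rests on larger groups, and a minimal-gain threshold then decides whether the split is still worth making.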