Gradient Boosted Tree and performance

Barborka
Barborka New Altair Community Member
edited November 2024 in Community Q&A
Dear community,

I want to understand my GBT model. I trained it and validated it on new data with quite good results. Now I would like to find out which attributes were the most decisive, but here I am stuck. For example, my Tree 1 is described as

ch1 in {1009351207,1047831207,... (46 more)}: 0.013 {}

ch1 not in {1009351207,1047831207,... (46 more)}

|   ch1 in {1009351207,1000751092,... (49 more)}: -0.009 {}

|   ch1 not in {1009351207,1000751092,... (49 more)}: -0.027 {}


Could you please explain where I can find these 46 more attributes? Or the 49 more attributes?


Thanks a lot.
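
A note on reading the printout above: each line is a set-membership test on the values of ch1, and the number after the colon appears to be the value of that leaf. The minimal Python sketch below mirrors the structure of Tree 1. It is an illustration only: the sets hold just the values visible in the printout (the elided "46 more" and "49 more" values are unknown and deliberately left out), and the names `LEFT_SET`, `INNER_SET`, and `tree1_score` are hypothetical.

```python
# Values visible in the printout; the remaining "(46 more)" / "(49 more)"
# entries are elided in the source and deliberately not filled in here.
LEFT_SET = {"1009351207", "1047831207"}
INNER_SET = {"1009351207", "1000751092"}


def tree1_score(ch1: str) -> float:
    """Follow Tree 1's branches for one ch1 value and return the leaf value."""
    if ch1 in LEFT_SET:       # "ch1 in {...}: 0.013"
        return 0.013
    if ch1 in INNER_SET:      # "ch1 not in {...}" then "ch1 in {...}: -0.009"
        return -0.009
    return -0.027             # "ch1 not in {...}" on both levels


if __name__ == "__main__":
    for value in ("1047831207", "1000751092", "some-other-value"):
        print(value, tree1_score(value))
```

Because the real sets are truncated, the sketch only shows how a row is routed through the tree; it does not reproduce the model's actual output for arbitrary values.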


Answers

  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Hi @Barborka,

    with a complex model like GBT, it is difficult to derive attribute importance directly from the individual trees.
    In your example, ch1 is the attribute name, and the 1009... (46 more) entries are its values (data in the ch1 column).

    So in this example, ch1 is the only attribute that is used at all.
    The Gradient Boosted Trees operator has an output port called "wei" (weights). It delivers the attribute weights calculated by the model. Higher values in this table mark the attributes that are more important for predicting the label.

    If I saw a model like this, I would suspect that these values are IDs and that the model is just memorizing them. This would mean that the model is overfitted. I hope this is not the case with your data, but you should check.

    Regards,
    Balázs
  • Barborka
    Barborka New Altair Community Member
    Dear @BalazsBarany, thanks for the reply. Other trees also contain ch2 and ch3; I just posted the first one. Is it possible that ch2 and ch3 are not considered in this tree?

    And is there any way to find out which exact values are in these 46 more (and 49 more, etc.)? {1009351207,1047831207,... (46 more)}

    Btw., these are not IDs.


  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Answer ✓
    Hi @Barborka,

    if you're looking at the description of one tree and it only contains ch1, then that tree only considers ch1. Other trees might consider different attributes. The weights output of the entire model shows the summary - single trees are not that relevant.

    I couldn't find a way to extract the whole list of values going into the rules. There are some promising operators like Tree to Rules and DecisionTree to ExampleSet (in the Converters extension), but these only work with single trees, not with GBT.

    Regards,
    Balázs
  • Barborka
    Barborka New Altair Community Member
    Dear @BalazsBarany, thanks for your help. I will try something different then, maybe Python.
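
For anyone taking the Python route, here is a minimal sketch of reading attribute importance from a gradient boosted model with scikit-learn. It is a different implementation from the RapidMiner operator, so its numbers should not be expected to match the "wei" output. Everything in it is an assumption made for illustration: the data is synthetic, the attribute names ch1, ch2, ch3 are borrowed from the thread, the label is constructed to depend only on ch1, and the helper names `make_synthetic_data` and `attribute_importance` are hypothetical. Nominal attributes are one-hot encoded, and the importances of the resulting columns are summed back per original attribute.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier


def make_synthetic_data(n: int = 400, seed: int = 0):
    """Build made-up nominal data; the label depends only on ch1."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "ch1": rng.choice(["a", "b", "c", "d"], size=n),
        "ch2": rng.choice(["x", "y"], size=n),
        "ch3": rng.choice(["p", "q", "r"], size=n),
    })
    y = df["ch1"].isin(["a", "b"]).astype(int)
    return df, y


def attribute_importance(df: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Fit a GBT on one-hot columns and sum importances per attribute."""
    # Columns are named like "ch1=a", so the attribute is the part before "=".
    X = pd.get_dummies(df, prefix_sep="=")
    model = GradientBoostingClassifier(n_estimators=50, random_state=0)
    model.fit(X, y)
    per_column = pd.Series(model.feature_importances_, index=X.columns)
    per_attribute = per_column.groupby(lambda col: col.split("=")[0]).sum()
    return per_attribute.sort_values(ascending=False)


if __name__ == "__main__":
    data, label = make_synthetic_data()
    print(attribute_importance(data, label))
```

The per-column importances (before the `groupby`) also show which individual values of an attribute drive the splits, which is close to what the truncated "46 more" lists hide in the tree printout.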