Gradient Boosted Tree Algorithm performance

User: "varunm1"
New Altair Community Member
Updated by Jocelyn
I am working with Gradient Boosted Trees (GBT), and under 5-fold cross-validation it performs better on most of my datasets, with very high metrics such as AUC (1.0) and kappa (0.971). I can relate these results to GBT's capabilities, such as regularization and sequential learning. I also set aside 30 percent of the data for testing after five-fold cross-validation and got a kappa of 0.974 on this unseen data.
My question is: are there any cautions or factors to consider when using GBT and interpreting its results, and how well does GBT work in real applications?

Thanks
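The evaluation protocol in the question can be sketched in plain Python as a split of row indices. This is a minimal sketch, assuming the 30 percent test set is drawn before cross-validation so it never appears in any fold (the post does not say exactly when it was drawn). The function name `holdout_then_kfold` and its defaults are hypothetical. Note that it splits rows at random, which is only valid if every row is independent of the others.

```python
import random

def holdout_then_kfold(n_rows, test_fraction=0.3, k=5, seed=42):
    """Split row indices into a held-out test set and k CV folds.

    Hypothetical helper: the test rows are removed first and never enter
    any CV fold, so they stay unseen until the final evaluation.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)

    n_test = int(round(n_rows * test_fraction))
    test_idx = sorted(indices[:n_test])
    dev_idx = indices[n_test:]

    # Deal the remaining development rows round-robin into k folds
    # of near-equal size; each fold serves once as the validation fold.
    folds = [sorted(dev_idx[i::k]) for i in range(k)]
    return folds, test_idx

folds, test_idx = holdout_then_kfold(100)
print([len(f) for f in folds], len(test_idx))  # [14, 14, 14, 14, 14] 30
```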

    User: "Telcontar120"
    New Altair Community Member
    Accepted Answer
    GBTs are great in terms of predictions. In terms of interpretability, I think they are somewhat harder because the trees are boosted and not independent (so less interpretable than a Random Forest, in my view). But as long as you are using other ways to communicate model results (including some of the great tools in RapidMiner like simulation and explaining predictions), then they are fine.
    You did mention AUC of 1.0 and that is pretty much perfect separation, so also make sure that you don't have any data leakage or sample contamination going on.  Nothing is worse than deploying a model in production and watching its performance collapse!
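An AUC of exactly 1.0 is worth screening for leakage before trusting it. A minimal sketch, assuming numeric features stored as a dict of columns and binary 0/1 labels: score each feature on its own with the Mann-Whitney form of AUC and flag any feature that separates the classes almost perfectly, then list rows that appear verbatim in both train and test (one simple form of sample contamination). The helper names, the 0.98 threshold and the `post_outcome_flag` column are hypothetical illustrations, not RapidMiner functionality.

```python
def single_feature_auc(values, labels):
    """AUC of one numeric feature used directly as a score.

    Mann-Whitney form: the fraction of (positive, negative) pairs in which
    the positive row has the larger value; ties count as half.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def leakage_suspects(columns, labels, threshold=0.98):
    """Names of features that on their own separate the classes almost perfectly."""
    suspects = []
    for name, values in columns.items():
        auc = single_feature_auc(values, labels)
        # An AUC near 0 is just as suspicious: the feature is a perfect inverted separator.
        if max(auc, 1.0 - auc) >= threshold:
            suspects.append(name)
    return suspects

def shared_rows(train_rows, test_rows):
    """Rows that occur verbatim in both the training and the test data."""
    return set(map(tuple, train_rows)) & set(map(tuple, test_rows))

labels = [0, 0, 0, 1, 1, 1]
columns = {
    "temp": [20, 25, 22, 21, 24, 23],            # weakly informative feature
    "post_outcome_flag": [0, 0, 0, 1, 1, 1],     # hypothetical leaked column
}
print(leakage_suspects(columns, labels))                    # ['post_outcome_flag']
print(shared_rows([[1, 2], [3, 4]], [[3, 4], [5, 6]]))      # {(3, 4)}
```

A flagged feature is not proof of leakage, but it is worth checking whether its value could only be known after the outcome occurred.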
    User: "MartinLiebig"
    Altair Employee
    Accepted Answer
    Sorry, I am a bit busy. But to clarify: are you sure that each ID is really independent of the others? Are these really different customers, different machines, etc.? These are NOT correlated examples, like the same customer in different years or an item produced in the same batch as others?
    Best,
    Martin
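Martin's question points at correlated examples: if the same customer, machine or batch contributes rows to both the training and the test folds, the model can partly recognize the entity instead of learning a general pattern, which inflates CV scores. A minimal sketch of grouped k-fold splitting, assuming each row carries an entity ID; `group_kfold` is a hypothetical helper written for illustration (scikit-learn's `GroupKFold` implements the same idea).

```python
import random
from collections import defaultdict

def group_kfold(group_ids, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no group is split across both.

    group_ids[i] identifies the entity row i belongs to (customer, machine,
    production batch, ...). Whole groups are assigned to folds.
    """
    rows_by_group = defaultdict(list)
    for row, gid in enumerate(group_ids):
        rows_by_group[gid].append(row)
    if len(rows_by_group) < k:
        raise ValueError("need at least k distinct groups")

    groups = list(rows_by_group)
    random.Random(seed).shuffle(groups)
    # Place the largest groups first, each into the currently smallest fold,
    # to keep fold sizes roughly balanced (stable sort keeps shuffled ties).
    groups.sort(key=lambda g: len(rows_by_group[g]), reverse=True)

    folds = [[] for _ in range(k)]
    for g in groups:
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(rows_by_group[g])

    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(r for other in range(k) if other != f for r in folds[other])
        yield train_idx, test_idx

customer_ids = ["c1", "c1", "c2", "c3", "c3", "c3", "c4", "c5", "c5", "c6"]
for train_idx, test_idx in group_kfold(customer_ids, k=3):
    print(sorted({customer_ids[i] for i in test_idx}))
```

If grouped CV scores drop sharply compared with the row-level ones, that gap suggests the original estimates were inflated by correlated examples.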