Is it correct to use feature selection before Gradient Boosted Trees?

f_laperna New Altair Community Member
edited November 2024 in Community Q&A

Hi everyone!
My question is the following:

I'm building several classification models with different algorithms and techniques so that I can compare the results.

I have already built a model using Random Forest with bagging. Since my dataset had many attributes and most of them were almost useless with respect to the target variable, I performed a very simple feature selection by attribute weights (illustrated below). I have also read in the literature that with bagging it is better to perform feature selection on each bootstrap sample.
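For illustration, the idea in Python/scikit-learn terms was roughly the following (the dataset, the choice of mutual information as the weighting criterion, and k are hypothetical placeholders; I actually did this with attribute weights in my own process):

```python
# Illustration only: weight each attribute against the label,
# keep the k best, then train the bagged forest on the reduced set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline

# Placeholder data: many features, few of them informative
X, y = make_classification(n_samples=1000, n_features=50,
                           n_informative=8, random_state=42)

model = make_pipeline(
    SelectKBest(score_func=mutual_info_classif, k=10),  # FS by weights
    RandomForestClassifier(n_estimators=200, random_state=42),
)
model.fit(X, y)
```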


But when using an algorithm such as Gradient Boosted Trees, which uses boosting and at each iteration implicitly favors the features that most reduce the misclassification error, does it make sense to perform feature selection before training the model?

I have read that some boosted algorithms already include a form of feature selection internally and some do not.


I hope someone with more knowledge and experience can help me. Thank you in advance!


Answers

  • MartinLiebig
    Altair Employee

    Hi,


    It's statistically sound to do feature selection for ANY machine learning algorithm, no matter whether it's boosted, bagged, or plain. If it yields better accuracy, go for it.


    That said, both RF and GBTs do some feature selection internally, so FS is not as important for them as it is for other algorithms. I would nevertheless still do it; a quick way to check whether it helps is sketched below.
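    A minimal sketch in Python/scikit-learn (the data, the scoring function, and k are placeholders) that compares cross-validated accuracy with and without FS, so the decision rests on measured accuracy:

```python
# Minimal check: does feature selection improve cross-validated accuracy?
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder data
X, y = make_classification(n_samples=1000, n_features=50,
                           n_informative=8, random_state=0)

plain = GradientBoostingClassifier(random_state=0)
with_fs = make_pipeline(SelectKBest(f_classif, k=10),
                        GradientBoostingClassifier(random_state=0))

print("plain GBT:", cross_val_score(plain, X, y, cv=5).mean())
print("FS + GBT :", cross_val_score(with_fs, X, y, cv=5).mean())
```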


    Cheers,

    Martin

  • kypexin
    New Altair Community Member

    Hi @f_laperna


    I can add from my experience that in some cases, especially if you do extensive optimization of model parameters, reducing the number of features can also reduce training time for both RF and GBT and speed up the whole process. So if you have really MANY features and are sure you can safely omit part of them, why not (see the timing sketch below).
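    As a rough illustration in Python/scikit-learn (placeholder data; "the first 20 features survived FS" is just pretend here — in practice you would time your own optimization loop):

```python
# Sketch: how feature reduction affects GBT training time.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=100,
                           n_informative=10, random_state=0)

t0 = time.perf_counter()
GradientBoostingClassifier(random_state=0).fit(X, y)
print("all 100 features:", time.perf_counter() - t0, "s")

t0 = time.perf_counter()
# Pretend the first 20 columns are the ones that survived FS
GradientBoostingClassifier(random_state=0).fit(X[:, :20], y)
print("20 features     :", time.perf_counter() - t0, "s")
```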


    Though, in the case of GBT, I'd also suggest trying different feature weighting algorithms, and also considering the feature weights returned by the GBT algorithm itself, as sketched below.
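
    For example, a minimal sketch in Python/scikit-learn of selecting by the model's own weights (the median threshold is just a placeholder; tune it for your data):

```python
# Sketch: use the feature weights the GBT model itself produces
# to prune the feature set, then retrain on the reduced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=1000, n_features=50,
                           n_informative=8, random_state=0)

gbt = GradientBoostingClassifier(random_state=0)
selector = SelectFromModel(gbt, threshold="median").fit(X, y)

X_reduced = selector.transform(X)  # keep features above the median weight
final_model = GradientBoostingClassifier(random_state=0).fit(X_reduced, y)
print("kept", X_reduced.shape[1], "of", X.shape[1], "features")
```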