
Optimizing speed of GBT model

User: "vbs2114"
New Altair Community Member
Updated by Jocelyn
I am using a gradient boosted tree model for my analysis, with a lot of textual fields that are broken down from a Redshift database and used as categorical features to predict the classification of each row. Do you have any general tips or tricks for making a predictive model run faster without losing prediction quality? Playing around with different tree/depth settings or configurations? Right now the full read-train-score-update-database cycle takes around 1 hour (for 10,000 rows); if that could be cut in half, that would be amazing.

    User: "varunm1"
    New Altair Community Member
    Updated by varunm1
    Hello @vbs2114

    One hour for 10,000 rows seems long, but it depends on many factors. What tree depth and how many trees are you building? Do you have a huge number of dimensions (attributes, columns)?

    You should also look at the learning rate: if the learning rate is very small, the computational load gets really high, but the models are better.
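
    As a rough illustration of that trade-off (not part of the original reply), here is a minimal sketch assuming Python and scikit-learn; the thread does not say which tool is being used, and the data here is synthetic.

    ```python
    # Compare a fast, shallow configuration against a slower, deeper one.
    # Synthetic data stands in for the poster's 10,000 Redshift rows.
    import time

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)

    # Fewer, shallower trees with a larger learning rate train faster;
    # more, deeper trees with a smaller learning rate take longer but
    # often score slightly better.
    for n_estimators, max_depth, learning_rate in [(100, 3, 0.1), (300, 6, 0.05)]:
        model = GradientBoostingClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            learning_rate=learning_rate,
            random_state=0,
        )
        start = time.perf_counter()
        score = cross_val_score(model, X, y, cv=3).mean()
        elapsed = time.perf_counter() - start
        print(f"trees={n_estimators} depth={max_depth} lr={learning_rate} "
              f"accuracy={score:.3f} time={elapsed:.1f}s")
    ```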

    Thanks
    User: "SGolbert"
    New Altair Community Member

    If it takes that long, it means that you have thousands of features. In my opinion you have two possibilities:

    1. You can try to improve your process by doing feature selection, which will reduce the number of features and thus the training time.
    2. If all features are important, you can use dimensionality reduction techniques. For categorical features you would have to use correspondence analysis, which is currently supported only via scripts.

    I have only used option 2 for analysis purposes; I don't know how well it can be adapted for dimensionality reduction in ML.
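
    As a rough sketch of both options (not part of the original reply), assuming Python and scikit-learn with made-up column names; TruncatedSVD is used here as a simple stand-in for dimensionality reduction, not the correspondence analysis mentioned above.

    ```python
    # Option 1: feature selection on one-hot encoded categorical columns.
    # Option 2 (stand-in): project the sparse one-hot matrix onto a few
    # dense components before training the boosted trees.
    import pandas as pd
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.preprocessing import OneHotEncoder

    # Hypothetical categorical data standing in for the Redshift fields.
    df = pd.DataFrame({
        "category_a": ["red", "blue", "red", "green"] * 250,
        "category_b": ["x", "y", "y", "z"] * 250,
        "label": [0, 1, 1, 0] * 250,
    })

    encoder = OneHotEncoder(handle_unknown="ignore")
    X = encoder.fit_transform(df[["category_a", "category_b"]])
    y = df["label"]

    # Keep only the k one-hot columns most associated with the label.
    selector = SelectKBest(chi2, k=5)
    X_selected = selector.fit_transform(X, y)
    print("after feature selection:", X_selected.shape)

    # Reduce the one-hot matrix to a handful of dense components.
    svd = TruncatedSVD(n_components=3, random_state=0)
    X_reduced = svd.fit_transform(X)
    print("after dimensionality reduction:", X_reduced.shape)
    ```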

    Kind regards,
    Sebastian