Improve Random forest performance

New Altair Community Member

Jun 1, 2021

Updated Nov 5, 2024 by Jocelyn

Hello!

I'm working on a random forest model to predict a binary label. The classes are imbalanced, roughly 70% to 30%. The attributes are numeric and represent financial statement indices or amounts in euros, such as EBITDA.

The process consists of reading the data, selecting features with fewer than 10% missing values, normalization (Z-transformation), replacing missing values with the average, and cross-validation with undersampling of the majority class in the training data. The model is a random forest using information gain (200 trees, maximum depth 15).

The performance is not good: accuracy is about 74%, weighted recall 75%, weighted precision 72%, and F-measure 65.89% (class precision for the primary class is 57%).

How can I improve performance? Do you have any suggestions?
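For readers comparing numbers like these: on a 70/30 split, accuracy and weighted averages are dominated by the majority class, so it helps to look at precision and recall per class. Below is a minimal sketch in plain Python. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the dataset in this thread, and `per_class_metrics` is an illustrative helper, not a library function.

```python
def per_class_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts,
    treating one class as the 'positive' class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for 1,000 examples with a 70/30 class split.
# The same confusion matrix is read once per class.
minority = per_class_metrics(tp=180, fp=140, fn=120, tn=560)
majority = per_class_metrics(tp=560, fp=120, fn=140, tn=180)

for name, m in (("minority", minority), ("majority", majority)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

With these made-up counts the overall accuracy is 74%, yet the minority class has a precision of only about 56%, which is the kind of gap a single accuracy figure hides.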


rfuentealba

New Altair Community Member

Accepted Answer

Jun 29, 2021

Hello, and hopefully it's not too late to answer:

It is difficult to give a precise answer without knowing the data, and several strategies could apply. Can you apply some kind of discretization? Converting continuous values into discrete bins might help. Do you know whether any anomaly or trend might be hidden in the data? Those are the ideas I can come up with for now.
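To make the discretization idea concrete, here is a minimal sketch of equal-frequency binning in plain Python. The function names and the EBITDA-like values are hypothetical and only for illustration. The cut points should be computed on the training data alone and then reused for the test data.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Return n_bins - 1 cut points that split the sorted values
    into groups of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def assign_bin(value, edges):
    """Map a value to a bin index in 0..len(edges); a value equal to a
    cut point falls into the bin above it."""
    return bisect_right(edges, value)


# Hypothetical, heavily skewed amounts in thousands of euros.
ebitda = [-50, 10, 20, 35, 40, 80, 120, 300, 900, 15000]

edges = quantile_edges(ebitda, n_bins=4)
bins = [assign_bin(v, edges) for v in ebitda]

print(edges)  # [20, 80, 300]
print(bins)   # [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
```

Equal-frequency bins are used here because amounts in euros tend to be skewed, so equal-width bins would put almost every company into the first bin. Whether binning helps a random forest at all depends on the data, so treat it as something to test rather than a guaranteed gain.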

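Also, undersampling can sometimes introduce issues, because the resampled training set is artificial: it no longer reflects the real class distribution, and it discards majority-class examples. Weighting might be better, if your algorithm supports it.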
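As a rough illustration of weighting, the sketch below computes "balanced" class weights in plain Python, using the formula n_samples / (n_classes * class_count). This is the same formula scikit-learn documents for `class_weight="balanced"`. The helper name and the 70/30 label list are assumptions made for the example, and how the weights are passed to the learner depends on the tool you use.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels with the 70/30 split described in the question.
labels = [0] * 70 + [1] * 30

weights = balanced_class_weights(labels)
sample_weights = [weights[y] for y in labels]

print({cls: round(w, 4) for cls, w in weights.items()})
```

With these weights every example is kept, and both classes contribute the same total weight to training, so nothing from the majority class is thrown away.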
