How can I improve the performance of my model with an imbalanced database for a classification issue

Samira_123
Samira_123 New Altair Community Member
edited November 2024 in Community Q&A
Hi,

This is my first time using RapidMiner. I have to do a classification for an assignment.
The dataset is really imbalanced: 180 out of 12,800 donors donated in the past (class 1), and the remaining donors did not donate (class 0).

After I created and selected relevant attributes, the class precisions were acceptable, but the class recall for class 1 was very poor: something close to 8%.

However, when I used the 'Sample' operator to balance my dataset, the class recall and the class precision were both around 60%. I am not sure this is the right thing to do, because I end up with 360 donors instead of 12,800.

In the end, I have to apply the model to a test set of more than 12,000 donors to predict which donors will donate.

Thank you

NB: My kappa is equal to 0.267

Answers

  • varunm1
    varunm1 New Altair Community Member
    Hello @Samiraaa_123

    How are you validating your model? Is it cross-validation or split validation? 

    Sampling is fine when it is applied only to the training set; applying it to the whole dataset is not recommended. As the dataset is small, you can try upsampling your minority class with the SMOTE operator from the Operator Toolbox extension (download it from the Marketplace) instead of downsampling.

    You can also try weighting your examples instead of sampling, although this works only for some algorithms, such as neural networks and decision trees. Weighting doesn't alter your sample sizes but assigns equal importance to both classes. This can be done with Generate Weight (Stratification). You should check whether the algorithm you want to use accepts this weighting: right-click the algorithm operator and click Show operator info. If you see a green tick next to "Weighted Examples", the algorithm supports weighting.

    Are you tuning the model's hyperparameters? Are you trying different algorithms?
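
    To illustrate what "equal importance to both classes" means, here is a minimal Python sketch of balanced example weights. It is not RapidMiner code and does not claim to reproduce the exact numbers the weighting operator produces; the function name `balanced_class_weights` is made up for this example, and the 180 / 12,620 split is taken from the question.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: weight} so that every class carries the same total weight.

    Hypothetical helper for illustration only: each example of a class gets
    n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Class distribution from the question: 180 donors (1), 12,620 non-donors (0).
labels = [1] * 180 + [0] * 12620

weights = balanced_class_weights(labels)

# One weight per example; the number of examples is unchanged.
example_weights = [weights[y] for y in labels]

# Sum of weights per class: both classes end up with the same total.
total_per_class = {
    cls: sum(w for w, y in zip(example_weights, labels) if y == cls)
    for cls in weights
}
print(weights)
print(total_per_class)
```

    Each donor gets a much larger weight than each non-donor, so a learner that honours example weights treats the two classes as equally important without dropping or duplicating any rows.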
  • Samira_123
    Samira_123 New Altair Community Member
    edited May 2020
    Hi @varunm1  

    I used the 'Cross Validation' operator to validate my model. I had already tried to balance my dataset with Generate Weight (Stratification), since I saw on the forum that this could work, but RapidMiner says that 'Random Forest' (the operator I am using for the classification) will disregard the weights.

    Does the SMOTE operator need to be placed just before the cross-validation? 

    Thank you so much for your answer
  • varunm1
    varunm1 New Altair Community Member
    Yep, Random Forest doesn't accept weights. You should apply SMOTE or any other sampling operator in the training part of the cross-validation. If you apply it to the whole dataset, it will bias your model, and the performance you measure won't carry over to new data that comes in later. You can also use Optimize Parameters (Grid) to search for good hyperparameters (number of trees, maximal depth, etc.) for the random forest. Also try different models such as gradient boosting, neural networks, SVM, etc.
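
    As a rough illustration of "sampling only in the training part of the cross-validation", here is a small Python sketch. It is not RapidMiner code, and it uses plain random duplication of minority examples as a stand-in for SMOTE (SMOTE creates synthetic examples instead). The helper names `stratified_folds` and `oversample_minority` are invented for this example.

```python
import random

def stratified_folds(labels, k, seed=0):
    """Split example indices into k folds that keep the original class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def oversample_minority(indices, labels, seed=0):
    """Duplicate random minority examples until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    return balanced

# Class distribution from the question: 180 donors (1), 12,620 non-donors (0).
labels = [1] * 180 + [0] * 12620
folds = stratified_folds(labels, k=10)

splits = []
for f, test_idx in enumerate(folds):
    # Training part: every fold except the current test fold.
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    # Balance ONLY the training part; the test fold stays untouched.
    train_balanced = oversample_minority(train_idx, labels, seed=f)
    splits.append((train_balanced, test_idx))
    # A model would be fitted on train_balanced and evaluated on test_idx here.
```

    Because each test fold keeps the original imbalance, the recall and precision measured on it are closer to what you can expect on the real, imbalanced test set.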
  • Samira_123
    Samira_123 New Altair Community Member
    Hi @varunm1,

    Thank you for your advice :) 
    I've been trying to do what you said. My class precision is really good, but the class recall for class 1 is still very poor.
  • varunm1
    varunm1 New Altair Community Member
    edited May 2020 Answer ✓
    Hello @Samiraaa_123

    What's your kappa value? And what did you apply? Also, there is no guarantee that you will always get excellent results with ML; sometimes the data might be random, or you may not be tuning your hyperparameters well. You need to keep trying different models and tuning their hyperparameters, and you also need to understand the data by checking correlations and distributions.
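
    For reference, kappa, class precision and class recall can all be computed from the confusion matrix. The Python sketch below is only an illustration: the function name `binary_metrics` is made up, and the two confusion matrices are hypothetical extremes built from the 180 / 12,620 class counts in the question, not results from the actual model.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall for class 1, and Cohen's kappa."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Agreement expected by chance, from the predicted and actual class rates.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "kappa": kappa}

# Hypothetical model that predicts "no donation" for everyone.
always_zero = binary_metrics(tp=0, fp=0, fn=180, tn=12620)

# Hypothetical perfect model.
perfect = binary_metrics(tp=180, fp=0, fn=0, tn=12620)

print(always_zero)
print(perfect)
```

    The always-zero model reaches about 98.6% accuracy but has a kappa of 0 and a class 1 recall of 0, which is why accuracy alone says little on data this imbalanced.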
  • Samira_123
    Samira_123 New Altair Community Member
    Hi @varunm1,

    Thank you. You were very helpful. 
    Wish you a good weekend :)