Balanced classes in an unbalanced dataset with multiple classes

Liza123 New Altair Community Member
edited November 5 in Community Q&A
Hi all, 

I am new to this platform and I am struggling with balancing classes.
When I build a model on a binary dataset, I can use the Sample operator or the SMOTE Upsampling operator to balance my classes.
When I build a model with three (or more) classes, neither the Sample operator nor SMOTE Upsampling balances my classes.
Does anyone have suggestions for balancing the classes when there are more than two?

Thank you in advance. 

Answers

  • MNNikiforos
    MNNikiforos New Altair Community Member
    Hello @Liza123,

    I have faced a similar issue when trying to balance data with more than 2 classes. I have tried 3 things that usually work, depending on the problem/data set.

    1. Define the minority class as the class with the fewest examples and collapse all the other classes into 1 class, thereby making it a 2-class problem.
    2. Apply the SMOTE Upsampling operator with auto_detect_minority_class activated repeatedly, as many times as the number of classes, each time using the data set produced by the previous run as the input. At the end, synthetic examples will have been created for every class except the majority one.
    3. Use the Sample operator with the balance_data parameter set to true, and then define the sample size for each class. This way, you can undersample your majority class.

    I usually use a combination of 2 and 3, by undersampling the majority class first and then applying SMOTE as needed.
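
The combination could look roughly like the Python sketch below. It is only an approximation under stated assumptions: `smote_like` is a simplified SMOTE-style interpolation written for this example, not the SMOTE Upsampling operator, and the helper names, the 2-D toy data, `majority_size=60`, and `k=5` are all arbitrary choices.

```python
import math
import random
from collections import Counter, defaultdict

def undersample(points, labels, label, size, rng):
    """Randomly keep only `size` rows of class `label`; other classes are untouched."""
    idx = [i for i, lab in enumerate(labels) if lab == label]
    drop = set(rng.sample(idx, len(idx) - size)) if len(idx) > size else set()
    keep = [i for i in range(len(labels)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]

def smote_like(members, n_new, k, rng):
    """Simplified SMOTE-style step: each synthetic point lies on the segment between
    a random member and one of its k nearest same-class neighbours.
    Assumes the class has at least two members."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(members)
        others = [m for m in members if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic

def balance(points, labels, majority_size, k=5, seed=0):
    """Undersample the largest class, then upsample every smaller class to match."""
    rng = random.Random(seed)
    majority = Counter(labels).most_common(1)[0][0]
    points, labels = undersample(points, labels, majority, majority_size, rng)
    by_class = defaultdict(list)
    for p, lab in zip(points, labels):
        by_class[lab].append(p)
    target = max(len(m) for m in by_class.values())
    for lab, members in by_class.items():
        for p in smote_like(members, target - len(members), k, rng):
            points.append(p)
            labels.append(lab)
    return points, labels

# Made-up 2-D data with three classes of very different sizes.
data_rng = random.Random(42)
centres = {"a": (0.0, 0.0), "b": (3.0, 3.0), "c": (-3.0, 3.0)}
sizes = {"a": 200, "b": 40, "c": 15}
points, labels = [], []
for lab, n in sizes.items():
    cx, cy = centres[lab]
    for _ in range(n):
        points.append((data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0)))
        labels.append(lab)

new_points, new_labels = balance(points, labels, majority_size=60)
print(Counter(new_labels))
```

With these numbers, class "a" is cut from 200 to 60 rows, and classes "b" and "c" are filled up to 60 rows each with synthetic points. As with any resampling, this should be applied to the training data only, not to the data used for validation.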

    I hope that you will find something that works well for you!

    Best Regards