Downsampling operators

20160041 New Altair Community Member
edited November 2024 in Community Q&A
Hi,
Could you please tell me how I can downsample imbalanced data in RM? I have used the random sampling and bootstrap sampling operators, and I would also like to know the difference between the two.
Thank you

Best Answers

  • rfuentealba
    rfuentealba New Altair Community Member
    Answer ✓
    Hi,

    In the Mannheim Toolbox extension, there is a Sample - Balance operator that does just this.

    (Opinions and fundamental techniques aside, you might want to work with weighting instead of sampling.)

    All the best,

    Rodrigo.
  • Telcontar120
    Telcontar120 New Altair Community Member
    Answer ✓
    I second that: weighting is my preferred approach as well. Downsampling should be used primarily when you have many more cases than you need (either in general, or specifically in the majority class). Larger and larger samples bring diminishing returns, so if your development population is hundreds of thousands of cases, you likely don't need them all. But if your minority class is small in absolute terms, you probably don't want to downsample the majority class to match it, because too much information would be lost.
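
To make the two options from the answers above concrete, here is a minimal sketch in plain Python (not a RapidMiner process). The function names `downsample` and `balanced_weights` and the `ratio` parameter are hypothetical, chosen only for illustration. The weight formula `n / (k * count)` is one common balancing heuristic and is an assumption here, not necessarily what any RapidMiner operator computes.

```python
import random
from collections import Counter


def downsample(rows, label_of, ratio=1.0, seed=0):
    """Randomly shrink larger classes, sampling without replacement.

    Each class keeps at most ratio * (size of the smallest class) rows.
    ratio is assumed to be >= 1; ratio=1.0 gives fully balanced classes,
    while a larger ratio keeps more majority rows so less information is lost.
    """
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    smallest = min(len(group) for group in by_class.values())
    cap = int(smallest * ratio)

    sampled = []
    for group in by_class.values():
        keep = min(len(group), cap)
        sampled.extend(rng.sample(group, keep))
    rng.shuffle(sampled)
    return sampled


def balanced_weights(labels):
    """Return one weight per class so that every class has the same total weight.

    No rows are discarded; rare classes simply count for more.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical data: 900 majority rows and 100 minority rows.
    data = [("neg", i) for i in range(900)] + [("pos", i) for i in range(100)]
    balanced = downsample(data, label_of=lambda row: row[0])
    print(Counter(label for label, _ in balanced))
    print(balanced_weights([label for label, _ in data]))
```

With weighting, all 1,000 rows stay in the training set and the learner is told that each minority row matters more. With downsampling at `ratio=1.0`, 800 majority rows are thrown away, which is the information loss the second answer warns about when the minority class is small.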
