"Balanced sampling"

frankie
frankie New Altair Community Member
edited November 2024 in Community Q&A
A question. When I want to create a balanced training dataset using the "Sample" operator I get an error:
"Given index '4000' does not fit the mapped ExampleSet".

My data has two labels, "-1" and "1", and I am trying to take a sample of 2000 from each of them.
The error occurs during XValidation (not, for example, during preprocessing).

Can somebody help out, or guide me on how these samples should/can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow works perfectly without the sampling operator.


-----------------------------------------------------

Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:

  com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
  com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
  com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
  com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
  com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
  com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
  com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
  com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
  com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
  com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
  com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
  com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
  com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.Process.run(Process.java:899)
  com.rapidminer.Process.run(Process.java:795)
  com.rapidminer.Process.run(Process.java:790)
  com.rapidminer.Process.run(Process.java:780)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)

Answers

  • Andrew2
    Andrew2 New Altair Community Member
    Hello Frankie,

    In the sample operator, set the sample parameter to relative and then set the sample ratio to 0.5. This will select half your examples.

    regards

    Andrew
  • frankie
    frankie New Altair Community Member
    What happens if the two groups contain 4000 and 3000 samples, respectively?
    How will I then sample 2000 from each?


    Thanks,
    Frankie
  • TK
    TK New Altair Community Member
    You can define absolute values for each class within the Sample operator.
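To illustrate what per-class absolute sampling means outside of RapidMiner, here is a minimal Python sketch. It is not the Sample operator's implementation; the function name `sample_per_class` and the toy rows are made up for this example, and the class sizes (4000 and 3000) are taken from the question above.

```python
import random
from collections import defaultdict

def sample_per_class(rows, label_of, n_per_class, seed=0):
    """Draw n_per_class[label] rows without replacement from each class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    sampled = []
    for label, n in n_per_class.items():
        group = by_label.get(label, [])
        if n > len(group):
            # Asking for more rows than a class has cannot work without replacement.
            raise ValueError(f"class {label!r} has only {len(group)} rows, cannot draw {n}")
        sampled.extend(rng.sample(group, n))
    rng.shuffle(sampled)  # avoid returning the classes in blocks
    return sampled

# Toy data: 4000 rows labelled "-1" and 3000 rows labelled "1".
rows = [("-1", i) for i in range(4000)] + [("1", i) for i in range(3000)]
balanced = sample_per_class(rows, label_of=lambda r: r[0],
                            n_per_class={"-1": 2000, "1": 2000})
```

The result contains 2000 rows of each label, regardless of how large each class was originally.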
  • roman_bednarik
    roman_bednarik New Altair Community Member
    Hi, picking up on an old thread: what if the size of the set is not known, e.g. we don't know the absolute numbers of positive and negative examples? Is there a way to select a balanced subset?
  • MariusHelf
    MariusHelf New Altair Community Member
    Hi, you can filter your data by label and then apply sampling operators on the filtered data sets and append them. I think http://rapid-i.com/rapidforum/index.php/topic,5706.0.html gives an example for that.

    Best, Marius
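The filter, sample and append idea can be sketched in plain Python for the case where the class sizes are not known in advance. This is only an illustration under assumptions, not the linked RapidMiner process: the helper name `balance_by_undersampling` and the 700/300 toy data are invented here, and the target size is simply the size of the smallest class.

```python
import random
from collections import defaultdict

def balance_by_undersampling(rows, label_of, seed=0):
    """Split rows by label, sample each group down to the smallest group, append.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:                      # "filter by label"
        by_label[label_of(row)].append(row)
    if not by_label:
        return []

    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():       # "sample each filtered set"
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)                 # "append", then mix the classes
    return balanced

# Toy data with class sizes the caller does not need to know.
rows = [("pos", i) for i in range(700)] + [("neg", i) for i in range(300)]
subset = balance_by_undersampling(rows, label_of=lambda r: r[0])
```

Because the target is computed from the data, the same call works whatever the positive and negative counts turn out to be.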
  • abbasi_samira
    abbasi_samira New Altair Community Member

    Hello,
    How can I equalize the number of examples in the two classes (50/50)?

    The class attribute contains two values:
    true: 94
    false: 569
  • abbasi_samira
    abbasi_samira New Altair Community Member

    How can I balance the class attribute up to the size of the largest class?
  • sgenzer
    sgenzer
    Altair Employee
    Hello @abbasi_samira - have you tried the “Balance Data” operator?

    Scott

  • abbasi_samira
    abbasi_samira New Altair Community Member

    Hi,

    I want to balance the data by oversampling.
    How can I do that? Please explain the steps for oversampling.

    Please help me.
    Thanks
  • abbasi_samira
    abbasi_samira New Altair Community Member

    Hi,

    Yes, I used balanced sampling with the Sample operator:

    Read Excel ---> Sample ---> balance data ---> relative or absolute

    But this method balances by undersampling.
    I need to balance by oversampling.
  • sgenzer
    sgenzer
    Altair Employee
    I would recommend going through the Sample operator tutorial (found inside the Sample help pane).

    The Mannheim extension also has a Balance data operator.

    Scott
  • kypexin
    kypexin New Altair Community Member
    Hi @abbasi_samira,

    There's an extension called 'Operator Toolbox' which now contains a 'SMOTE Upsampling' operator that you could use for oversampling the minority class.
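For readers who want to see what oversampling does to the class counts, here is a minimal Python sketch of random oversampling with replacement. Note that this is a simpler technique than SMOTE: it duplicates existing minority rows instead of synthesizing new ones, and it is not what the extension's operator does internally. The helper name `balance_by_oversampling` and the toy rows are assumptions; the counts (94 true, 569 false) come from the question earlier in the thread.

```python
import random
from collections import Counter, defaultdict

def balance_by_oversampling(rows, label_of, seed=0):
    """Duplicate rows of smaller classes until every class matches the largest.

    Hypothetical helper for illustration only (random oversampling, not SMOTE).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    if not by_label:
        return []

    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)  # keep every original row
        # Draw the missing rows with replacement (zero extra rows for the largest class).
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Toy data with the class counts from the thread: true = 94, false = 569.
rows = [("true", i) for i in range(94)] + [("false", i) for i in range(569)]
oversampled = balance_by_oversampling(rows, label_of=lambda r: r[0])
counts = Counter(label for label, _ in oversampled)
```

After the call both classes have 569 rows. If the model is evaluated with cross-validation, it is generally safer to oversample only the training portion of each fold, so that duplicated rows do not end up in both training and test data.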