"Balanced sampling"

User: "frankie"
New Altair Community Member
Updated by Jocelyn
A question: when I try to create a balanced training dataset using the "Sample" operator, I get an error:
"Given index '4000' does not fit the mapped ExampleSet".

My data has two labels, "-1" and "1", and I try to take a sample of 2000 from each of them.
The error occurs during XValidation (not, for example, during preprocessing).

Can somebody help out, or guide me on how these samples should or can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow works perfectly without the sampling operator.


-----------------------------------------------------

Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:

  com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
  com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
  com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
  com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
  com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
  com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
  com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
  com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
  com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
  com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
  com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
  com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
  com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.Process.run(Process.java:899)
  com.rapidminer.Process.run(Process.java:795)
  com.rapidminer.Process.run(Process.java:790)
  com.rapidminer.Process.run(Process.java:780)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)

    User: "Andrew2"
    New Altair Community Member
    Hello Frankie,

    In the sample operator, set the sample parameter to relative and then set the sample ratio to 0.5. This will select half your examples.

    regards

    Andrew
    User: "frankie"
    New Altair Community Member
    OP
    What happens if the two groups contain 4000 and 3000 samples, respectively?
    How will I then sample 2000 from each?


    Thanks,
    Frankie
    User: "TK"
    New Altair Community Member
    You can define absolute values for each class within the Sample operator.
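
For readers who want to see the idea outside the GUI, here is a minimal Python sketch of per-class absolute sampling. It is not RapidMiner code; the function name `sample_per_class` and the toy rows are assumptions made for illustration, using the 4000/3000 class sizes from the question above.

```python
import random
from collections import defaultdict

def sample_per_class(rows, label_of, counts, seed=0):
    """Draw counts[label] rows, without replacement, from each class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    sampled = []
    for label, n in counts.items():
        group = by_label[label]
        if n > len(group):
            # Asking for more rows than a class holds cannot work without replacement.
            raise ValueError(f"class {label!r} has only {len(group)} rows, {n} requested")
        sampled.extend(rng.sample(group, n))

    rng.shuffle(sampled)  # avoid leaving the result sorted by class
    return sampled

# Toy data: 4000 rows labelled "1" and 3000 rows labelled "-1".
rows = [(i, "1") for i in range(4000)] + [(i, "-1") for i in range(3000)]
balanced = sample_per_class(rows, lambda r: r[1], {"1": 2000, "-1": 2000})
```

The requested count per class must not exceed the size of that class, which is why the sketch raises an error instead of silently returning fewer rows.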
    User: "roman_bednarik"
    New Altair Community Member
    Hi, picking up on an old thread: what if the size of the set is not known, e.g. we don't know the absolute number of positive and the absolute number of negative examples? Is there a way to select a balanced subset?
    User: "MariusHelf"
    New Altair Community Member
    Hi, you can filter your data by label and then apply sampling operators on the filtered data sets and append them. I think http://rapid-i.com/rapidforum/index.php/topic,5706.0.html gives an example for that.

    Best, Marius
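
The filter, sample, and append approach can be sketched in plain Python as follows. This is an illustration rather than a RapidMiner process: the function name `balance_by_undersampling` and the toy data are assumptions. The class sizes are not needed in advance because the target size is taken from the smallest class found in the data.

```python
import random
from collections import defaultdict

def balance_by_undersampling(rows, label_of, seed=0):
    """Split rows by label, downsample every class to the size of the
    smallest class, and append the per-class samples."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:                      # "filter by label"
        groups[label_of(row)].append(row)
    if not groups:
        return []

    target = min(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():         # "sample" each filtered set
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)                 # "append", then mix the classes
    return balanced

# Toy data with class sizes that the caller does not need to know.
examples = [(i, "positive") for i in range(7)] + [(i, "negative") for i in range(3)]
subset = balance_by_undersampling(examples, label_of=lambda r: r[1])
```

With the toy data, every class is reduced to three rows, the size of the smaller class.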
    User: "abbasi_samira"
    New Altair Community Member

    Hello
    How can I balance the two classes equally (50/50)?

    The class contains two values:

    true: 94
    false: 569

    User: "abbasi_samira"
    New Altair Community Member

    How can I balance the class attribute up to the size of the largest class?

    User: "sgenzer"
    Altair Employee
    Hello @abbasi_samira - have you tried the “Balance Data” operator?

    Scott

    User: "abbasi_samira"
    New Altair Community Member

    hi

    I want to balance the data by oversampling.

    How can I do that? Please explain the steps for balancing by oversampling.

    Please help me.

    Thanks

    User: "abbasi_samira"
    New Altair Community Member

    hi

    Yes, I used balanced sampling:

    Read Excel ---> Sample ---> balance ---> relative or absolute

    But this method balances by undersampling.

    I need to balance by oversampling.

    User: "sgenzer"
    Altair Employee

    I would recommend going through the Sample operator tutorial (found inside the Sample help pane).

     


     

    The Mannheim extension also has a Balance data operator.


    Scott

     

    User: "kypexin"
    New Altair Community Member

    Hi @abbasi_samira

     

    There's an extension called 'Operator Toolbox', which now contains a 'SMOTE Upsampling' operator that you could use for oversampling the minority class.
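
To make the idea of oversampling concrete, here is a minimal Python sketch of plain random oversampling, which duplicates minority rows drawn with replacement. Note that this is a simpler technique than SMOTE (SMOTE synthesizes new points instead of copying existing ones), and it is not the extension's implementation. The function name `balance_by_oversampling` is an assumption; the class counts are the true: 94 / false: 569 figures from the question above.

```python
import random
from collections import defaultdict

def balance_by_oversampling(rows, label_of, seed=0):
    """Grow every class to the size of the largest class by re-drawing
    its own rows with replacement (random oversampling, not SMOTE)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    if not groups:
        return []

    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)                                   # keep all originals
        balanced.extend(rng.choices(group, k=target - len(group)))  # add duplicates
    rng.shuffle(balanced)
    return balanced

# Class counts from the thread: 94 "true" rows and 569 "false" rows.
data = [(i, "true") for i in range(94)] + [(i, "false") for i in range(569)]
oversampled = balance_by_oversampling(data, label_of=lambda r: r[1])
# Each class now has 569 rows, 1138 rows in total.
```

Whichever method is used, oversampling is normally applied to the training data only (for example, inside the training side of a cross-validation), so that duplicated rows do not end up in both the training and the test folds.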