"Balanced sampling"
frankie
New Altair Community Member
A question. When I want to create a balanced training dataset using the "Sample" operator I get an error:
"Given index '4000' does not fit the mapped ExampleSet".
My data has two labels, "-1" and "1", and I try to take a sample of 2000 from each of them.
The error occurs during XValidation (not, for example, during preprocessing).
Can somebody help out, or guide me on how these samples should/can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow works perfectly without the sampling operator.
-----------------------------------------------------
Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:
com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.Process.run(Process.java:899)
com.rapidminer.Process.run(Process.java:795)
com.rapidminer.Process.run(Process.java:790)
com.rapidminer.Process.run(Process.java:780)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)
Answers
-
Hello Frankie,
In the sample operator, set the sample parameter to relative and then set the sample ratio to 0.5. This will select half your examples.
Regards,
Andrew
-
What happens if the two groups contain 4000 and 3000 samples, respectively?
How will I then sample 2000 from each?
Thanks,
Frankie
-
You can define absolute values for each class within the Sample operator.
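Outside RapidMiner, the same idea (a fixed absolute sample size per class) can be sketched in plain Python. This is only an illustration under assumptions: the function name `sample_per_class`, its parameters, and the toy data are hypothetical and are not part of the Sample operator.

```python
import random
from collections import defaultdict

def sample_per_class(rows, label_of, per_class, seed=0):
    """Draw exactly per_class[label] rows, without replacement, from each class."""
    rng = random.Random(seed)
    # Group the rows by their label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    sampled = []
    for label, n in per_class.items():
        group = by_label[label]
        if n > len(group):
            raise ValueError(f"class {label!r} has only {len(group)} rows, {n} requested")
        sampled.extend(rng.sample(group, n))
    rng.shuffle(sampled)  # avoid leaving the result ordered by class
    return sampled

# Toy data mirroring the question: 4000 rows labelled "-1" and 3000 labelled "1".
rows = [(i, "-1") for i in range(4000)] + [(i, "1") for i in range(4000, 7000)]
balanced = sample_per_class(rows, label_of=lambda r: r[1],
                            per_class={"-1": 2000, "1": 2000})
```

Requesting more rows than a class contains raises an error here instead of silently returning fewer examples.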
-
Hi, picking up on an old thread: what if the size of the set is not known, e.g. we don't know the absolute numbers of positive and negative examples? Is there a way to select a balanced subset?
-
Hi, you can filter your data by label and then apply sampling operators on the filtered data sets and append them. I think http://rapid-i.com/rapidforum/index.php/topic,5706.0.html gives an example for that.
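As a rough illustration of the filter, sample, and append idea when the class sizes are not known in advance, here is a plain Python sketch. It assumes the goal is to shrink every class to the size of the smallest one; the function name `undersample_to_minority` and the toy data are hypothetical, and this is not the process from the linked thread.

```python
import random
from collections import defaultdict

def undersample_to_minority(rows, label_of, seed=0):
    """Sample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    # "Filter by label": split the rows into one group per class.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    if not by_label:
        return []
    # The target size is discovered from the data, not given up front.
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        # "Sample" each group, then "append" the samples together.
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced

# Toy data with unequal classes: 100 negatives and 30 positives.
rows = [(i, "neg") for i in range(100)] + [(i, "pos") for i in range(100, 130)]
balanced = undersample_to_minority(rows, label_of=lambda r: r[1])
```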
Best, Marius
-
Hello,
How can I equalize the class counts (50/50) for a label with two values? The class contains two values: true: 94, false: 569.
-
How can I balance the class attribute up to the maximum amount?
-
Hi,
I want to balance my data by oversampling. How can I do this?
Please explain the steps for oversampling. Please help me, thanks.
-
Hi,
Yes, I used the balanced Sample: Read Excel ---> Sample ---> balance ---> relative or absolute. But this method balances by undersampling; I need to balance by oversampling.
-
I would recommend going through the Sample operator tutorial (found inside the Sample help pane).
The Mannheim extension also has a Balance data operator.
Scott
-
There's an extension called 'Operator Toolbox' which now contains a 'SMOTE Upsampling' operator that you could use for oversampling the minority class.
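To give a feel for what SMOTE-style oversampling does, here is a minimal plain Python sketch of the general technique: each synthetic point is placed on the line segment between a minority example and one of its nearest minority neighbours. This is an assumption-laden illustration, not the Operator Toolbox implementation; the function name `smote_like` and its parameters are hypothetical.

```python
import math
import random

def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (Euclidean distance),
        # excluding the base point itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + u * (q - b) for b, q in zip(base, neighbour)))
    return synthetic

# Four minority examples on the corners of the unit square (numeric attributes only).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=6, k=2)
```

Unlike random oversampling, which only duplicates existing minority rows, this produces new attribute values, so it only makes sense for numeric attributes.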