"Balanced sampling"
A question: when I try to create a balanced training dataset using the "Sample" operator, I get an error:
"Given index '4000' does not fit the mapped ExampleSet".
My data has two labels, "-1" and "1", and I try to take a sample of 2000 from each of them.
The error occurs during XValidation (not, for example, during preprocessing).
Can somebody help out, or guide me on how these samples should/can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow works perfectly without the sampling operator.
-----------------------------------------------------
Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:
com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.Process.run(Process.java:899)
com.rapidminer.Process.run(Process.java:795)
com.rapidminer.Process.run(Process.java:790)
com.rapidminer.Process.run(Process.java:780)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)
Hi, you can filter your data by label and then apply sampling operators on the filtered data sets and append them. I think http://rapid-i.com/rapidforum/index.php/topic,5706.0.html gives an example for that.
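Outside RapidMiner, the same filter / sample / append idea can be sketched in plain Python. This is only an illustration of the logic, not the RapidMiner operators themselves: the function name `balanced_sample`, the `(id, label)` row layout and the class sizes are all made up for the example.

```python
import random
from collections import defaultdict

def balanced_sample(rows, label_of, per_class, seed=0):
    """Draw `per_class` rows from every label group and append the results.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    rng = random.Random(seed)

    # Step 1: filter the data by label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    # Step 2: sample each filtered set; Step 3: append the samples.
    sampled = []
    for label, group in by_label.items():
        if len(group) < per_class:
            raise ValueError(
                f"label {label!r} has only {len(group)} rows, "
                f"cannot draw {per_class} without replacement"
            )
        sampled.extend(rng.sample(group, per_class))

    # Shuffle so the classes are not stored in two contiguous blocks.
    rng.shuffle(sampled)
    return sampled

# Made-up data: 5000 rows labelled -1 and 3000 rows labelled 1.
rows = [(i, -1) for i in range(5000)] + [(i, 1) for i in range(5000, 8000)]
balanced = balanced_sample(rows, label_of=lambda r: r[1], per_class=2000)
```

The result is a new, self-contained list of 4000 rows, so any later split (such as a cross-validation fold) indexes only into the sampled data.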
Best, Marius
There's an extension called 'Operator toolbox' which now contains a 'SMOTE upsampling' operator that you could use for oversampling the minority class.
In the sample operator, set the sample parameter to relative and then set the sample ratio to 0.5. This will select half your examples.
regards
Andrew