Splitting examples into training/test
kglowack
Hello,
I'm new to RapidMiner, but I have spent some time playing around with the software. However, I haven't been able to find a way to split the input file into training and test sets using an attribute. Basically, my dataset has an attribute specifying which examples belong to the training set and which belong to the test set. How can I train a model only on the training examples and test it on the test set? The only solution I found was BatchXValidation, but I want to build a single model (I believe BatchXValidation would build two models, correct?).
Any help would be very much appreciated.
Thanks!
Answers
Hi,
you could use the ExampleFilter operator with condition_class set to attribute_value_filter to select only the examples of the first or of the second type.
Combined with an IOMultiplier, which copies your input example set so that one copy can be filtered for training and the other for testing, this should do what you want.
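The copy-then-filter idea is not specific to RapidMiner. As a language-neutral illustration (not RapidMiner code), here is a minimal Python sketch that assumes each example is a dict and that the marker attribute is called `split` with the values `train` and `test`; those names and the `split_by_attribute` helper are hypothetical. Unlike ExampleFilter, which only removes examples, the sketch also drops the marker attribute so a learner cannot use it as a feature.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_attribute(rows: List[Row], attribute: str,
                       train_value: object, test_value: object) -> Tuple[List[Row], List[Row]]:
    """Hypothetical helper: filter two views of the same rows by a marker attribute."""

    def without_marker(row: Row) -> Row:
        # Drop the marker column so it is not treated as a feature later.
        return {k: v for k, v in row.items() if k != attribute}

    # Each comprehension plays the role of one ExampleFilter on one copy of the data.
    train = [without_marker(r) for r in rows if r.get(attribute) == train_value]
    test = [without_marker(r) for r in rows if r.get(attribute) == test_value]
    return train, test


# Hypothetical data: "split" marks which set each example belongs to.
data = [
    {"x": 1.0, "label": "positive", "split": "train"},
    {"x": -2.0, "label": "negative", "split": "train"},
    {"x": 0.5, "label": "positive", "split": "test"},
]
train, test = split_by_attribute(data, "split", "train", "test")
print(len(train), len(test))  # 2 1
```

The input list is never modified; both subsets are built from it, which is the role IOMultiplier plays in the process.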
Here's an example process that demonstrates my suggestion. (It is not very sensible, because it splits on the label itself, so the training copy contains only positive examples and the test copy only negative ones, but it shows the mechanics.)
<operator name="Root" class="Process" expanded="yes">
  <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
    <parameter key="target_function" value="sum classification"/>
  </operator>
  <operator name="IOMultiplier" class="IOMultiplier">
    <parameter key="io_object" value="ExampleSet"/>
  </operator>
  <operator name="ExampleFilter" class="ExampleFilter" breakpoints="after">
    <parameter key="condition_class" value="attribute_value_filter"/>
    <parameter key="parameter_string" value="label = positive"/>
  </operator>
  <operator name="DecisionTree" class="DecisionTree">
  </operator>
  <operator name="ExampleFilter (2)" class="ExampleFilter" breakpoints="after">
    <parameter key="condition_class" value="attribute_value_filter"/>
    <parameter key="parameter_string" value="label = negative"/>
  </operator>
  <operator name="ModelApplier" class="ModelApplier">
    <list key="application_parameters">
    </list>
  </operator>
  <operator name="ClassificationPerformance" class="ClassificationPerformance">
    <parameter key="accuracy" value="true"/>
    <list key="class_weights">
    </list>
  </operator>
</operator>
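In this process, DecisionTree is trained once on the first filtered copy and ModelApplier applies that single model to the second copy, after which ClassificationPerformance reports accuracy. The same train-once, apply-once flow can be sketched in plain Python. This is an illustration under assumptions, not RapidMiner API: a 1-nearest-neighbour rule stands in for DecisionTree purely for brevity, and `train_1nn` and `accuracy` are hypothetical helpers that take already-split lists of dicts with a `label` key.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Model = Callable[[Row], object]


def train_1nn(train: List[Row], label: str = "label") -> Model:
    """Hypothetical learner: a 1-nearest-neighbour rule that memorises the training rows."""
    features = [k for k in train[0] if k != label]
    memory = [([float(r[f]) for f in features], r[label]) for r in train]

    def predict(row: Row) -> object:
        point = [float(row[f]) for f in features]
        # Pick the stored example with the smallest squared Euclidean distance.
        _, best_label = min(
            memory,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
        )
        return best_label

    return predict


def accuracy(model: Model, test: List[Row], label: str = "label") -> float:
    """Apply the single model to every test row and return the fraction predicted correctly."""
    correct = sum(1 for r in test if model(r) == r[label])
    return correct / len(test)


# Hypothetical, already-split data.
train = [{"x": 1.0, "label": "positive"}, {"x": -2.0, "label": "negative"}]
test = [{"x": 0.5, "label": "positive"}, {"x": -1.0, "label": "positive"}]
model = train_1nn(train)  # exactly one model is built
print(accuracy(model, test))  # 0.5
```

Only one model is trained here, and the test rows are never seen during training, which is the behaviour asked for in the question.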
Greetings,
Sebastian
Awesome, this should indeed work.
Thanks!
Karolina