How to do Y-randomization in RapidMiner?
pengie
Hi,
I was wondering how I can do Y-randomization in RapidMiner. In Y-randomization, the y value of an example is randomly exchanged with the y value of another example. This is used to validate QSAR models: the performance (r²) of the original model is compared with that of models built on a permuted (randomly shuffled) response.
Regards
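For readers new to the technique, here is a minimal sketch of the Y-randomization idea in plain Python (not RapidMiner): fit a model on the real response, refit it many times on shuffled copies of the response, and compare the r² values. The one-descriptor least-squares model, the toy data and the helper names `r_squared` and `y_randomization` are illustrative assumptions; a real QSAR study would use its own descriptors and learner.

```python
import random

def r_squared(x, y):
    """r^2 of an ordinary least-squares line y = a + b*x fitted to (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def y_randomization(x, y, rounds=100, seed=0):
    """Return the r^2 of the real model and the r^2 values of models fitted to shuffled y."""
    rng = random.Random(seed)
    original = r_squared(x, y)
    randomized = []
    for _ in range(rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # permute the response only; descriptors stay in place
        randomized.append(r_squared(x, y_perm))
    return original, randomized

# Toy data (made up for illustration): one descriptor with an exactly linear response.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
r2, r2_rand = y_randomization(x, y)
print(f"original r2 = {r2:.3f}, mean Y-randomized r2 = {sum(r2_rand) / len(r2_rand):.3f}")
```

A model that has captured a real structure-activity relationship should score far higher on the real response than on the shuffled ones; if the shuffled models score similarly, the original r² is likely the result of chance correlation.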
Answers
Hi,
although there is no operator for Y-randomization in RapidMiner yet, we can make use of its modularity. I have created a process that does Y-randomization. You could encapsulate it within an OperatorChain to use it within your process.
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="one third classification"/>
    </operator>
    <operator name="IdTagging" class="IdTagging">
    </operator>
    <operator name="IOMultiplier" class="IOMultiplier">
        <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
        <parameter key="attribute_name_regex" value="label|id"/>
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="keep_subset_only" value="true"/>
        <operator name="NoiseGenerator" class="NoiseGenerator">
            <parameter key="label_noise" value="0.0"/>
            <list key="noise">
            </list>
            <parameter key="random_attributes" value="1"/>
        </operator>
        <operator name="Sorting" class="Sorting">
            <parameter key="attribute_name" value="random"/>
        </operator>
        <operator name="IdTagging (2)" class="IdTagging">
        </operator>
    </operator>
    <operator name="IOSelector" class="IOSelector">
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="select_which" value="2"/>
    </operator>
    <operator name="ExampleSetJoin" class="ExampleSetJoin">
    </operator>
    <operator name="AttributeFilter (2)" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="invert_filter" value="true"/>
        <parameter key="parameter_string" value="random"/>
    </operator>
</operator>
Hope that helps.
Greetings,
Sebastian
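To make the data flow of the process above explicit, here is a rough plain-Python equivalent (not RapidMiner code). The helper name `y_randomize_by_join` and the list-of-dicts representation are assumptions for illustration. The sketch also assumes that the intended outcome of the join is that each example keeps its own attributes but receives the label from the shuffled copy.

```python
import random

def y_randomize_by_join(examples, seed=None):
    """Hypothetical helper mirroring the process above: shuffle only the label column.

    `examples` is a list of dicts, each with a "label" key plus attribute keys.
    """
    rng = random.Random(seed)
    # IdTagging: give every example a consecutive id
    tagged = [dict(ex, id=i) for i, ex in enumerate(examples, start=1)]
    # AttributeSubsetPreprocessing (keep_subset_only): a copy holding only id and label
    subset = [{"id": ex["id"], "label": ex["label"]} for ex in tagged]
    # NoiseGenerator + Sorting: attach a random value and sort by it
    for row in subset:
        row["random"] = rng.random()
    subset.sort(key=lambda row: row["random"])
    # IdTagging (2): re-number the shuffled rows, so id i now carries a random label
    for new_id, row in enumerate(subset, start=1):
        row["id"] = new_id
    # ExampleSetJoin + AttributeFilter (2): join on id; the helper "random" value is not copied
    label_by_id = {row["id"]: row["label"] for row in subset}
    return [dict(ex, label=label_by_id[ex["id"]]) for ex in tagged]

data = [{"att1": float(i), "label": "a" if i < 3 else "b"} for i in range(9)]
result = y_randomize_by_join(data, seed=42)
for row in result:
    print(row)
```

The attributes stay in their original order while the labels are redistributed among the examples, which is exactly what Y-randomization requires.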
Hi,
thank you for your help; the code worked perfectly. I am now trying to use RapidMiner to do Y-randomization, train a model, evaluate the model using leave-one-out, and repeat this 100 times to get an average classification error for the Y-randomized models. I am using the following process, but it gives me an error about RepeatUntilOperatorChain:
<operator name="Root" class="Process" expanded="yes">
    <parameter key="random_seed" value="-1"/>
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="one third classification"/>
    </operator>
    <operator name="RepeatUntilOperatorChain" class="RepeatUntilOperatorChain" expanded="yes">
        <parameter key="max_iterations" value="100"/>
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="no">
            <parameter key="attribute_name_regex" value="label|id"/>
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="keep_subset_only" value="true"/>
            <operator name="NoiseGenerator" class="NoiseGenerator">
                <parameter key="label_noise" value="0.0"/>
                <list key="noise">
                </list>
                <parameter key="random_attributes" value="1"/>
            </operator>
            <operator name="Sorting" class="Sorting">
                <parameter key="attribute_name" value="random"/>
            </operator>
            <operator name="IdTagging (2)" class="IdTagging">
            </operator>
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
        </operator>
        <operator name="AttributeFilter (2)" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="invert_filter" value="true"/>
            <parameter key="parameter_string" value="random"/>
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="leave_one_out" value="true"/>
            <operator name="NearestNeighbors" class="NearestNeighbors">
                <parameter key="k" value="3"/>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance" class="ClassificationPerformance">
                    <list key="class_weights">
                    </list>
                    <parameter key="classification_error" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>
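The loop described in this post (shuffle the labels, evaluate 3-NN with leave-one-out, repeat 100 times, average the classification error) can be sketched outside RapidMiner as below. This is plain Python using only the standard library; `knn_predict`, `loo_error` and `y_randomized_loo_error` are hypothetical helper names, the two-cluster data is made up, and ties between equidistant neighbors are broken by input order, which may not match RapidMiner's NearestNeighbors operator.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest (features, label) pairs by Euclidean distance."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_error(xs, ys, k=3):
    """Leave-one-out classification error of k-NN: each example is predicted from all others."""
    wrong = 0
    for i in range(len(xs)):
        train = [(xs[j], ys[j]) for j in range(len(xs)) if j != i]
        if knn_predict(train, xs[i], k) != ys[i]:
            wrong += 1
    return wrong / len(xs)

def y_randomized_loo_error(xs, ys, rounds=100, k=3, seed=0):
    """Average LOO error over `rounds` models, each evaluated on a fresh shuffle of the labels."""
    rng = random.Random(seed)
    errors = []
    for _ in range(rounds):
        ys_perm = list(ys)
        rng.shuffle(ys_perm)
        errors.append(loo_error(xs, ys_perm, k))
    return sum(errors) / len(errors)

# Two well-separated clusters, so the real labels are easy to predict.
xs = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
      (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
ys = ["a"] * 5 + ["b"] * 5
print("original LOO error:", loo_error(xs, ys))
print("mean Y-randomized LOO error:", y_randomized_loo_error(xs, ys))
```

A large gap between the two printed errors is what a meaningful model should show; the shuffled-label models cannot exploit the cluster structure.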
Hi,
just a hint: why not use the IteratingPerformanceAverage operator? It also iterates a predefined number of times and, in addition, averages the performance vectors produced by the inner operator chain.
Regards,
Tobias
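Conceptually, the iterate-and-average behavior described here amounts to running the inner chain a fixed number of times and averaging each performance criterion across the runs. Below is a rough Python analogy, not RapidMiner's implementation: a "performance vector" is assumed to be a dict mapping criterion names to values, and `average_performance` and `fake_run` are hypothetical names, the latter standing in for one Y-randomized leave-one-out run.

```python
import random
from statistics import mean

def average_performance(run_once, iterations, seed=None):
    """Call run_once(rng) `iterations` times and average each criterion.

    Each call is assumed to return a dict mapping criterion name -> value,
    standing in for a RapidMiner performance vector.
    """
    rng = random.Random(seed)
    vectors = [run_once(rng) for _ in range(iterations)]
    return {name: mean(v[name] for v in vectors) for name in vectors[0]}

def fake_run(rng):
    # Hypothetical stand-in for one Y-randomized leave-one-out run.
    error = rng.choice([0.25, 0.75])
    return {"classification_error": error, "accuracy": 1.0 - error}

avg = average_performance(fake_run, iterations=100, seed=1)
print(avg)
```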
Great hint!
However, I've run into another error: "Message: The attribute 'random' does not exist." After some tracing, it seems that AttributeFilter (2) removes the attribute 'random' after the first round, but in the second round NoiseGenerator creates an attribute named 'random1' instead of 'random'. This causes the error, since Sorting still looks for an attribute named 'random'.
<operator name="Root" class="Process" expanded="yes">
    <parameter key="random_seed" value="-1"/>
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="one third classification"/>
    </operator>
    <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="attribute_name_regex" value="label|id"/>
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="keep_subset_only" value="true"/>
            <operator name="NoiseGenerator" class="NoiseGenerator" breakpoints="after">
                <parameter key="label_noise" value="0.0"/>
                <list key="noise">
                </list>
                <parameter key="random_attributes" value="1"/>
            </operator>
            <operator name="Sorting" class="Sorting">
                <parameter key="attribute_name" value="random"/>
            </operator>
            <operator name="IdTagging (2)" class="IdTagging">
            </operator>
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
        </operator>
        <operator name="AttributeFilter (2)" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="invert_filter" value="true"/>
            <parameter key="parameter_string" value="random"/>
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="leave_one_out" value="true"/>
            <operator name="NearestNeighbors" class="NearestNeighbors">
                <parameter key="k" value="3"/>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="no">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance" class="ClassificationPerformance">
                    <list key="class_weights">
                    </list>
                    <parameter key="classification_error" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>
Hi,
try our Permutation operator instead. I forgot about it myself in my previous solution; there are so many operators...
<operator name="Root" class="Process" expanded="yes">
    <parameter key="random_seed" value="-1"/>
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="one third classification"/>
    </operator>
    <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="attribute_name_regex" value="label|id"/>
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="keep_subset_only" value="true"/>
            <operator name="Permutation" class="Permutation">
            </operator>
            <operator name="IdTagging (2)" class="IdTagging">
            </operator>
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="leave_one_out" value="true"/>
            <operator name="NearestNeighbors" class="NearestNeighbors">
                <parameter key="k" value="3"/>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="no">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance" class="ClassificationPerformance">
                    <list key="class_weights">
                    </list>
                    <parameter key="classification_error" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>
This should help.
Greetings,
Sebastian
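Permutation randomly reorders the examples, so no helper attribute is needed. That is why this version no longer contains NoiseGenerator, Sorting or AttributeFilter (2), and the 'random'/'random1' naming issue no longer applies. A common way to produce such a random order is the Fisher-Yates shuffle, sketched below in Python. Whether RapidMiner's Permutation operator uses exactly this algorithm internally is an assumption not confirmed in this thread, and `fisher_yates` is a hypothetical helper name.

```python
import random

def fisher_yates(items, rng):
    """Return a uniformly random permutation of `items` without modifying the input."""
    out = list(items)
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        out[i], out[j] = out[j], out[i]
    return out

labels = ["a", "a", "b", "b", "c"]
shuffled = fisher_yates(labels, random.Random(7))
# Re-tagging ids 1..n afterwards (like IdTagging (2)) attaches each shuffled label to a new id.
relabelled = [{"id": i, "label": lab} for i, lab in enumerate(shuffled, start=1)]
print(relabelled)
```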
Thank you so much. It worked perfectly. ;D
Just one last question: when I set a breakpoint at ExampleSetJoin, I noticed that the id number of the dataset keeps increasing. Why is that, and will it have any impact on memory?
Hi,
no, this won't increase memory consumption. The memory used by an ExampleSet is freed once no ExampleSet references it any more. Keep in mind that it is not necessarily freed immediately: Java frees memory when its garbage collector considers it appropriate or when it needs the space.
Greetings,
Sebastian