How to do Y-randomization in RapidMiner?

pengie New Altair Community Member
edited November 5 in Community Q&A
Hi,

I was wondering how to do Y-randomization in RapidMiner. In Y-randomization, the y value of an example is randomly exchanged with the y value of another example. This is used to validate QSAR models: the performance of the original model (r²) is compared with that of models built on the permuted (randomly shuffled) response.

Regards
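To make the idea in the question concrete, here is a minimal sketch in plain Python (not RapidMiner) of a Y-randomization test for a one-variable linear regression. The helper names `r_squared` and `y_randomization` are hypothetical. The response is shuffled, the model is refitted, and the resulting r² values are compared with the r² of the model fitted to the real response.

```python
import random


def r_squared(x, y):
    """r^2 of an ordinary least-squares line y = a + b*x fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_rounds=100, seed=0):
    """Return the original r^2 and the r^2 values of models fitted to shuffled y."""
    rng = random.Random(seed)
    original = r_squared(x, y)
    scrambled = []
    for _ in range(n_rounds):
        y_perm = list(y)      # copy, so the real response stays untouched
        rng.shuffle(y_perm)   # randomly exchange y values between examples
        scrambled.append(r_squared(x, y_perm))
    return original, scrambled
```

If the original r² is clearly higher than the r² values obtained on the shuffled responses, the model is unlikely to be explained by chance correlation alone.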

Answers

  • land
    land New Altair Community Member
    Hi,
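    Although RapidMiner does not have a dedicated Y-randomization operator yet, we can make use of its modularity. I have created a process that performs Y-randomization. It tags every example with an id, copies the example set, shuffles the label column of the copy by sorting it on a random attribute and re-tagging the ids, and finally joins the shuffled copy with the original example set via the id. You could encapsulate it in an OperatorChain to use it within your own process.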
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="one third classification"/>
        </operator>
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="attribute_name_regex" value="label|id"/>
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="keep_subset_only" value="true"/>
            <operator name="NoiseGenerator" class="NoiseGenerator">
                <parameter key="label_noise" value="0.0"/>
                <list key="noise">
                </list>
                <parameter key="random_attributes" value="1"/>
            </operator>
            <operator name="Sorting" class="Sorting">
                <parameter key="attribute_name" value="random"/>
            </operator>
            <operator name="IdTagging (2)" class="IdTagging">
            </operator>
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
        </operator>
        <operator name="AttributeFilter (2)" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="invert_filter" value="true"/>
            <parameter key="parameter_string" value="random"/>
        </operator>
    </operator>
    Hope that helps.


    Greetings,
      Sebastian
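The trick used in the process above (attach a random key, sort by it, re-tag the ids, join on id) can be mimicked in plain Python to see why it produces a random permutation of the labels. This is a conceptual sketch, not RapidMiner code. The function name `y_randomize` and the list-of-dicts data layout are assumptions for illustration, and the comments only map each step to the operator it imitates. Every example gets an independent uniform random key, and ties are practically impossible, so sorting by that key puts the labels in a uniformly random order.

```python
import random


def y_randomize(examples, label_key="label", seed=None):
    """Hypothetical helper: return copies of `examples` with their labels shuffled."""
    rng = random.Random(seed)
    # IdTagging: give every example a consecutive id
    tagged = [dict(ex, id=i) for i, ex in enumerate(examples)]
    # IOMultiplier + AttributeSubsetPreprocessing: a copy holding only the label
    subset = [{label_key: ex[label_key]} for ex in tagged]
    # NoiseGenerator: add one uniformly distributed random attribute
    for row in subset:
        row["random"] = rng.random()
    # Sorting: order the copy by the random attribute
    subset.sort(key=lambda row: row["random"])
    # IdTagging (2): re-tag ids in the new, shuffled order
    shuffled_labels = {i: row[label_key] for i, row in enumerate(subset)}
    # ExampleSetJoin on id; only the label is taken over, so the helper
    # 'random' key never reaches the result (the job of AttributeFilter (2))
    return [dict(ex, **{label_key: shuffled_labels[ex["id"]]}) for ex in tagged]
```

The attributes of each example stay where they were; only the label values are exchanged between examples, so the class distribution is preserved.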
  • pengie
    pengie New Altair Community Member
    Hi,

    Thank you for your help; the code worked perfectly. I am now trying to use RapidMiner to do Y-randomization, train a model, evaluate it with leave-one-out cross-validation, and repeat this 100 times to get an average classification error for the Y-randomized data. I am using the following process:

    <operator name="Root" class="Process" expanded="yes">
        <parameter key="random_seed" value="-1"/>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="one third classification"/>
        </operator>
        <operator name="RepeatUntilOperatorChain" class="RepeatUntilOperatorChain" expanded="yes">
            <parameter key="max_iterations" value="100"/>
            <operator name="IdTagging" class="IdTagging">
            </operator>
            <operator name="IOMultiplier" class="IOMultiplier">
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
            <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="no">
                <parameter key="attribute_name_regex" value="label|id"/>
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="keep_subset_only" value="true"/>
                <operator name="NoiseGenerator" class="NoiseGenerator">
                    <parameter key="label_noise" value="0.0"/>
                    <list key="noise">
                    </list>
                    <parameter key="random_attributes" value="1"/>
                </operator>
                <operator name="Sorting" class="Sorting">
                    <parameter key="attribute_name" value="random"/>
                </operator>
                <operator name="IdTagging (2)" class="IdTagging">
                </operator>
            </operator>
            <operator name="IOSelector" class="IOSelector">
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="select_which" value="2"/>
            </operator>
            <operator name="ExampleSetJoin" class="ExampleSetJoin">
            </operator>
            <operator name="AttributeFilter (2)" class="AttributeFilter">
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="invert_filter" value="true"/>
                <parameter key="parameter_string" value="random"/>
            </operator>
            <operator name="XValidation" class="XValidation" expanded="yes">
                <parameter key="leave_one_out" value="true"/>
                <operator name="NearestNeighbors" class="NearestNeighbors">
                    <parameter key="k" value="3"/>
                </operator>
                <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="ClassificationPerformance" class="ClassificationPerformance">
                        <list key="class_weights">
                        </list>
                        <parameter key="classification_error" value="true"/>
                    </operator>
                </operator>
            </operator>
        </operator>
    </operator>
    However, it gives me an error about the RepeatUntilOperatorChain.
  • TobiasMalbrecht
    TobiasMalbrecht New Altair Community Member
    Hi,

    Just a hint: why not use the IteratingPerformanceAverage operator? It also iterates a predefined number of times and, in addition, averages the performance vectors produced by the inner operator chain.

    Regards,
    Tobias
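The quantity this thread is after can be sketched in plain Python: 100 rounds of Y-randomization, each evaluated by leave-one-out with a 3-nearest-neighbour classifier, with the classification errors averaged. In the processes here, that is the job of IteratingPerformanceAverage wrapped around XValidation with leave_one_out and NearestNeighbors with k = 3. This is an illustration under assumptions, not RapidMiner's implementation. The helper names are hypothetical, distances are squared Euclidean, and majority-vote ties get no special handling (with k = 3 and two classes they cannot occur).

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(
        train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query))
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_error(data, k=3):
    """Leave-one-out classification error: predict each example from all others."""
    errors = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, features, k) != label:
            errors += 1
    return errors / len(data)


def y_randomized_error(data, iterations=100, k=3, seed=0):
    """Average LOO error over `iterations` runs, each on freshly shuffled labels."""
    rng = random.Random(seed)
    features = [f for f, _ in data]
    labels = [label for _, label in data]
    total = 0.0
    for _ in range(iterations):
        shuffled = labels[:]   # new permutation of the labels in every round
        rng.shuffle(shuffled)
        total += loo_error(list(zip(features, shuffled)), k)
    return total / iterations
```

Comparing `loo_error(data)` on the real labels with `y_randomized_error(data)` shows how much better the model is than one trained on scrambled responses.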
  • pengie
    pengie New Altair Community Member
    Great hint!

    I ran into another error: "Message: The attribute 'random' does not exist." After some tracing, it seems that AttributeFilter (2) removes the attribute 'random' after the first iteration, but in the second iteration the NoiseGenerator creates an attribute named 'random1' instead of 'random', so the Sorting operator, which sorts by 'random', fails.

    <operator name="Root" class="Process" expanded="yes">
        <parameter key="random_seed" value="-1"/>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="one third classification"/>
        </operator>
        <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">
            <operator name="IdTagging" class="IdTagging">
            </operator>
            <operator name="IOMultiplier" class="IOMultiplier">
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
            <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
                <parameter key="attribute_name_regex" value="label|id"/>
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="keep_subset_only" value="true"/>
                <operator name="NoiseGenerator" class="NoiseGenerator" breakpoints="after">
                    <parameter key="label_noise" value="0.0"/>
                    <list key="noise">
                    </list>
                    <parameter key="random_attributes" value="1"/>
                </operator>
                <operator name="Sorting" class="Sorting">
                    <parameter key="attribute_name" value="random"/>
                </operator>
                <operator name="IdTagging (2)" class="IdTagging">
                </operator>
            </operator>
            <operator name="IOSelector" class="IOSelector">
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="select_which" value="2"/>
            </operator>
            <operator name="ExampleSetJoin" class="ExampleSetJoin">
            </operator>
            <operator name="AttributeFilter (2)" class="AttributeFilter">
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="invert_filter" value="true"/>
                <parameter key="parameter_string" value="random"/>
            </operator>
            <operator name="XValidation" class="XValidation" expanded="yes">
                <parameter key="leave_one_out" value="true"/>
                <operator name="NearestNeighbors" class="NearestNeighbors">
                    <parameter key="k" value="3"/>
                </operator>
                <operator name="OperatorChain" class="OperatorChain" expanded="no">
                    <operator name="ModelApplier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="ClassificationPerformance" class="ClassificationPerformance">
                        <list key="class_weights">
                        </list>
                        <parameter key="classification_error" value="true"/>
                    </operator>
                </operator>
            </operator>
        </operator>
    </operator>
  • land
    land New Altair Community Member
    Hi,
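    Try our Permutation operator instead; I forgot about it myself in the previous solution. So many operators... :) It shuffles the examples directly, so the helper attribute 'random' and the AttributeFilter that removed it are no longer needed.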
    <operator name="Root" class="Process" expanded="yes">
        <parameter key="random_seed" value="-1"/>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="one third classification"/>
        </operator>
        <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">
            <operator name="IdTagging" class="IdTagging">
            </operator>
            <operator name="IOMultiplier" class="IOMultiplier">
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
            <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
                <parameter key="attribute_name_regex" value="label|id"/>
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="keep_subset_only" value="true"/>
                <operator name="Permutation" class="Permutation">
                </operator>
                <operator name="IdTagging (2)" class="IdTagging">
                </operator>
            </operator>
            <operator name="IOSelector" class="IOSelector">
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="select_which" value="2"/>
            </operator>
            <operator name="ExampleSetJoin" class="ExampleSetJoin">
            </operator>
            <operator name="XValidation" class="XValidation" expanded="yes">
                <parameter key="leave_one_out" value="true"/>
                <operator name="NearestNeighbors" class="NearestNeighbors">
                    <parameter key="k" value="3"/>
                </operator>
                <operator name="OperatorChain" class="OperatorChain" expanded="no">
                    <operator name="ModelApplier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="ClassificationPerformance" class="ClassificationPerformance">
                        <list key="class_weights">
                        </list>
                        <parameter key="classification_error" value="true"/>
                    </operator>
                </operator>
            </operator>
        </operator>
    </operator>

    This should help.

    Greetings,
      Sebastian
  • pengie
    pengie New Altair Community Member
    Thank you so much. It worked perfectly.  ;D

    Just one last question: when I set a breakpoint at ExampleSetJoin, I noticed that the id number of the example set keeps increasing. Why is that, and will it have any impact on memory?
  • land
    land New Altair Community Member
    Hi,
    No, this won't increase memory consumption. The memory of an ExampleSet is freed once no ExampleSet references it any more. Keep in mind that it does not have to be freed immediately: Java frees memory when its garbage collector considers it appropriate or when the memory is needed.

    Greetings,
      Sebastian