"Correlated features and creating an iteration over models"

Legacy User
edited November 5 in Community Q&A
Hello,

I have created a model in RapidMiner version 4.6.
The aim of this model is to model chlorophyll dispersion.

Dependent variable:    chlorophyll data - numerical data
Independent variables: various variables - numerical data


The problem with all my data sets is that I have many independent (numerical) variables, and these
variables are correlated with one another.

Therefore I first integrated:
a) SVMWeighting             - weighting of all variables
b) AttributeWeightSelection - extraction of the important variables

c) RemoveCorrelatedFeatures - removal of correlated features, but with attribute order: random!
  The random attribute order is important because it gives me different sets of important variables.
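
To illustrate why a random attribute order yields different variable sets, here is a minimal sketch in plain Python (not RapidMiner). It assumes that the filter works greedily: attributes are visited in random order, and an attribute is kept only if its absolute correlation with every attribute kept so far stays at or below the threshold. Whether RemoveCorrelatedFeatures works exactly like this is an assumption; the names `pearson`, `remove_correlated` and `DEMO_COLUMNS` are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (var_x * var_y) ** 0.5


def remove_correlated(columns, threshold, rng):
    """Greedy filter (assumed behaviour, see lead-in).

    columns:   dict mapping attribute name -> list of values
    threshold: maximum allowed absolute correlation between kept attributes
    rng:       random.Random instance that decides the visiting order
    """
    names = list(columns)
    rng.shuffle(names)  # the random attribute order
    kept = []
    for name in names:
        # Keep the attribute only if it is not too strongly correlated
        # (positively or negatively) with anything already kept.
        if all(abs(pearson(columns[name], columns[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Toy data: a, b and c are perfectly (anti-)correlated, d is nearly independent.
DEMO_COLUMNS = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],
    "c": [-1, -2, -3, -4, -5, -6],
    "d": [1, -1, 1, -1, 1, -1],
}
```

Calling `remove_correlated(DEMO_COLUMNS, 0.9, random.Random(seed))` with different seeds always keeps `d` plus exactly one of `a`, `b`, `c`, but which one survives depends on the order, which is the effect described above.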
 

My problem:

I would like to iterate over the models (because the random selection gives me different sets of
variables) and get the best model from this iteration as the result.
But I have no idea how to do this.

Below is the process. For testing I used the ExampleSetGenerator. This is not the
original data, but it shows what happens in the model.

Can anybody help me integrate an iteration part?
Many thanks.

Best regards

Angela



<operator name="Root" class="Process" expanded="yes">
    <description text="Datensatz:  doy_sz1_chlalt_ausreisser_aisa_mean1.amlE:\v_rapid_min\v_rapid_eschen1\v_chl_aisa_modell\"/>
    <operator name="Daten laden  und vorbereiten" class="OperatorChain" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum"/>
            <parameter key="number_of_attributes" value="100"/>
        </operator>
    </operator>
    <operator name="Attribute identifizieren, Ranking, Correalation" class="OperatorChain" expanded="yes">
        <operator name="SVMWeighting" class="SVMWeighting">
        </operator>
        <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
            <parameter key="keep_attribute_weights" value="true"/>
            <parameter key="weight" value="0.0"/>
        </operator>
        <operator name="CorrelationMatrix - 1" class="CorrelationMatrix">
        </operator>
        <operator name="RemoveCorrelatedFeatures 1" class="RemoveCorrelatedFeatures">
            <parameter key="correlation" value="0.1"/>
            <parameter key="attribute_order" value="random"/>
        </operator>
        <operator name="RemoveCorrelatedFeatures 2" class="RemoveCorrelatedFeatures">
            <parameter key="correlation" value="-0.1"/>
            <parameter key="attribute_order" value="random"/>
        </operator>
        <operator name="CorrelationMatrix - 2" class="CorrelationMatrix">
        </operator>
    </operator>
    <operator name="Modelle lernen" class="OperatorChain" expanded="yes">
        <operator name="LinearRegression" class="LinearRegression">
            <parameter key="keep_example_set" value="true"/>
        </operator>
        <operator name="NeuralNetImproved (2)" class="NeuralNetImproved" activated="no">
            <parameter key="keep_example_set" value="true"/>
            <list key="hidden_layers">
              <parameter key="Layer1" value="3"/>
            </list>
        </operator>
    </operator>
    <operator name="Modell anwenden und evaluieren" class="OperatorChain" expanded="no">
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="LinearRegression (2)" class="LinearRegression">
            </operator>
            <operator name="NeuralNetImproved" class="NeuralNetImproved" activated="no">
                <list key="hidden_layers">
                </list>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier (2)" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="RegressionPerformance" class="RegressionPerformance">
                    <parameter key="root_mean_squared_error" value="true"/>
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="relative_error" value="true"/>
                    <parameter key="skip_undefined_labels" value="false"/>
                    <parameter key="use_example_weights" value="false"/>
                </operator>
            </operator>
        </operator>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
        <parameter key="keep_model" value="true"/>
        <list key="application_parameters">
        </list>
    </operator>
</operator>

Answers

  • land
    Hi Angela,
    RapidMiner provides an operator that will solve your problem. It is called RandomOptimizer and applies to all situations where random effects have a heavy impact on performance. You simply put everything that must be executed repeatedly inside this meta operator.
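
As a rough illustration of the idea only, here is a sketch in plain Python (not RapidMiner): run a randomised procedure a fixed number of times and keep the run with the lowest error. The names `random_optimize` and `toy_run` are hypothetical, and the toy error values stand in for a real cross-validated error; how RandomOptimizer is implemented internally is not shown here.

```python
import random


def random_optimize(run_once, iterations, seed=0):
    """Call run_once(rng) `iterations` times and keep the best run.

    run_once must return a pair (error, result); lower error is better.
    Returns the (error, result) pair of the best run.
    """
    rng = random.Random(seed)
    best_error, best_result = None, None
    for _ in range(iterations):
        error, result = run_once(rng)
        if best_error is None or error < best_error:
            best_error, best_result = error, result
    return best_error, best_result


def toy_run(rng):
    """Stand-in for 'pick attributes randomly, then cross-validate'."""
    subset = sorted(rng.sample(["x1", "x2", "x3", "x4"], 2))
    # Invented errors: pretend one particular subset predicts best.
    error = 0.1 if subset == ["x1", "x3"] else 1.0
    return error, subset


best_error, best_subset = random_optimize(toy_run, iterations=100)
```

The important design point is that the random step has to happen inside the repeated part; otherwise every iteration evaluates the same attribute set.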
    Before I post the example process, a remark on your process:
    I don't think removing correlated features twice makes any difference, because you checked the "use_absolute_correlation" parameter. With this setting, the first pass already removes every attribute whose correlation is greater than 0.1 or less than -0.1. You can set breakpoints after each operator to inspect the results.

    I have also removed the additional training of the linear regression model after the XValidation. If you want a complete model that uses all data for learning, you can simply check the "create_complete_model" parameter in the XValidation. But applying this model to the training data won't give you reliable results, because it might be overfitted to the training set.

    And here's your slightly modified process:
    <operator name="Root" class="Process" expanded="yes">
        <description text="Datensatz:  doy_sz1_chlalt_ausreisser_aisa_mean1.amlE:\v_rapid_min\v_rapid_eschen1\v_chl_aisa_modell\"/>
        <operator name="Daten laden  und vorbereiten" class="OperatorChain" expanded="yes">
            <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
                <parameter key="target_function" value="sum"/>
                <parameter key="number_of_attributes" value="100"/>
            </operator>
        </operator>
        <operator name="Attribute identifizieren, Ranking, Correalation" class="OperatorChain" expanded="yes">
            <operator name="SVMWeighting" class="SVMWeighting">
            </operator>
            <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
                <parameter key="keep_attribute_weights" value="true"/>
                <parameter key="weight" value="0.0"/>
            </operator>
            <operator name="CorrelationMatrix - 1" class="CorrelationMatrix">
            </operator>
            <operator name="RemoveCorrelatedFeatures 1" class="RemoveCorrelatedFeatures">
                <parameter key="correlation" value="0.1"/>
                <parameter key="attribute_order" value="random"/>
            </operator>
            <operator name="RemoveCorrelatedFeatures 2" class="RemoveCorrelatedFeatures">
                <parameter key="correlation" value="-0.1"/>
                <parameter key="attribute_order" value="random"/>
            </operator>
            <operator name="CorrelationMatrix - 2" class="CorrelationMatrix">
            </operator>
        </operator>
        <operator name="Mehrmals ausführen, bestes behalten" class="RandomOptimizer" expanded="yes">
            <parameter key="iterations" value="100"/>
            <operator name="Kreuzvalidierung zur Performanzschätzung" class="XValidation" expanded="yes">
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="LinearRegression (2)" class="LinearRegression">
                </operator>
                <operator name="NeuralNetImproved" class="NeuralNetImproved" activated="no">
                    <list key="hidden_layers">
                    </list>
                </operator>
                <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="RegressionPerformance" class="RegressionPerformance">
                        <parameter key="root_mean_squared_error" value="true"/>
                        <parameter key="absolute_error" value="true"/>
                        <parameter key="relative_error" value="true"/>
                        <parameter key="skip_undefined_labels" value="false"/>
                        <parameter key="use_example_weights" value="false"/>
                    </operator>
                </operator>
            </operator>
        </operator>
    </operator>
    Greetings,
      Sebastian