different values for regressionPerformance for the same data
Hello,
I have a problem: I get different values from RegressionPerformance for the same attribute.
I have used model 1 (with FeatureSelection) and model 2 (without FeatureSelection, only with an AttributeFilter).
Attribute: att3
Model 1: root_mean_squared_error 0.334, squared_correlation 10.651
Model 2: root_mean_squared_error 0.326, squared_correlation 11.189
???
The same attribute (e.g. att3) has a different RegressionPerformance value in each model. Can anyone tell me why?
Model 1
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator" breakpoints="after">
<parameter key="target_function" value="sum"/>
</operator>
<operator name="FS" class="FeatureSelection" expanded="yes">
<parameter key="user_result_individual_selection" value="true"/>
<parameter key="keep_best" value="64"/>
<parameter key="maximum_number_of_generations" value="1"/>
<operator name="BootstrappingValidation" class="BootstrappingValidation" expanded="yes">
<parameter key="keep_example_set" value="true"/>
<parameter key="create_complete_model" value="true"/>
<operator name="LinearRegression" class="LinearRegression">
<parameter key="feature_selection" value="none"/>
</operator>
<operator name="ApplierChain" class="OperatorChain" expanded="yes">
<operator name="Applier" class="ModelApplier">
<parameter key="keep_model" value="true"/>
<list key="application_parameters">
</list>
</operator>
<operator name="RegressionPerformance" class="RegressionPerformance">
<parameter key="main_criterion" value="squared_correlation"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="squared_correlation" value="true"/>
</operator>
</operator>
</operator>
</operator>
</operator>
Model 2
<operator name="Root" class="Process" expanded="yes">
<operator name="Daten laden und vorbereiten" class="OperatorChain" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="sum"/>
</operator>
</operator>
<operator name="Attribute identifizieren, Ranking, Correalation" class="OperatorChain" expanded="yes">
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="att3"/>
</operator>
<operator name="BootstrappingValidation" class="BootstrappingValidation" expanded="yes">
<parameter key="keep_example_set" value="true"/>
<parameter key="create_complete_model" value="true"/>
<operator name="LinearRegression" class="LinearRegression">
<parameter key="feature_selection" value="none"/>
</operator>
<operator name="OperatorChain" class="OperatorChain" expanded="yes">
<operator name="ModelApplier (2)" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="RegressionPerformance" class="RegressionPerformance">
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="true"/>
<parameter key="relative_error" value="true"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="skip_undefined_labels" value="false"/>
<parameter key="use_example_weights" value="false"/>
</operator>
</operator>
</operator>
</operator>
<operator name="ModelApplier" class="ModelApplier">
<parameter key="keep_model" value="true"/>
<list key="application_parameters">
</list>
</operator>
</operator>
Answers
-
Hi,
the quick answer is: because you have two different processes. Even a different usage order of random numbers can affect performance. You could use local_random_seeds to avoid this.
Greetings,
Sebastian
-
Dear Sebastian,
I don't know what you mean by local_random_seeds.
I have only integrated the FeatureSelection in model 1. I think that is a way to test all attributes, individually and in combination, to find the best fit with a linear model. But that is not a random process in itself.
I want to find out which attributes are best for predicting the label, and for this I use the performance criteria, like the squared correlation and the root mean squared error.
Best regards,
Angela
-
"I don't know what you mean by local_random_seeds."
Have you really not thought of searching this forum, say on "local_random"?
-
haddock wrote:
Have you really not thought of searching this forum, say on "local_random"?
I know what local_random_seeds is! That is also a feature of RapidMiner, which makes it so special.
But please read my entire question. ::)
Even a random process should not alter the quality (the RegressionPerformance criteria) for each value.
I therefore assume that I cannot compare RegressionPerformance criteria for specific attributes in two different models.
Best regards,
-
"But please read my entire question."
I have, and Seb has answered it, and....?
-
Hi Angela,
of course a random sampling of examples affects the measured quality, and a random sampling is done by the BootstrappingValidations. Without the same random number sequence, it is not guaranteed that the same examples are selected. For example, if an example that can be fitted perfectly is not selected, but an outlier is selected twice, this will affect the performance heavily.
I would recommend using a local random seed on your BootstrappingValidations; this should do the trick.
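In the process XML, that could look roughly like the sketch below. The seed value 1992 is arbitrary; the point is that both processes use the same fixed value instead of the default -1:
<operator name="BootstrappingValidation" class="BootstrappingValidation" expanded="yes">
  <!-- fixing the local seed makes the bootstrap sampling reproducible -->
  <parameter key="local_random_seed" value="1992"/>
  <parameter key="keep_example_set" value="true"/>
  <parameter key="create_complete_model" value="true"/>
  <!-- inner learner and applier chain unchanged -->
</operator>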
Greetings,
Sebastian
-
Hi Sebastian,
many thanks for this answer. I have changed the local_random_seed from -1 to other values (1, 10, 100), but I get the same values for squared_correlation for the attributes.
But I found another way to get the correct squared_correlation from the importance values.
Many thanks for your help.
Angela