Confused by the numerical XValidation output
Hi,
Here is my question this time: why does the RMS error printed by XValidation decrease as the number of validations increases?
Here is a simple example:
Data set:
X,Y
0, 0.18224201
1, 2.002307783
2, 4.187028114
...
49, 98.21944595
(this is simply Y = 2*X + rand() - 0.5)
Standard XVal experiment:
<operator name="Root" class="Process" expanded="yes">
  <operator name="ExampleSource" class="ExampleSource">
    <parameter key="attributes" value="H:\tmp\lin.aml"/>
  </operator>
  <operator name="XValidation" class="XValidation" expanded="yes">
    <parameter key="create_complete_model" value="true"/>
    <parameter key="keep_example_set" value="true"/>
    <parameter key="number_of_validations" value="60"/>
    <parameter key="sampling_type" value="shuffled sampling"/>
    <operator name="LinearRegression" class="LinearRegression">
      <parameter key="feature_selection" value="none"/>
      <parameter key="keep_example_set" value="true"/>
    </operator>
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
      <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
      </operator>
      <operator name="Performance" class="Performance">
      </operator>
    </operator>
  </operator>
</operator>
When I increase number_of_validations, here is what happens:
no_of_val rms_error
10 0.271 +- 0.040
20 0.258 +- 0.087
30 0.248 +- 0.117
40 0.252 +- 0.122
50 0.239 +- 0.140
I would have expected the error to stay about the same as the number of validations increases (since it is determined by the rand() noise) and its uncertainty to decrease. Why does the opposite happen?
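For what it's worth, the effect can be reproduced outside RapidMiner. The sketch below (plain Python with numpy, not RapidMiner code; the data generation mimics my Y = 2*X + rand() - 0.5 set, and kfold_rmse is my own helper, not an operator) runs k-fold cross-validation of a linear fit for several fold counts and reports the mean and standard deviation of the per-fold RMSEs, which is what I understand XValidation to print:

```python
# Simulate the experiment: with more folds, each test fold shrinks, so each
# fold's RMSE is a noisier estimate. The fold-to-fold standard deviation
# therefore grows with the number of validations, even on the same data.
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(50, dtype=float)
Y = 2 * X + rng.uniform(-0.5, 0.5, size=X.size)  # Y = 2*X + rand() - 0.5

def kfold_rmse(X, Y, k, rng):
    """Mean and std of per-fold RMSEs for k-fold CV of a linear fit."""
    idx = rng.permutation(X.size)        # shuffled sampling
    folds = np.array_split(idx, k)
    rmses = []
    for test in folds:
        train = np.setdiff1d(idx, test)  # everything not in the test fold
        slope, intercept = np.polyfit(X[train], Y[train], 1)
        pred = slope * X[test] + intercept
        rmses.append(np.sqrt(np.mean((Y[test] - pred) ** 2)))
    return np.mean(rmses), np.std(rmses)

for k in (10, 20, 30, 40, 50):
    mean, std = kfold_rmse(X, Y, k, rng)
    print(f"{k:2d} folds: {mean:.3f} +- {std:.3f}")
```

The output shows the same pattern as my table: the "+-" part grows with the fold count, and the mean drifts down slightly, presumably because averaging RMSEs over many tiny folds is not the same as one pooled RMSE.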
Thanks!