X-Validation Bug
marcin_blachnik
Hello
I have noticed a bug in the X-Validation operator. I guess you forgot to make a clone of the test set, because whenever I try to access the training set on the test side of X-Validation, I only get a view of the test samples. The bug appears when I use the through port and also when I use Remember/Recall. For now, the problem can be worked around by materializing the training data before connecting it to the through port.
Bug example is provided below:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.007">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.007" expanded="true" name="Process">
<process expanded="true" height="625" width="926">
<operator activated="true" class="retrieve" compatibility="5.2.007" expanded="true" height="60" name="Retrieve" width="90" x="71" y="33">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="x_validation" compatibility="5.2.007" expanded="true" height="112" name="Validation" width="90" x="214" y="34">
<process expanded="true" height="625" width="438">
<operator activated="true" class="default_model" compatibility="5.2.007" expanded="true" height="76" name="Default Model" width="90" x="112" y="30"/>
<connect from_port="training" to_op="Default Model" to_port="training set"/>
<connect from_op="Default Model" from_port="model" to_port="model"/>
<connect from_op="Default Model" from_port="exampleSet" to_port="through 1"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
<portSpacing port="sink_through 2" spacing="0"/>
</process>
<process expanded="true" height="625" width="438">
<operator activated="true" breakpoints="before" class="k_nn" compatibility="5.2.007" expanded="true" height="76" name="k-NN" width="90" x="45" y="120"/>
<operator activated="true" class="apply_model" compatibility="5.2.007" expanded="true" height="76" name="Apply Model" width="90" x="179" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.2.007" expanded="true" height="76" name="Performance" width="90" x="313" y="30"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_port="through 1" to_op="k-NN" to_port="training set"/>
<connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="source_through 2" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="training" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
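The workaround mentioned above can be sketched as follows: the training subprocess from the example, with a Materialize Data operator placed between Default Model and the through port. This is only a sketch. The operator class `materialize_data` and the port names `example set input` and `example set output` are assumptions based on the usual RapidMiner 5 naming and should be checked against the installed version; layout attributes are omitted.

```xml
<!-- Training subprocess of X-Validation with the workaround applied.
     Assumption: operator class "materialize_data" and its port names. -->
<process expanded="true">
  <operator activated="true" class="default_model" compatibility="5.2.007" expanded="true" name="Default Model"/>
  <operator activated="true" class="materialize_data" compatibility="5.2.007" expanded="true" name="Materialize Data"/>
  <connect from_port="training" to_op="Default Model" to_port="training set"/>
  <connect from_op="Default Model" from_port="model" to_port="model"/>
  <!-- Copy the training view into a fresh data table before it leaves the subprocess -->
  <connect from_op="Default Model" from_port="exampleSet" to_op="Materialize Data" to_port="example set input"/>
  <connect from_op="Materialize Data" from_port="example set output" to_port="through 1"/>
  <portSpacing port="source_training" spacing="0"/>
  <portSpacing port="sink_model" spacing="0"/>
  <portSpacing port="sink_through 1" spacing="0"/>
  <portSpacing port="sink_through 2" spacing="0"/>
</process>
```

With this in place, the k-NN operator on the test side should receive the materialized training samples from `through 1` rather than a view that has already been switched to the test partition.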
I would also suggest that the model output of the training subprocess of X-Validation should not be required in order to execute the main process. Currently, a dummy operator is needed just to make the process run.
Answers
Hi Marcin,
thanks for your report. I created a bug report for this: http://bugs.rapid-i.com/show_bug.cgi?id=1206
We will probably not change the behaviour of the model output though, since 99% of the users will use it in the "normal" way, and the warning/error will help a lot of new (and probably also experienced but forgetful) users.
Best, Marius
Thank you for your response.
I just want to mention that it also appears when using the Remember/Recall operators, that is, when I use Remember on the training side and Recall on the test side.
It is very confusing.
Moreover, it doesn't appear in Parallel X-Validation or Bootstrapping Validation, but it does appear in Split Validation as well.
Best regards
Marcin