X-fold-cross validation on predefined folds [SOLVED]
drakula
New Altair Community Member
Hello,
How can I perform an x-fold cross-validation on folds that have already been created? That is, I have 10 predefined train/test set pairs and need to apply them all to the same learner, so that ultimately I can optimize this learner on the defined folds using the EvolutionaryParameterOptimization operator. Hope this is clear enough... Thanks in advance!
Answers
-
Clear enough. Use the operator Batch-X-Validation. This operator requires one of the attributes in your dataset to have the role "batch".
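For reference, a minimal sketch of assigning that role with Set Role, placed before Batch-X-Validation. The attribute name "fold" is hypothetical (it stands for whichever column identifies the predefined fold of each example), and the parameter keys are what I would expect for version 5.2 of the operator, so check them against your installation:

```xml
<!-- Fragment only: marks a hypothetical attribute "fold" as the batch attribute. -->
<operator activated="true" class="set_role" compatibility="5.2.006" expanded="true" name="Set Role">
  <!-- "fold" is an assumed column name holding the predefined fold of each row -->
  <parameter key="name" value="fold"/>
  <parameter key="target_role" value="batch"/>
  <list key="set_additional_roles"/>
</operator>
```

As far as I know, Batch-X-Validation then holds out one batch value per round, so it only fits when every round tests on exactly one fold.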
-
Thank you very much!
-
Um, I have just realized I don't actually do a standard 10-fold cross-validation. In my setting, the data is divided into 10 folds, but in each round 6 folds are used as training data and 4 as test data.
So it would be really helpful if I could somehow just load my train/test pairs and apply them all to the same learner. Or can I somehow set this train/test ratio in Batch-X-Validation?
-
Then my previous answer (use Batch-X-Validation) is probably not the best idea. I guess you could achieve the results you want using the operator FilterExamples with condition class = attribute_value_filter. Or perhaps there is some operator that handles the looping more efficiently. But somebody else will have to give you a hand here. It is beyond my limited expertise :-)
-
I'm thinking of something like this (I'm sure there is a more economical way of doing it). The dataset used is listed after the process XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<process expanded="true" height="476" width="882">
<operator activated="true" class="retrieve" compatibility="5.2.006" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="210">
<parameter key="repository_entry" value="//Clases/Datos/batch"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.2.006" expanded="true" height="94" name="Multiply" width="90" x="45" y="30"/>
<operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples" width="90" x="246" y="30">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="pair1 = training"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes" width="90" x="380" y="30">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="|income"/>
</operator>
<operator activated="true" class="linear_regression" compatibility="5.2.006" expanded="true" height="94" name="Linear Regression" width="90" x="514" y="30">
<parameter key="feature_selection" value="none"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples (2)" width="90" x="246" y="165">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="pair1 = testing"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (2)" width="90" x="380" y="165">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="|income"/>
</operator>
<operator activated="true" class="apply_model" compatibility="5.2.006" expanded="true" height="76" name="Apply Model" width="90" x="581" y="165">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.2.006" expanded="true" height="76" name="Performance" width="90" x="733" y="162"/>
<connect from_op="Retrieve (2)" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Filter Examples (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="126"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
pair1,pair2,income,consumption
training,testing,119,154
training,testing,85,123
training,training,97,125
training,testing,95,130
training,training,120,151
training,training,92,131
training,training,105,141
training,training,110,141
training,training,98,130
training,testing,98,134
training,training,81,115
training,training,81,117
training,training,91,123
training,training,105,144
training,training,100,137
training,training,107,140
training,training,82,123
training,training,84,115
training,testing,100,134
training,testing,108,147
training,training,116,144
training,training,115,144
training,training,93,126
training,training,105,141
training,training,89,124
training,training,104,144
training,training,108,144
training,training,88,129
training,training,109,137
training,training,112,144
testing,testing,96,132
testing,training,89,125
testing,training,93,126
testing,testing,114,140
testing,training,81,120
testing,training,84,118
testing,testing,88,119
testing,training,96,131
testing,training,82,127
testing,testing,114,150
-
Thank you very much
-
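I kept playing with your question, and this is the best I could come up with. It is meant for the case where there are too many pairs to wire up one by one: a Loop operator runs once per pair and uses the iteration macro to pick the matching column (pair1, pair2, ...).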
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<process expanded="true" height="476" width="1016">
<operator activated="true" class="retrieve" compatibility="5.2.006" expanded="true" height="60" name="Retrieve" width="90" x="112" y="75">
<parameter key="repository_entry" value="//Clases/Datos/batch"/>
</operator>
<operator activated="true" class="loop" compatibility="5.2.006" expanded="true" height="76" name="Loop" width="90" x="313" y="75">
<parameter key="set_iteration_macro" value="true"/>
<parameter key="iterations" value="2"/>
<process expanded="true" height="740" width="969">
<operator activated="true" class="multiply" compatibility="5.2.006" expanded="true" height="94" name="Multiply (2)" width="90" x="112" y="165"/>
<operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples (3)" width="90" x="246" y="30">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="pair%{iteration} = training"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (3)" width="90" x="380" y="30">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="|income"/>
</operator>
<operator activated="true" class="linear_regression" compatibility="5.2.006" expanded="true" height="94" name="Linear Regression (2)" width="90" x="514" y="30">
<parameter key="feature_selection" value="none"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples (4)" width="90" x="313" y="300">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="pair%{iteration} = testing"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (4)" width="90" x="447" y="255">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="income|"/>
</operator>
<operator activated="true" class="apply_model" compatibility="5.2.006" expanded="true" height="76" name="Apply Model (2)" width="90" x="648" y="210">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.2.006" expanded="true" height="76" name="Performance (2)" width="90" x="782" y="210"/>
<connect from_port="input 1" to_op="Multiply (2)" to_port="input"/>
<connect from_op="Multiply (2)" from_port="output 1" to_op="Filter Examples (3)" to_port="example set input"/>
<connect from_op="Multiply (2)" from_port="output 2" to_op="Filter Examples (4)" to_port="example set input"/>
<connect from_op="Filter Examples (3)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
<connect from_op="Select Attributes (3)" from_port="example set output" to_op="Linear Regression (2)" to_port="training set"/>
<connect from_op="Linear Regression (2)" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Filter Examples (4)" from_port="example set output" to_op="Select Attributes (4)" to_port="example set input"/>
<connect from_op="Select Attributes (4)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Loop" to_port="input 1"/>
<connect from_op="Loop" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="126"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
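To run this over all ten predefined pairs from the original question, only the Loop parameters should need to change, provided the dataset carries columns pair1 through pair10 (an assumption; the sample data above only has pair1 and pair2). A sketch of the Loop header with the macro settings spelled out; the macro_name and macro_start_value values shown are what I believe the defaults to be, so verify them in your version:

```xml
<!-- Fragment only: Loop parameters for ten pair columns; the inner process stays as posted above. -->
<operator activated="true" class="loop" compatibility="5.2.006" expanded="true" name="Loop">
  <parameter key="set_iteration_macro" value="true"/>
  <!-- macro referenced as %{iteration} in the two Filter Examples conditions -->
  <parameter key="macro_name" value="iteration"/>
  <!-- start at 1 so the first round reads column pair1 -->
  <parameter key="macro_start_value" value="1"/>
  <!-- one round per predefined train/test pair (pair1 ... pair10 assumed) -->
  <parameter key="iterations" value="10"/>
</operator>
```

Each round should deliver one performance vector, so the Loop output ends up holding one entry per pair.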