I was wondering how to determine the maximum number of XValidation runs performed inside an
EvolutionaryParameterOptimization.
My settings for the evolutionary parameter optimization are:
"max_generations" value="5"
"generations_without_improval" value="-1" (set on purpose, to make things clearer)
"population_size" value="20"
"tournament_fraction" value="0.3"
And for the XValidation, the parameter "number_of_validations" is set to 2.
Here is the corresponding code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="../data/polynomial.aml"/>
    </operator>
    <operator name="ParameterOptimization" class="EvolutionaryParameterOptimization" expanded="yes">
        <list key="parameters">
            <parameter key="LibSVMLearner.C" value="0.1:100"/>
            <parameter key="LibSVMLearner.degree" value="2:7"/>
        </list>
        <parameter key="max_generations" value="5"/>
        <parameter key="generations_without_improval" value="-1"/>
        <parameter key="population_size" value="20"/>
        <parameter key="tournament_fraction" value="0.3"/>
        <parameter key="local_random_seed" value="2001"/>
        <parameter key="show_convergence_plot" value="true"/>
        <operator name="Validation" class="XValidation" expanded="yes">
            <parameter key="number_of_validations" value="2"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="LibSVMLearner" class="LibSVMLearner">
                <parameter key="svm_type" value="epsilon-SVR"/>
                <parameter key="kernel_type" value="poly"/>
                <parameter key="C" value="76.53909856172457"/>
                <list key="class_weights">
                </list>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
        <operator name="Log" class="ProcessLog">
            <parameter key="filename" value="paraopt.log"/>
            <list key="log">
                <parameter key="C" value="operator.LibSVMLearner.parameter.C"/>
                <parameter key="degree" value="operator.LibSVMLearner.parameter.degree"/>
                <parameter key="performance" value="operator.Validation.value.performance"/>
                <parameter key="iterations" value="operator.Validation.value.iteration"/>
            </list>
        </operator>
    </operator>
</operator>
I would expect that 2 validations are performed for each individual within a population. Since the
population size is 20, that makes 2*20=40 validations per generation. With 5 generations, I would
expect 200 validations in total.
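To make my reasoning explicit, here is a minimal sketch of how I arrive at that estimate. It assumes that exactly population_size individuals are evaluated once per generation and nothing else is evaluated, which may not match what the operator really does. The function name expected_validations is only made up for illustration and is not part of RapidMiner.

```python
def expected_validations(population_size: int,
                         number_of_validations: int,
                         max_generations: int) -> int:
    """Naive estimate of the total number of validation runs.

    Assumption (mine, not confirmed by the operator's documentation):
    each generation evaluates exactly `population_size` individuals,
    and each evaluation performs `number_of_validations` runs.
    """
    per_generation = population_size * number_of_validations
    return per_generation * max_generations


# Settings taken from the process above.
print(expected_validations(population_size=20,
                           number_of_validations=2,
                           max_generations=5))
```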
But when I check the output of the ProcessLog operator, the parameter optimization has computed 248 performance
values, each of which, in my opinion, should represent one individual with 2 iterations (the two runs of the validation).
Thus, 2*248=496 validations are performed in total. Why not just 200?
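As a quick check of the observed side, the following sketch recomputes the total from the logged count and compares it with the 200 I expected. It rests on my assumption that every logged performance value corresponds to one individual evaluated with one complete XValidation; the names observed_validations and surplus are just illustrative.

```python
def observed_validations(logged_rows: int, number_of_validations: int) -> int:
    """Total validation runs, assuming one logged row per evaluated individual."""
    return logged_rows * number_of_validations


logged_rows = 248   # performance values written by the ProcessLog operator
expected = 200      # my estimate: 20 individuals * 2 validations * 5 generations

observed = observed_validations(logged_rows, number_of_validations=2)
surplus = observed - expected             # validation runs I cannot account for
extra_individuals = logged_rows - 20 * 5  # evaluations beyond 20 per generation

print(observed, surplus, extra_individuals)
```

So under that assumption the log contains more evaluated individuals than 5 generations of 20 would give, which is the part I would like to understand.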
Marcus