Help with workaround for Tools.handleAverages

fig New Altair Community Member
edited November 5 in Community Q&A
Hi,

It seems that IteratingPerformanceAverage does not handle nested averages properly, as demonstrated by the following process (a toy example of 2x2 cross-validation: two iterations of 2-fold XValidation):

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="number_of_attributes" value="20"/>
        <parameter key="target_function" value="random"/>
    </operator>
    <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">
        <parameter key="iterations" value="2"/>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="number_of_validations" value="2"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="LinearRegression" class="LinearRegression">
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                </operator>
                <operator name="RegressionPerformance" class="RegressionPerformance">
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="main_criterion" value="absolute_error"/>
                </operator>
                <operator name="ProcessLog" class="ProcessLog">
                    <list key="log">
                      <parameter key="run" value="operator.XValidation.value.applycount"/>
                      <parameter key="fold" value="operator.XValidation.value.iteration"/>
                      <parameter key="error" value="operator.RegressionPerformance.value.absolute_error"/>
                    </list>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>

After running the experiment, the process log shows:
[tt]
run  fold  error
1    0     0.249
1    1     0.278
2    0     0.359
2    1     0.278
[/tt]

The average of the first run (first two folds) is 0.264, the average of the second run (last two folds) is 0.319, and the overall average is 0.291. However, the performance vector returned by IteratingPerformanceAverage shows a value of 0.282.

This happens because in Tools.handleAverages (the outer call, from IteratingPerformanceAverage.apply) the first average vector is the average from the first run, with a value of 0.264 and an average count of 2. However, when the second average vector (from the second run, with value 0.319) is folded in by the call to Averagable.buildAverage, it is treated as having an average count of only 1, whereas it should carry the same weight as the first average vector. The result is the weighted average (2*0.264 + 1*0.319)/3, which gives the incorrect reported value of 0.282.
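To make the arithmetic concrete, here is a minimal Python sketch that reproduces the 0.282 figure from the logged fold errors. It does not use RapidMiner at all: `Avg`, `average_of` and `fold_in_as_single` are hypothetical stand-ins that only model the behaviour described above (an average vector carrying a mean and an average count, with each later vector folded in at weight 1), not the actual code of Averagable.buildAverage.

```python
from dataclasses import dataclass


@dataclass
class Avg:
    """Hypothetical stand-in for an averaged performance value:
    a mean plus the number of values it was built from (its 'average count')."""
    mean: float
    count: int


def average_of(values):
    """Average a list of raw fold errors into an Avg with count = len(values)."""
    return Avg(sum(values) / len(values), len(values))


def fold_in_as_single(acc, other):
    """Hypothetical model of the described behaviour: `other` is folded into
    `acc` as if it were a single value (weight 1), ignoring other.count."""
    total = acc.mean * acc.count + other.mean
    return Avg(total / (acc.count + 1), acc.count + 1)


runs = [[0.249, 0.278], [0.359, 0.278]]   # fold errors from the process log
run_avgs = [average_of(r) for r in runs]  # inner XValidation averages (count 2 each)

# Outer averaging as described: start from the first run's vector (count = 2),
# then fold every later run in with weight 1.
acc = run_avgs[0]
for other in run_avgs[1:]:
    acc = fold_in_as_single(acc, other)

# What we actually want: the plain mean over all four folds.
overall = average_of([e for r in runs for e in r])

print(f"reported: {acc.mean:.3f}")      # prints "reported: 0.282"
print(f"expected: {overall.mean:.3f}")  # prints "expected: 0.291"
```

The reported value is (2*0.2635 + 1*0.3185)/3, i.e. the first run counts twice as much as the second, which matches the discrepancy seen in the process.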

Can anyone suggest how to work around this?

I am thinking that, in Tools.handleAverages, the average count of the first average vector should be set to 1 when it is inserted.
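Here is a minimal Python sketch of that idea. It is not RapidMiner code: `Avg`, `fold_in_as_single` and `handle_averages_patched` are hypothetical names, and the folding rule (each later average vector enters with weight 1) is an assumption based on the behaviour described above.

```python
from dataclasses import dataclass, replace


@dataclass
class Avg:
    """Hypothetical stand-in for an average vector: mean plus average count."""
    mean: float
    count: int


def fold_in_as_single(acc, other):
    """Hypothetical model: `other` enters the running average with weight 1."""
    total = acc.mean * acc.count + other.mean
    return Avg(total / (acc.count + 1), acc.count + 1)


def handle_averages_patched(inner_avgs):
    """Sketch of the proposed workaround (hypothetical name): reset the first
    inner average's count to 1 so it carries the same weight as every later
    inner average, which is folded in with weight 1."""
    acc = replace(inner_avgs[0], count=1)
    for other in inner_avgs[1:]:
        acc = fold_in_as_single(acc, other)
    return acc


run_avgs = [Avg(0.2635, 2), Avg(0.3185, 2)]  # per-run means from the log
result = handle_averages_patched(run_avgs)
print(f"{result.mean:.3f} over {result.count} runs")  # prints "0.291 over 2 runs"
```

With the count reset, each run average gets equal weight and the result is 0.291. Note that this is a mean of the run means: it equals the mean over all folds only because every run has the same number of folds (number_of_validations = 2 here), and the resulting count is the number of runs, not the number of folds.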

Any help will be greatly appreciated.