First off, thanks for adding the scripting operator to RM; it makes it 10000x more useful.
I have a binominal attribute which groups results into two classes, and I want to know whether these classes differ in any statistically significant way. I have implemented the following to run a Welch's t-test (a t-test that does not assume equal sample sizes or equal variances):
<process expanded="true" height="573" width="1016">
<operator activated="true" class="subprocess" expanded="true" height="94" name="Welch's T-Test" width="90" x="112" y="300">
<process expanded="true" height="591" width="1135">
<operator activated="true" class="aggregate" expanded="true" height="76" name="Aggregate" width="90" x="112" y="30">
<list key="aggregation_attributes">
<parameter key="%{loop_attribute}" value="average"/>
<parameter key="%{loop_attribute}" value="count"/>
<parameter key="%{loop_attribute}" value="variance"/>
</list>
<parameter key="group_by_attributes" value="%{group}"/>
</operator>
<operator activated="true" class="execute_script" expanded="true" height="94" name="Execute Script" width="90" x="313" y="30">
<parameter key="script" value="def withGroup = input[0].getExample(0)&#10;def withoutGroup = input[0].getExample(1)&#10;def attrs = input[0].getAttributes()&#10;def avg = attrs.getRegular(&quot;average(%{loop_attribute})&quot;)&#10;def cnt = attrs.getRegular(&quot;count(%{loop_attribute})&quot;)&#10;def var = attrs.getRegular(&quot;variance(%{loop_attribute})&quot;)&#10;&#10;// Find T&#10;def n1 = withGroup.getValue(cnt)&#10;def n2 = withoutGroup.getValue(cnt)&#10;def s1 = withGroup.getValue(var) / n1&#10;def s2 = withoutGroup.getValue(var) / n2&#10;operator.getProcess().getLog().log(&quot;s1: &quot; + s1.toString());&#10;operator.getProcess().getLog().log(&quot;s2: &quot; + s2.toString());&#10;def t = (withGroup.getValue(avg) - withoutGroup.getValue(avg)) / Math.sqrt(s1 + s2)&#10;&#10;// Find the degrees of freedom&#10;def numerator = Math.pow(s1 + s2, 2)&#10;def denominator = (Math.pow(s1,2) / (n1 - 1)) + (Math.pow(s2,2) / (n2 - 1))&#10;def df = numerator / denominator&#10;&#10;operator.getProcess().getMacroHandler().addMacro(&quot;t&quot;,t.toString())&#10;operator.getProcess().getMacroHandler().addMacro(&quot;df&quot;,df.toString())"/>
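Basically, I use an Aggregate operator to get the average, count, and variance of the attribute for each group, then an Execute Script operator computes t and the degrees of freedom from those values and stores them in the macros "t" and "df".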
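For anyone checking the arithmetic, this is what the script should be computing, written out as formulas. The symbols are just my own shorthand, not anything from RM: \bar{x}_i, v_i and n_i are the average, variance and count of group i as returned by the aggregation. I am assuming the aggregated variance is the sample variance; I have not verified that.

```latex
% Per-group squared standard error (s1 and s2 in the script)
s_i = \frac{v_i}{n_i}, \qquad i = 1, 2

% Welch's t statistic
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1 + s_2}}

% Welch-Satterthwaite approximation of the degrees of freedom
\mathit{df} = \frac{(s_1 + s_2)^2}{\dfrac{s_1^2}{n_1 - 1} + \dfrac{s_2^2}{n_2 - 1}}
```

Note that the script only produces t and df; turning them into a p-value would still need a lookup in the t distribution, which this process does not do.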
Is there an easier way to do this? If not, would it be useful if I developed this as an operator? The existing T-Test and ANOVA seem to only allow inputs of performance vectors and not example sets like I want.