Welch's T-Test

Xodarap · March 2010

First off, thanks for adding the scripting operator to RM; it makes it 10000x more useful.

I have a binominal attribute that splits the results into two classes, and I want to know whether the classes differ significantly. I have implemented the following Welch's T-Test (a T-Test that does not assume the two groups have equal sample sizes or equal variances):


<process expanded="true" height="573" width="1016">
              <operator activated="true" class="subprocess" expanded="true" height="94" name="Welch's T-Test" width="90" x="112" y="300">
                <process expanded="true" height="591" width="1135">
                  <operator activated="true" class="aggregate" expanded="true" height="76" name="Aggregate" width="90" x="112" y="30">
                    <list key="aggregation_attributes">
                      <parameter key="%{loop_attribute}" value="average"/>
                      <parameter key="%{loop_attribute}" value="count"/>
                      <parameter key="%{loop_attribute}" value="variance"/>
                    </list>
                    <parameter key="group_by_attributes" value="%{group}"/>
                  </operator>
                  <operator activated="true" class="execute_script" expanded="true" height="94" name="Execute Script" width="90" x="313" y="30">
                    <parameter key="script" value="def withGroup = input[0].getExample(0)&#10;def withoutGroup = input[0].getExample(1)&#10;def attrs = input[0].getAttributes()&#10;def avg = attrs.getRegular(&quot;average(%{loop_attribute})&quot;)&#13;&#10;def cnt = attrs.getRegular(&quot;count(%{loop_attribute})&quot;)&#13;&#10;def var = attrs.getRegular(&quot;variance(%{loop_attribute})&quot;)&#13;&#10;&#13;&#13;&#10;// Find T&#13;&#10;def n1 = withGroup.getValue(cnt)&#13;&#10;def n2 = withoutGroup.getValue(cnt)&#10;def s1 = withGroup.getValue(var) / n1&#10;def s2 = withoutGroup.getValue(var) / n2&#10;operator.getProcess().getLog().log(&quot;s1: &quot; + s1.toString());&#13;&#10;operator.getProcess().getLog().log(&quot;s2: &quot; + s2.toString());&#13;&#10;&#13;def t = (withGroup.getValue(avg) - withoutGroup.getValue(avg)) / Math.sqrt(s1 + s2)&#13;&#10;&#13;&#10;// Find the degrees of freedom&#13;&#10;def numerator = Math.pow(s1 + s2, 2)&#13;&#10;def denominator = (Math.pow(s1,2) / (n1 - 1)) + (Math.pow(s2,2) / (n2 - 1))&#13;&#10;def df = numerator / denominator&#13;&#10;&#13;&#10;operator.getProcess().getMacroHandler().addMacro(&quot;t&quot;,t.toString())&#13;&#10;operator.getProcess().getMacroHandler().addMacro(&quot;df&quot;,df.toString())"/>

Basically, I use an aggregate operator, then do some custom processing of the results.
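
For readers who want to check the arithmetic outside RapidMiner, here is a minimal standalone sketch of the same computation in plain Java. It takes the per-group average, variance and count that the Aggregate step produces and returns the t statistic and the Welch-Satterthwaite degrees of freedom, mirroring the script above. The class name `WelchTTest`, the `Result` holder and the `compute` method are hypothetical names chosen for illustration, not RapidMiner API. It also assumes the variance passed in is the sample variance (divided by n - 1); the thread does not say which definition the Aggregate operator uses.

```java
// Sketch: Welch's t statistic and Welch-Satterthwaite degrees of freedom
// computed from per-group summary statistics (mean, variance, count).
// All names here are hypothetical; nothing below is RapidMiner API.
final class WelchTTest {

    /** Holds the two values the forum script stores as the macros "t" and "df". */
    static final class Result {
        final double t;
        final double df;

        Result(double t, double df) {
            this.t = t;
            this.df = df;
        }
    }

    static Result compute(double mean1, double var1, double n1,
                          double mean2, double var2, double n2) {
        // The degrees-of-freedom formula divides by (n - 1) for each group.
        if (n1 < 2 || n2 < 2) {
            throw new IllegalArgumentException("each group needs at least 2 examples");
        }
        // Squared standard error of each group mean (assumes sample variance).
        double s1 = var1 / n1;
        double s2 = var2 / n2;

        // t = difference of means over the combined standard error.
        double t = (mean1 - mean2) / Math.sqrt(s1 + s2);

        // Welch-Satterthwaite approximation for the degrees of freedom.
        double df = Math.pow(s1 + s2, 2)
                / (Math.pow(s1, 2) / (n1 - 1) + Math.pow(s2, 2) / (n2 - 1));

        return new Result(t, df);
    }

    public static void main(String[] args) {
        // Made-up summary statistics, purely for illustration.
        Result r = compute(10.0, 4.0, 4, 7.0, 9.0, 3);
        System.out.println("t = " + r.t + ", df = " + r.df);
    }
}
```

Like the script, this stops at t and df. Turning them into a p-value still requires a lookup against Student's t distribution with df degrees of freedom, which is not shown here.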

Is there an easier way to do this? If not, would it be useful if I developed this as an operator? The existing T-Test and ANOVA seem to only allow inputs of performance vectors and not example sets like I want.

land · March 2010

Hi,
if it makes your life easier, feel free to implement your own extensions. Several users have been asking for more statistical tests, so it might even benefit the whole community. If your extension proves helpful and you wish it, we could even put it on the update server to make it available to the public.

Greetings,
Sebastian

Xodarap · March 2010

For anyone who is interested in using this as an operator, see here: http://philosophyforprogrammers.blogspot.com/2010/03/rapidminer-and-ttests.html

If you think it's useful, I'm more than happy to sign whatever waivers are needed to commit it to the main RM update server.
