ANOVA
steffen
New Altair Community Member
Hello all of you
I am currently messing around with statistics to check my validation results. Reading some literature, I have a few questions about ANOVA. Since the operator is part of RM, I assume that it is considered useful.
- Do you agree (from your experience) that the assumption of homogeneous variance can be ignored if the compared samples have equal length and are approximately equally distributed (same distribution family, but differing parameters)?
- What about Kruskal-Wallis? It may be more conservative (retaining H0 more often), but since it is rank-based it can be applied to any performance measure without too much trouble (I suppose). A small sketch comparing it with ANOVA follows below.
- What about "local testers" like Scheffé or Turkey ? Is their absence in RM a consequence of agreement ("bah. Those are useless") or time ?
My current choice would be the Tukey test. ANOVA is (from my current point of view) about as useful as a mathematical proof of existence: it tells me that a difference exists, but not where.
many thanks in advance
greetings
Steffen
Answers
I just want to justify the selection of Tukey (sorry, I confused it with Scheffé).
Tukey
- assumes a normal distribution (since the t-test is accepted for testing performance values like AUC, this should not be a problem)
- assumes that the samples have equal size (no problem)
- tells me where a difference lies (unlike ANOVA)
- is not that conservative (unlike the rank-based Steel/Dwass procedure; rank-based procedures may be more reliable, but I prefer less conservative tests)
A small sketch of a Tukey test follows below.
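As a hedged illustration of the choice above, here is a minimal sketch of Tukey's HSD using statsmodels; the per-fold scores and group labels are invented and not taken from this thread:

```python
# Minimal sketch: Tukey's HSD on invented per-fold accuracies.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([0.81, 0.79, 0.83, 0.80, 0.82,   # model A
                   0.84, 0.82, 0.85, 0.83, 0.86,   # model B
                   0.80, 0.78, 0.81, 0.79, 0.80])  # model C
groups = ["A"] * 5 + ["B"] * 5 + ["C"] * 5

# Unlike plain ANOVA, the output lists every pairwise comparison and
# flags which mean differences are significant at the chosen alpha.
result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
print(result.summary())
```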
greetings
Steffen
Hi,
Let's start with this: this perfectly describes my observation. In other (related) fields like optimization or evolutionary algorithms you simply have to add a significance test in order to demonstrate a significant improvement, or your publication will simply be rejected. In data mining, this is most often not true, and this "ignorance" naturally leads to papers of the kind "another algorithm which is 0.2% better on five selected UCI data sets, and I did not even think of testing whether this improvement is significant at all".
...which leads to the impression that significance testing is not thaaaaat important in data mining.
So, back to the questions:
"Do you agree (from your experience) that the assumption of homogeneous variance can be ignored if the compared samples have equal length and are approximately equally distributed (same distribution family, but differing parameters)?"
I am not too much of an expert on the details (hey, after all I am a data miner) but as far as I know you can ignore that check. At least this is what the statisticians I know usually do.
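For anyone who would rather check the homogeneity assumption than ignore it, here is a minimal sketch using Levene's test; the data are invented and the choice of scipy's levene is mine, not something recommended in this thread:

```python
# Minimal sketch: checking homogeneity of variances with Levene's test
# (center="median" is the Brown-Forsythe variant, fairly robust to
# non-normality). The accuracy values below are invented.
from scipy.stats import levene

acc_a = [0.81, 0.79, 0.83, 0.80, 0.82]
acc_b = [0.84, 0.82, 0.85, 0.83, 0.86]
acc_c = [0.80, 0.78, 0.81, 0.79, 0.80]

stat, p_value = levene(acc_a, acc_b, acc_c, center="median")
# A large p-value gives no evidence against equal variances, so an
# ANOVA/Tukey analysis is at least not contradicted on this point.
print(f"Levene: W = {stat:.3f}, p = {p_value:.4f}")
```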
"What about Kruskal-Wallis? It may be more conservative (retaining H0 more often), but since it is rank-based it can be applied to any performance measure without too much trouble (I suppose)."
"What about 'local testers' (post-hoc tests) like Scheffé or Tukey? Is their absence in RM a consequence of agreement ('bah, those are useless') or of time?"
For all of those, the reason why they are missing is simple: lack of time, combined with the fact that nobody has asked for them yet.
"ANOVA is (from my current point of view) about as useful as a mathematical proof of existence."
But that's exactly the point of all those significance tests: the results are only valid if the assumptions are met. And for Tukey the assumptions are pretty similar to those of the paired t-test / ANOVA: if the data does not follow a normal distribution, the results will simply not be valid at all.
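As a hedged illustration of that caveat, here is a minimal sketch of a quick normality check before trusting t-test / ANOVA / Tukey results; the AUC values and the use of the Shapiro-Wilk test are my own example, not from this thread:

```python
# Minimal sketch: Shapiro-Wilk normality check on invented AUC values.
from scipy.stats import shapiro

auc_values = [0.86, 0.88, 0.85, 0.87, 0.89, 0.84, 0.88, 0.86, 0.87, 0.85]

stat, p_value = shapiro(auc_values)
# A small p-value speaks against normality; in that case a rank-based
# procedure such as Kruskal-Wallis is the safer fallback.
print(f"Shapiro-Wilk: W = {stat:.3f}, p = {p_value:.4f}")
```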
"Tukey tells me where a difference lies (unlike ANOVA)."
But that's also true for the paired t-test, and still I cannot fully recommend it for all cases (quite apart from the assumptions).
"Tukey is not that conservative (unlike the rank-based Steel/Dwass procedure; rank-based procedures may be more reliable, but I prefer less conservative tests)."
Sorry, I cannot comment on that. Anyone else?
Cheers,
Ingo
Hello
Thank you, Ingo, for your assessment. I guess I have to limit my efforts to finding the best test for my current problem (instead of searching for global truths), or I will never finish the project...
I just want to add a remark to this:
"But that's exactly the point of all those significance tests: the results are only valid if the assumptions are met. And for Tukey the assumptions are pretty similar to those of the paired t-test / ANOVA: if the data does not follow a normal distribution, the results will simply not be valid at all."
The problem is to find a test that is capable of multiple comparisons. Applying the paired t-test more than once is not valid because of the cumulation of the alpha error (a small sketch of this follows below). So... ANOVA and Tukey are both capable of this, but while ANOVA just checks IF there is a difference between the means, Tukey tells me WHERE the difference is.
"But that's also true for the paired t-test, and still I cannot fully recommend it for all cases (quite apart from the assumptions)."
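To make the alpha-error cumulation concrete, here is a minimal sketch: repeated paired t-tests across three models inflate the family-wise error rate to roughly 1 - (1 - alpha)^m, and a Bonferroni correction is one standard (conservative) fix. The fold scores, model names and library choices are invented for illustration:

```python
# Minimal sketch: alpha-error cumulation with repeated paired t-tests
# and a Bonferroni correction. All numbers are invented.
from itertools import combinations
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

folds = {
    "A": [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79],
    "B": [0.84, 0.82, 0.85, 0.83, 0.86, 0.81, 0.85, 0.84, 0.83, 0.82],
    "C": [0.80, 0.78, 0.81, 0.79, 0.80, 0.77, 0.82, 0.79, 0.80, 0.78],
}

pairs = list(combinations(folds, 2))   # 3 pairwise comparisons
alpha = 0.05
# With m (roughly independent) tests at level alpha, the family-wise
# error rate grows to about 1 - (1 - alpha)^m, here ~0.14 instead of 0.05.
print(f"Approximate family-wise error rate: {1 - (1 - alpha) ** len(pairs):.3f}")

p_values = [ttest_rel(folds[a], folds[b]).pvalue for a, b in pairs]
reject, p_corrected, _, _ = multipletests(p_values, alpha=alpha,
                                          method="bonferroni")
for (a, b), p_raw, p_adj, rej in zip(pairs, p_values, p_corrected, reject):
    print(f"{a} vs {b}: p = {p_raw:.4f}, "
          f"Bonferroni-corrected p = {p_adj:.4f}, significant: {rej}")
```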
An aside: today I stumbled on a paper using the t-test for AUC, of course without any explanation. It is the first one I have seen doing this... I found no argument for it, but... sometimes I wonder whether the problem is on my side when I am trying to be more correct than some data mining researchers out there >:( . It seems to me like a parent-child relationship: children are not allowed to do certain things the parents do, because the children (students) are not able to estimate the consequences...
*grumble*
Steffen