Hello all,
I am currently playing around with statistics to check my validation results. While reading some literature, a few questions about ANOVA came up. Since the operator is part of RM, I assume that it is considered useful.
- Do you agree (from your experience) that the assumption of homogeneous variance can be ignored if the compared sequences have equal length and are approximately equally distributed (same family of distribution, but differing parameters)?
- What about Kruskal-Wallis? It may be more conservative (rejecting H0 less often), but since it is rank-based it can be applied to any performance measure without too much trouble (I suppose).
- What about "local testers" (post-hoc tests) like Scheffé or Tukey? Is their absence in RM a consequence of agreement ("bah, those are useless") or of time?
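To make the first two questions concrete, here is a minimal sketch in plain Python of the two test statistics I am comparing: the one-way ANOVA F statistic and the Kruskal-Wallis H statistic. The function names `anova_f` and `kruskal_h` and the accuracy values are made up for illustration, and the H statistic is computed without a tie correction. This is not how RM implements its operator, just my understanding of the textbook formulas.

```python
from itertools import chain


def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples (one list per learner)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(chain.from_iterable(groups)) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (midranks for ties, no tie correction)."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Tied values share the mean of the ranks i+1 .. j.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    weighted = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical accuracies of three learners over five validation folds.
run_scores = [
    [0.81, 0.79, 0.83, 0.80, 0.82],
    [0.84, 0.86, 0.85, 0.83, 0.87],
    [0.78, 0.80, 0.77, 0.79, 0.81],
]
print("F =", anova_f(run_scores))
print("H =", kruskal_h(run_scores))
```

The p-values would then come from the F distribution with (k-1, n-k) degrees of freedom and, approximately, from the chi-square distribution with k-1 degrees of freedom. Outside RM, `scipy.stats.f_oneway` and `scipy.stats.kruskal` compute both the statistic and the p-value directly.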
My current problem is to build a valid test setup. I have thought about this a lot, and I know that significance testing is not the way to ultimate truth, but as a first step I want to create a setup that is acceptable in terms of the current "state of the art". I have talked to other students and people at my home university and read a lot of papers, which leads to the picture that significance testing is not that important in data mining :-\
My current choice would be the Tukey test. ANOVA is (from my current point of view) about as useful as a mathematical proof of existence: it tells you that some difference exists, but not between which groups.
Many thanks in advance
Greetings
Steffen