ANOVA
steffen
New Altair Community Member
Hello all of you
I am currently messing around with statistics to check my validation results. Reading some literature, I have a few questions about ANOVA. Since the operator is part of RM, I assume that it is considered useful.
- Do you agree (from your experience) that the assumption of homogeneous variance can be ignored if the compared samples have equal length and are approximately equally distributed (same distribution family, but differing parameters)?
- What about Kruskal-Wallis? It may be more conservative (retaining H0 more often), but since it is rank-based it can be applied to any performance measure without too much trouble (I suppose). A small sketch comparing it with ANOVA follows below.
- What about "local testers" like Scheffé or Turkey ? Is their absence in RM a consequence of agreement ("bah. Those are useless") or time ?
My current choice would be the Tukey test. ANOVA is (from my current point of view) about as useful as a mathematical proof of existence: it tells me that a difference exists, but not where.
many thanks in advance
greetings
Steffen
Answers
I just want to justify the selection of Tukey (sorry, I confused it with Scheffé).
Tukey
- assumes a normal distribution (since the t-test is accepted for testing performance values like AUC, this should not be a problem)
- assumes that the samples have equal size (no problem)
- tells me where a difference lies (unlike ANOVA)
- is not that conservative (unlike the rank-based Steel/Dwass procedure; rank-based procedures may be more reliable, but I prefer less conservative tests)
A small sketch of a Tukey test follows below.
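As a hedged illustration of the choice above, here is a minimal sketch of Tukey's HSD using statsmodels; the per-fold scores and group labels are invented and not taken from this thread:

```python
# Minimal sketch: Tukey's HSD on invented per-fold accuracies.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([0.81, 0.79, 0.83, 0.80, 0.82,   # model A
                   0.84, 0.82, 0.85, 0.83, 0.86,   # model B
                   0.80, 0.78, 0.81, 0.79, 0.80])  # model C
groups = ["A"] * 5 + ["B"] * 5 + ["C"] * 5

# Unlike plain ANOVA, the output lists every pairwise comparison and
# flags which mean differences are significant at the chosen alpha.
result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
print(result.summary())
```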
greetings
Steffen
Hi,
Let's start with this: this perfectly describes my observation. In other (related) fields like optimization or evolutionary algorithms you simply have to add a significance test in order to demonstrate a significant improvement, or your publication will simply be rejected. In data mining, this is most often not true, and this "ignorance" naturally leads to papers of the kind "another algorithm which is 0.2% better on five selected UCI data sets, and I did not even think of testing whether this improvement is significant at all".
...which leads to the impression that significance testing is not thaaaaat important in data mining.
So, back to the questions:
"Do you agree (from your experience) that the assumption of homogeneous variance can be ignored if the compared samples have equal length and are approximately equally distributed (same distribution family, but differing parameters)?"
I am not too much of an expert on the details (hey, after all I am a data miner) but as far as I know you can ignore that check. At least this is what the statisticians I know usually do.
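For anyone who would rather check the homogeneity assumption than ignore it, here is a minimal sketch using Levene's test; the data are invented and the choice of scipy's levene is mine, not something recommended in this thread:

```python
# Minimal sketch: checking homogeneity of variances with Levene's test
# (center="median" is the Brown-Forsythe variant, fairly robust to
# non-normality). The accuracy values below are invented.
from scipy.stats import levene

acc_a = [0.81, 0.79, 0.83, 0.80, 0.82]
acc_b = [0.84, 0.82, 0.85, 0.83, 0.86]
acc_c = [0.80, 0.78, 0.81, 0.79, 0.80]

stat, p_value = levene(acc_a, acc_b, acc_c, center="median")
# A large p-value gives no evidence against equal variances, so an
# ANOVA/Tukey analysis is at least not contradicted on this point.
print(f"Levene: W = {stat:.3f}, p = {p_value:.4f}")
```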
"What about Kruskal-Wallis? It may be more conservative (retaining H0 more often), but since it is rank-based it can be applied to any performance measure without too much trouble (I suppose)."
"What about 'local testers' (post-hoc tests) like Scheffé or Tukey? Is their absence in RM a consequence of agreement ('bah, those are useless') or of time?"
For all of those, the reason why they are missing is simple: lack of time, combined with the fact that nobody has asked for them yet.
"ANOVA is (from my current point of view) about as useful as a mathematical proof of existence."
But that's exactly the point of all those significance tests: the results are only valid if the assumptions are met. And for Tukey the assumptions are pretty similar to those of the paired t-test / ANOVA: if the data does not follow a normal distribution, the results will simply not be valid at all.
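As a hedged illustration of that caveat, here is a minimal sketch of a quick normality check before trusting t-test / ANOVA / Tukey results; the AUC values and the use of the Shapiro-Wilk test are my own example, not from this thread:

```python
# Minimal sketch: Shapiro-Wilk normality check on invented AUC values.
from scipy.stats import shapiro

auc_values = [0.86, 0.88, 0.85, 0.87, 0.89, 0.84, 0.88, 0.86, 0.87, 0.85]

stat, p_value = shapiro(auc_values)
# A small p-value speaks against normality; in that case a rank-based
# procedure such as Kruskal-Wallis is the safer fallback.
print(f"Shapiro-Wilk: W = {stat:.3f}, p = {p_value:.4f}")
```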
"Tukey tells me where a difference lies (unlike ANOVA)."
But that's also true for the paired t-test, and still I cannot fully recommend it for all cases (quite apart from the assumptions).
"Tukey is not that conservative (unlike the rank-based Steel/Dwass procedure; rank-based procedures may be more reliable, but I prefer less conservative tests)."
Sorry, I cannot comment on that. Anyone else?
Cheers,
Ingo
Hello
Thank you, Ingo, for your assessment. I guess I have to limit my efforts to finding the best test for my current problem (instead of searching for global truths), or I will never finish the project...
I just want to add a remark to this:
"But that's exactly the point of all those significance tests: the results are only valid if the assumptions are met. And for Tukey the assumptions are pretty similar to those of the paired t-test / ANOVA: if the data does not follow a normal distribution, the results will simply not be valid at all."
The problem is to find a test that is capable of multiple comparisons. Applying the paired t-test more than once is not valid because of the cumulation of the alpha error (a small sketch of this follows below). So... ANOVA and Tukey are both capable of this, but while ANOVA just checks IF there is a difference between the means, Tukey tells me WHERE the difference is.
"But that's also true for the paired t-test, and still I cannot fully recommend it for all cases (quite apart from the assumptions)."
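To make the alpha-error cumulation concrete, here is a minimal sketch: repeated paired t-tests across three models inflate the family-wise error rate to roughly 1 - (1 - alpha)^m, and a Bonferroni correction is one standard (conservative) fix. The fold scores, model names and library choices are invented for illustration:

```python
# Minimal sketch: alpha-error cumulation with repeated paired t-tests
# and a Bonferroni correction. All numbers are invented.
from itertools import combinations
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

folds = {
    "A": [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79],
    "B": [0.84, 0.82, 0.85, 0.83, 0.86, 0.81, 0.85, 0.84, 0.83, 0.82],
    "C": [0.80, 0.78, 0.81, 0.79, 0.80, 0.77, 0.82, 0.79, 0.80, 0.78],
}

pairs = list(combinations(folds, 2))   # 3 pairwise comparisons
alpha = 0.05
# With m (roughly independent) tests at level alpha, the family-wise
# error rate grows to about 1 - (1 - alpha)^m, here ~0.14 instead of 0.05.
print(f"Approximate family-wise error rate: {1 - (1 - alpha) ** len(pairs):.3f}")

p_values = [ttest_rel(folds[a], folds[b]).pvalue for a, b in pairs]
reject, p_corrected, _, _ = multipletests(p_values, alpha=alpha,
                                          method="bonferroni")
for (a, b), p_raw, p_adj, rej in zip(pairs, p_values, p_corrected, reject):
    print(f"{a} vs {b}: p = {p_raw:.4f}, "
          f"Bonferroni-corrected p = {p_adj:.4f}, significant: {rej}")
```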
An aside: today I stumbled on a paper using the t-test for AUC, of course without any explanation. It is the first one I have seen doing this... I found no argument for it, but... sometimes I wonder whether the problem is on my side when I am trying to be more correct than some data mining researchers out there >:( . It seems to me like a parent-child relationship: children are not allowed to do certain things the parents do, because the children (students) are not able to estimate the consequences...
*grumble*
Steffen