Newbie question: XValidation
nicugeorgian
Hi,
For a cross-validation process with, e.g., XValidation, the example set S is split up into, say, 3 subsets: S1, S2, and S3.
The inner operator of XValidation is then applied:
once with S1 as test set and S2 ∪ S3 as training set,
once with S2 as test set and S1 ∪ S3 as training set,
and once with S3 as test set and S1 ∪ S2 as training set.
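The 3-fold splitting scheme described above can be sketched in plain Python (the data and helper name here are illustrative, not part of XValidation itself):

```python
# Minimal sketch of k-fold splitting: partition the example set into k
# subsets, then let each subset serve once as the test set while the
# union of the other k-1 subsets serves as the training set.

def k_fold_splits(examples, k):
    """Yield (train, test) pairs for a k-fold split of `examples`."""
    folds = [examples[i::k] for i in range(k)]  # S_1 .. S_k
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

S = list(range(9))  # a toy example set with 9 examples
for train, test in k_fold_splits(S, 3):
    print("train:", train, "test:", test)
```

Every example lands in exactly one test set, and each train/test pair together covers the whole example set.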
For each of these runs, a model is returned.
My question is: how do I decide, in general, which model is the best? Or is there no best model ...
Thanks,
Geo
IngoRM
Hi Geo,
this is actually one of the questions we have been asked most often over the last years, and there seems to be a lot of misunderstanding about how to properly evaluate models with cross-validation techniques. The answer is as simple as this: none of the models created for the single folds is the best. The best one is the one trained on the complete data set, or on a well-chosen sample (it is not the task of a cross-validation to find such a sample).
If you ask which is the best one, I would ask back: "What should the best model be?" The one with the lowest error on the corresponding test set? Well, that would again be overfitting, but now on a test set instead of a training set. So it is probably not a good idea to select a model based on the test error alone.
The best thing one can do is to think of cross validation as a process which is completely independent of the learning process:
1) One process is the learning process which is performed on the complete data.
2) But now you also want to know how well your model will perform when it is applied to completely unseen data. This is where the second process comes into the game: estimating the predictive power of your model. The best estimation you could get is calculated with leave-one-out (LOO), where all but one example are used for training and only the remaining one for testing. Since almost all examples are used for training, the resulting model is the most similar to the model trained on the complete data. Since LOO is rather slow on large data sets, we often use a k-fold cross-validation instead to get a good estimation in less time.
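Ingo's two separate processes can be sketched with a trivial majority-class "learner" on made-up labels (all names and data here are illustrative): cross-validation only *estimates* predictive performance, while the model you actually use is trained on the complete data.

```python
# Process 2: k-fold cross-validation estimates predictive power.
# Process 1: the final model is trained on the complete data set.

from collections import Counter

def train_majority(labels):
    """A trivial 'model': predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]

def cv_estimate(labels, k=3):
    """Estimate the accuracy of the majority learner via k-fold CV."""
    folds = [labels[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        train = [y for j, fold in enumerate(folds) if j != i for y in fold]
        test = folds[i]
        model = train_majority(train)
        accuracies.append(sum(y == model for y in test) / len(test))
    return sum(accuracies) / len(accuracies)

labels = [1, 1, 1, 1, 1, 1, 0, 0, 0]  # toy label column
print("estimated accuracy:", cv_estimate(labels))    # estimation process
print("final model predicts:", train_majority(labels))  # trained on all data
```

Note that the three per-fold models are thrown away after the estimate is computed; only the model trained on the full data survives.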
Hope that makes things a bit clearer. Cheers,
Ingo
nicugeorgian
Ingo, many thanks for the very detailed answer!
I had somehow anticipated your answer when I wrote: "Or is there no best model ..."
IngoRM
Hi,
I have somehow anticipated your answer when I wrote
I already thought so, but we get this question so often that I thought a longer answer might be a good idea, so that we can post a link here in the future.
Cheers,
Ingo
reports01
Creating a good model is a tricky business.
By using too much data, too little, or by comparing and optimising your model on different data sets (bootstrapping), you run the risk of overfitting your model.
Given a set of cases C and a model M, the Coefficient of Concordance (CoC) is an indication of how well a model can separate cases into the defined categories. [M.G. Kendall (1948) Rank Correlation Methods, Griffin, London]
When the CoC of a model is 50%, you actually have a random model (below 50%, your model is "cross-wired"), so 50% is the lowest CoC you will get.
Accuracy is a measure that indicates the number of correctly classified cases of your model in comparison to the total number of cases; this is different from the CoC.
These two measures (CoC and accuracy) determine how good a model is.
For instance, when we sort the scored cases by the predicted outcome and mark the actual outcome:
....BBBBBBBBBBBBBBBBBB|GGGGGGGGGGGGGGGGGG.... Here we have 100% CoC.
The accuracy is determined by the number of cases that are actually scored correctly:
....BBBBBBBBB|GBBGBGGBBBGGBGGGB|GGGGGGGGGG.... Here is a more realistic picture; naturally the CoC is below 100%.
Now, by deciding strategically where to place your cut-off, the accuracy can be determined.
If you place your cut-off higher, you take a lower risk and your accuracy will be high.
Accepting more risk, with a lower accuracy, you place your cut-off lower, allowing yourself a bigger market share.
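The cut-off idea above can be sketched in plain Python (a rough illustration with made-up case data, not reports01's exact method): cases to the left of the cut-off are predicted "B" (bad), cases to the right "G" (good), and accuracy depends on where the cut-off sits in the ranked list.

```python
# Actual outcomes, sorted by the model's score (best 'B' candidates first).
ranked = list("BBBBGBGBGGGG")

def accuracy_at_cutoff(cases, cutoff):
    """Predict 'B' for cases before the cut-off, 'G' after, and score."""
    correct  = sum(1 for c in cases[:cutoff] if c == "B")  # predicted B
    correct += sum(1 for c in cases[cutoff:] if c == "G")  # predicted G
    return correct / len(cases)

for cutoff in range(0, len(ranked) + 1, 3):
    print(f"cut-off at {cutoff:2d}: accuracy = {accuracy_at_cutoff(ranked, cutoff):.2f}")
```

With a perfectly concordant ranking there is a cut-off with 100% accuracy; with a realistic ranking the best achievable accuracy drops, and moving the cut-off trades accuracy against how many cases you accept.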
steffen
Hello
...reviving an old discussion...
My question (since I am currently checking the possibilities for validating a ranking classifier without applying a cut-off / threshold) is: why should anyone bother to use the CoC? It is much easier to calculate the sum of the ranks of the true positives. This value can easily be transformed to the [0,1] interval (e.g. 1 = optimal ranking, 0 = worst ranking).
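The rank-sum idea can be sketched as follows (the normalisation below is my own assumption about what steffen means, not his exact formula): sum the ranks of the positives in the score-sorted case list, then rescale so that 1.0 means all positives ranked on top and 0.0 means all ranked at the bottom.

```python
def normalized_rank_sum(sorted_labels):
    """sorted_labels: actual labels (1 = positive), best-scored case first.

    Returns a value in [0, 1]: 1.0 for the optimal ranking, 0.0 for the
    worst. Assumes at least one positive and one negative label.
    """
    n = len(sorted_labels)
    # Give the top-scored case rank n, the bottom-scored case rank 1.
    ranks = [n - i for i, y in enumerate(sorted_labels) if y == 1]
    p = len(ranks)
    best  = sum(range(n, n - p, -1))  # all positives on top
    worst = sum(range(1, p + 1))      # all positives at the bottom
    return (sum(ranks) - worst) / (best - worst)

print(normalized_rank_sum([1, 1, 1, 0, 0, 0]))  # optimal ranking
print(normalized_rank_sum([0, 0, 0, 1, 1, 1]))  # worst ranking
```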
I know that the CoC is the value of the test statistic for Kendall's CoC test, so a statistical test can be applied. But this test only indicates whether there is any difference (in agreement) at all, just like ANOVA. I am looking for a test for multiple comparisons, to know WHERE the difference occurs (e.g. a Tukey test). The only test I have found for this case is the Friedman rank-sum test.
another one:
[quote author=mierswa]
But now you also want to know how well your model will perform when it is applied to completely unseen data. This is where the second process comes into the game: estimating the predictive power of your model. The best estimation you could get is calculated with leave-one-out (LOO), where all but one example are used for training and only the remaining one for testing. Since almost all examples are used for training, the resulting model is the most similar to the model trained on the complete data. Since LOO is rather slow on large data sets, we often use a k-fold cross-validation instead to get a good estimation in less time.
[/quote]
Hm, hm. Recently I read a very interesting PhD thesis by Ron Kohavi (link), who has shown that LOO reduces the variance but increases the bias (i.e. stability). Imagine a binary classification problem where 50% of all instances have label=1 and 50% have label=0. Now apply a majority classifier. Using LOO, the accuracy will be zero.
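This pathological LOO case can be demonstrated in a few lines of plain Python (toy data, just to make the effect concrete): leaving one example out of a perfectly balanced data set always leaves a majority of the *other* class in the training set, so every held-out example is misclassified.

```python
from collections import Counter

labels = [0] * 5 + [1] * 5  # perfectly balanced binary data set

correct = 0
for i, held_out in enumerate(labels):
    train = labels[:i] + labels[i + 1:]              # leave one out
    majority = Counter(train).most_common(1)[0][0]   # majority classifier
    correct += (majority == held_out)

print("LOO accuracy:", correct / len(labels))  # prints "LOO accuracy: 0.0"
```

Even though the majority classifier would score 50% accuracy on the full data, its LOO estimate is 0%, which is the bias effect being described.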
However, Kohavi concludes that it is best to apply 6-10-fold-crossvalidation and repeat it 10-20 times to reduce variance. Note that repeating CV increases the alpha-error if you plan to use statistical tests to validate the results.
...so we come back to the suggestion that 10-fold cross-validation is the best procedure you can use
. I just wanted to underline the argument...
greetings,
Steffen
IngoRM
Hi Steffen,
yes, I know Ron's thesis and this is actually a good point. So my explanation about LOO might be a bit misleading. Anyway, I just wanted to give the readers a feeling how error estimation with any cross-validation-like process and a model learned from the complete data are connected. The reason for this explanation was quite simple: it's probably one of the most often asked questions - or at least it used to be some time ago.
Cheers,
Ingo