"leave_one_out_performance_problem"

New Altair Community Member

Jul 13, 2012

With leave-one-out on N examples, you train N models, each on N-1 examples, and measure each model's performance (its accuracy in this case) on the one remaining example.
The first part of the displayed accuracy is the mean accuracy of all N models, and the second part is the standard deviation.
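To make the procedure concrete, here is a minimal sketch in Python. It is not RapidMiner's implementation: the toy data set, the 1-nearest-neighbour classifier, and all function names are assumptions chosen only for illustration.

```python
import math

def predict_1nn(train, x):
    """Return the label of the training example closest to x (1-nearest neighbour)."""
    nearest = min(train, key=lambda ex: abs(ex[0] - x))
    return nearest[1]

def leave_one_out(data):
    """Hold out each example once and record 1.0 (correct) or 0.0 (incorrect)."""
    scores = []
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # the other N-1 examples
        scores.append(1.0 if predict_1nn(train, x) == label else 0.0)
    return scores

def mean_and_std(scores):
    """Mean and population standard deviation of the per-iteration scores."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    return mean, math.sqrt(var)

# Hypothetical one-dimensional toy data: (feature, label)
data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.45, "b"),
        (0.6, "b"), (0.7, "b"), (0.8, "b"), (0.35, "b")]

scores = leave_one_out(data)
mean, std = mean_and_std(scores)
print(f"accuracy: {100 * mean:.2f} +/- {100 * std:.2f}")
```

Each entry of `scores` is either 0 or 1, because every iteration tests on a single example. For this toy data, 6 of the 8 held-out examples are classified correctly.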

Best,
Marius

bojansimoski

New Altair Community Member

Jul 13, 2012

Thanks, and can you or someone else explain to me how the standard deviation is computed in this case?

New Altair Community Member

Jul 16, 2012

Yes, as usual it's basically the square root of the difference between the mean of squares and the squared mean of the performance values of each iteration. There is a Wikipedia article about the standard deviation: http://en.wikipedia.org/wiki/Standard_deviation
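As a rough illustration of that formula, the sketch below computes it in Python and compares it with the standard library. The per-iteration accuracies are made up, and whether the tool divides by n or by n-1 is not stated in this thread; the population form (divide by n) is assumed here.

```python
import math
import statistics

def std_from_moments(values):
    """sqrt(mean of squares - squared mean), i.e. the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mean_of_squares - mean * mean, 0.0))

# Hypothetical per-iteration accuracies from a 5-fold cross-validation
fold_accuracies = [0.7, 0.8, 0.9, 0.85, 0.75]

print(std_from_moments(fold_accuracies))
print(statistics.pstdev(fold_accuracies))  # same quantity, computed by the standard library
```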

Best, Marius

New Altair Community Member

I do not get it, because in that case you have only one element in the validation set, so the result of each iteration will be either 0 (incorrect) or 1 (correct)...
And the results in that situation are strange to interpret.
I got results
84.26 +/- 36.08 or 63.38 +/- 47.57

In both cases, if I assume that this standard deviation is computed as sqrt(p(1-p)), taking p = accuracy (so p = 0.8426, for instance), I then get the value of the standard deviation shown; in the example, sqrt(0.8426(1-0.8426)). But I think this is not OK, because accuracy is not a Bernoulli distribution. I think the value should be further divided by sqrt(N)... So my question, like Bojan's, is: how is this standard deviation computed?
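The check described above can be redone with a few lines of Python. This is only a sketch of the poster's hypothesis, not a statement about how the tool computes the value; the function name is made up for illustration.

```python
import math

def bernoulli_std(p):
    """Standard deviation of a single 0/1 outcome with success probability p."""
    return math.sqrt(p * (1.0 - p))

# Accuracies quoted in the post, converted from percent to fractions
for accuracy_percent in (84.26, 63.38):
    p = accuracy_percent / 100.0
    print(f"p = {p:.4f}  sqrt(p(1-p)) = {100 * bernoulli_std(p):.2f}%")
```

The results come out at roughly 36.4 and 48.2, near the displayed 36.08 and 47.57.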

Thank you,
AMT

New Altair Community Member

Please remember that in a leave-one-out validation of a set with n examples, you have n iterations. If you have a look at the definition e.g. at the wikipedia article linked above, you see that you are missing some sum and sqrt operations in your formula.

Best,
Marius

New Altair Community Member

Sorry, I do not get your answer. The definitions on Wikipedia are the standard way to estimate the variance and the std, and they would apply if in each iteration I got real numbers, like 0.7, 0.8, 0.9 and so on...
But I do not think that is what was used here. With one example, you get either correct or not correct.
At the end of the n iterations, a count variable with a binomial distribution is obtained, since each iteration is a Bernoulli trial.
And what I was pointing out is that this standard deviation seems to be estimated using the formula for the standard deviation of a Bernoulli distribution ----- sqrt(p(1-p)) ------ and I did not find this on the Wikipedia page you pointed to. So how is the standard deviation really estimated?
Another point is how to interpret a result like the ones I showed, where the performance can have such a large spread, even exceeding 100%?

New Altair Community Member

Hi,

you can transform: p(1-p) = p - p^2, which is equivalent to the standard formula for the standard deviation where the values are only 0 or 1.
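
Spelled out as a short derivation (the symbols are notation chosen here, not taken from the tool): let x_i in {0, 1} be the result of iteration i and p the mean of the x_i. Because x_i^2 = x_i for values that are only 0 or 1, the mean of squares equals the mean itself:

```latex
\[
\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \Bigl(\frac{1}{n}\sum_{i=1}^{n} x_i\Bigr)^2
= \frac{1}{n}\sum_{i=1}^{n} x_i - p^2
= p - p^2
= p(1-p),
\qquad p = \frac{1}{n}\sum_{i=1}^{n} x_i .
\]
```

Taking the square root of this quantity gives sqrt(p(1-p)).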

Best,
Marius

New Altair Community Member

You mean the variance; it is not the standard deviation, which is the square root...

But this is the point. I think that to compute the std (standard deviation) of the accuracy you need to further divide by sqrt(n) ... What do you think?

Greetings

A.M. Tomé

New Altair Community Member

Jul 19, 2012

AMT wrote:

You mean the variance; it is not the standard deviation, which is the square root...

Sure, consider this a typo.

AMT wrote:

But this is the point. I think that to compute the std (standard deviation) of the accuracy you need to further divide by sqrt(n) ... What do you think?

Now I got you. Seems to be reasonable. What we are currently displaying is the standard deviation of the accuracy values, not the stddev of the mean accuracy value, as far as I see it. In the default case where you have continuous accuracies this is often what you are interested in, since it indicates the robustness of your model - if you have a large standard deviation in your performances, this indicates that the model does not generalize well and might be overfitted (or simply is not suited for your data, or ...).
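
The difference between the two quantities can be sketched in Python as follows. The data (100 examples, 84 correct) and the function names are hypothetical, and dividing by sqrt(n) assumes independent iterations, which is only an approximation for leave-one-out because the training sets overlap almost completely.

```python
import math

def std_of_values(values):
    """Population standard deviation of the individual per-iteration values."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

def std_error_of_mean(values):
    """Standard deviation of the mean, assuming independent iterations."""
    return std_of_values(values) / math.sqrt(len(values))

# Hypothetical leave-one-out outcome: 100 examples, 84 classified correctly
outcomes = [1.0] * 84 + [0.0] * 16

print(f"spread of the 0/1 values: {100 * std_of_values(outcomes):.2f}%")
print(f"std error of the mean:    {100 * std_error_of_mean(outcomes):.2f}%")
```

With 0/1 outcomes the first number depends only on the accuracy itself, while the second one shrinks as the number of examples grows.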

With accuracy values of only 0 and 1, the usefulness of this value is certainly questionable. The same applies to the +/- notation, since it's not the error of the accuracy.
We will discuss that here at Rapid-I. Thanks for your input!

Best,
~Marius

New Altair Community Member

Aug 8, 2012

Any news about this comment?

AMT