random classifier's accuracy

meliniak · July 2011

Let's say we have a dataset with more than two label values; for the sake of simplicity, let it be three. The label values are unevenly distributed. My question is: what is the best accuracy a random classifier can achieve on such a dataset?

IngoRM · July 2011

Hi,

Let's first define what is meant by "random classifier":

Option A: For each example, the classifier randomly selects a prediction from the possible label values. The predictions may or may not follow a specific distribution; for example, each prediction could be drawn according to the label distribution of the training data.

Option B: The classifier simply always predicts the majority class. This is called "Default Learner" in RapidMiner, but I have also heard people call this a random classifier in the past.
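The two options can be sketched in Python roughly as follows. This is a minimal illustration under my own assumptions, not RapidMiner's implementation: the function names `random_classifier` and `default_learner` are hypothetical, and a tie for the majority class is broken by whichever tied label `Counter.most_common` returns first.

```python
import random
from collections import Counter


def random_classifier(train_labels, n_predictions, follow_distribution=True, seed=None):
    """Option A (hypothetical helper): draw each prediction at random.

    If follow_distribution is True, labels are drawn with the frequencies
    seen in train_labels; otherwise every distinct label is equally likely.
    """
    rng = random.Random(seed)
    counts = Counter(train_labels)
    labels = sorted(counts)
    if follow_distribution:
        weights = [counts[label] for label in labels]
        return rng.choices(labels, weights=weights, k=n_predictions)
    return [rng.choice(labels) for _ in range(n_predictions)]


def default_learner(train_labels, n_predictions):
    """Option B (hypothetical helper): always predict the training majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_predictions
```

With training labels `["a"] * 6 + ["b"] * 3 + ["c"]`, `default_learner` returns `"a"` for every example, while `random_classifier` returns a mix of `"a"`, `"b"` and `"c"` that depends on the seed.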

As for the best accuracy that can be reached, I would say:

Option A: 100%. By chance, the classifier can predict all cases correctly. Of course, this becomes less likely as the number of examples grows.

Option B: number of examples in the majority class / total number of examples.
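Both answers can be made concrete with a small Python sketch. It assumes the guesses of option A are independent and that the majority class of the training data is also the majority class of the evaluated data; the helper names and the `class_probs` mapping (the probability with which option A predicts each label) are hypothetical.

```python
from collections import Counter
from math import prod


def default_learner_accuracy(labels):
    """Option B: fraction of examples that belong to the majority class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def prob_all_correct(labels, class_probs):
    """Option A: probability that independent random guesses hit every label.

    class_probs maps each label to the probability with which the random
    classifier predicts it (uniform, or the training label frequencies).
    """
    return prod(class_probs.get(y, 0.0) for y in labels)
```

For ten examples split 60/30/10% over three classes, option B scores 0.6, while the chance that uniform guessing gets all ten right is (1/3)^10, about 1.7 × 10^-5. This is why 100% is reachable in principle but practically never observed.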

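Although the best reachable accuracy stays at 100% for option A, for larger numbers of test examples it becomes much more likely that the accuracy ends up close to its expected value, which is at most the majority-class fraction of option B (for uniform guessing over k classes it is 1/k).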
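As the test set grows, the accuracy of option A concentrates around its expected value, which for independent guesses is the sum over classes c of P(prediction = c) · P(true label = c). A minimal Python sketch, with hypothetical helper names, that computes this value and checks it by simulation:

```python
import random


def expected_accuracy(class_fracs, pred_probs):
    """Expected accuracy of independent random guessing.

    class_fracs: fraction of test examples per class.
    pred_probs: probability with which each class is predicted.
    """
    return sum(pred_probs.get(c, 0.0) * f for c, f in class_fracs.items())


def simulated_accuracy(test_labels, pred_probs, seed=0):
    """Draw one random prediction per test example and measure accuracy."""
    rng = random.Random(seed)
    labels = list(pred_probs)
    weights = [pred_probs[c] for c in labels]
    preds = rng.choices(labels, weights=weights, k=len(test_labels))
    return sum(p == y for p, y in zip(preds, test_labels)) / len(test_labels)
```

With class fractions of 60/30/10%, uniform guessing expects about 33% accuracy and guessing according to the label distribution expects 0.36 + 0.09 + 0.01 = 46%, both below the 60% of option B. In general the sum of the squared class fractions never exceeds the largest fraction, so matching the label distribution does not beat always predicting the majority class in expectation.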

Cheers,
Ingo

tabazim · July 2011

Thanks for posting this intuitive question and giving me a chance to clarify my understanding of random classifiers. Does a random classifier also tell us anything about the worst performance one can achieve on an n-class problem? Suppose n = 2 for the sake of simplicity and the data is balanced. Does a random classifier's performance then tell us that no other classifier can score below 50% on this data? If not, how is it used to assess the quality of a classifier on both balanced and unbalanced data? I hope the question is clear enough to answer; if not, kindly let me know. Thanks!
