random classifier's accuracy
meliniak
Let's say we have a dataset with more than two label values; let it be three for the sake of simplicity. The label values are unevenly distributed. My question is: what is the best accuracy a random classifier can achieve on such a dataset?
Answers
Hi,
Let's first define what is meant by "random classifier":
Option A: The classifier randomly selects one of the possible label values for each prediction. The random choice may or may not follow a specific distribution; for example, the prediction could be drawn according to the label distribution of the training data.
Option B: The classifier simply always predicts the majority class. This is called "Default Learner" in RapidMiner, but I have also heard people call this a random classifier in the past.
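The two definitions can be sketched in Python. This is a minimal illustration under stated assumptions, not RapidMiner's implementation: the function names (`fit_prior`, `predict_uniform`, `predict_stratified`, `predict_majority`, `accuracy`) are hypothetical, and labels are assumed to be plain hashable values such as strings.

```python
import random
from collections import Counter


def fit_prior(train_labels):
    """Return the sorted class labels and their relative frequencies in the training data."""
    counts = Counter(train_labels)
    total = len(train_labels)
    classes = sorted(counts)
    return classes, [counts[c] / total for c in classes]


def predict_uniform(classes, n, rng):
    """Option A, uniform variant: every label value is equally likely for each prediction."""
    return [rng.choice(classes) for _ in range(n)]


def predict_stratified(classes, weights, n, rng):
    """Option A, distribution-following variant: draw labels by their training frequencies."""
    return rng.choices(classes, weights=weights, k=n)


def predict_majority(train_labels, n):
    """Option B ("Default Learner"): always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical three-class data with an uneven label distribution.
    train = ["a"] * 6 + ["b"] * 3 + ["c"]
    test = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
    rng = random.Random(0)
    classes, weights = fit_prior(train)
    print("uniform:   ", accuracy(test, predict_uniform(classes, len(test), rng)))
    print("stratified:", accuracy(test, predict_stratified(classes, weights, len(test), rng)))
    print("majority:  ", accuracy(test, predict_majority(train, len(test))))
```

Options A and B differ only in where the randomness sits: option A draws a fresh label per example, while option B has no randomness at prediction time at all.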
As for the best accuracy that can be reached, I would say:
Option A: 100%. By chance, the classifier can predict all cases correctly. Of course, this becomes less and less likely as the number of examples grows.
Option B: the number of examples in the majority class divided by the total number of examples.
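The best-case numbers can be made concrete with a small sketch. Assuming the guesses are drawn independently for each example, option A scores 100% with probability (1/k)^n for uniform guessing among k classes on n test examples, or with probability equal to the product of the guessing probabilities of the true labels when guessing according to the label distribution. Option B is deterministic: its accuracy is the fraction of test examples that belong to the class that was most frequent in training. The helper names below are hypothetical.

```python
from collections import Counter
from math import prod


def prob_all_correct_uniform(n_examples, n_classes):
    """Option A, uniform guessing: chance that all n independent guesses are right."""
    return (1 / n_classes) ** n_examples


def prob_all_correct_stratified(test_labels, class_probs):
    """Option A, guessing by label distribution: product of P(guess == y) over true labels y."""
    return prod(class_probs[y] for y in test_labels)


def majority_accuracy(train_labels, test_labels):
    """Option B: fraction of test examples that belong to the training majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(y == majority for y in test_labels) / len(test_labels)


if __name__ == "__main__":
    # Three classes, growing (hypothetical) test set sizes.
    for n in (10, 50, 100):
        print(n, prob_all_correct_uniform(n, 3))
```

For example, with three classes and only 10 test examples, uniform guessing gets everything right with probability 1/59049, roughly 0.0017%.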
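Although the best reachable accuracy stays at 100% for option A, larger test sets make that outcome vanishingly unlikely, and the accuracy will settle close to its expected value instead: 1/k for uniform guessing among k classes, or the sum of the squared class fractions when guessing according to the label distribution. Both are at most the majority class fraction of option B.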
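To see what option A settles on for large test sets, here is a small sketch. If the true labels follow a class distribution p and the predictions are drawn independently from a distribution q, the expected accuracy is the sum of p_i * q_i. Uniform guessing (q_i = 1/k) gives 1/k, guessing by the label distribution (q = p) gives the sum of p_i squared, and always predicting the majority class gives max p_i, which no choice of q can exceed. The distribution [0.6, 0.3, 0.1] is a hypothetical example, and the simulation assumes i.i.d. test labels.

```python
import random


def expected_accuracy(p, q):
    """Expected accuracy if true labels follow p and predictions are drawn independently from q."""
    return sum(pi * qi for pi, qi in zip(p, q))


def simulate_accuracy(p, q, n, rng):
    """Monte Carlo estimate: draw n true labels from p and n independent predictions from q."""
    classes = range(len(p))
    y_true = rng.choices(classes, weights=p, k=n)
    y_pred = rng.choices(classes, weights=q, k=n)
    return sum(t == y for t, y in zip(y_true, y_pred)) / n


# Hypothetical class distribution for a three-class problem.
p = [0.6, 0.3, 0.1]
k = len(p)
m = p.index(max(p))  # index of the majority class
strategies = {
    "uniform (option A)": [1 / k] * k,
    "label distribution (option A)": p,
    "majority class (option B)": [1.0 if i == m else 0.0 for i in range(k)],
}

if __name__ == "__main__":
    rng = random.Random(42)
    for name, q in strategies.items():
        print(name, expected_accuracy(p, q), simulate_accuracy(p, q, 100_000, rng))
```

With these numbers the expected accuracies are 1/3, 0.46 and 0.6, and the simulated values land close to them; only option B reaches the majority fraction.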
Cheers,
Ingo
Thanks for posting this intuitive question and giving me a chance to clarify my understanding of random classifiers. Does the random classifier also tell us anything about the worst performance one can achieve in an n-class problem? Suppose n = 2 for the sake of simplicity and the data is balanced: does the random classifier's performance tell us that no other classifier can score below 50% on this data? If not, how is it used to assess the quality of a classifier on both balanced and unbalanced data? I hope the question is clear enough to respond to; if not, kindly let me know. Thanks!