
Show prevalence of largest class in Performance (Classification) and similar operators

User: "Tripartio"
New Altair Community Member
Updated by Jocelyn
When doing classification tasks, I normally use the prevalence (frequency) of the largest (modal) class as the naïve benchmark for judging whether a single model is useful. For example, if my label is binary (yes or no), with yes comprising 9% of the dataset and no comprising 91%, then I would expect a useful model to have an accuracy above 91%. Otherwise, the model is no better than naively assigning all predictions to the larger class. The same logic applies to multiple categories (e.g. three or four classes for prediction). For example, if there were three classes A, B and C distributed 30%, 40% and 30%, then the prevalence of the largest class (B) would be 40%.
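To make the calculation concrete, here is a minimal Python sketch of the benchmark described above. It is only an illustration outside of the operators in question: the function name `largest_class_prevalence` and the sample label lists are made up for this example and are not part of any existing operator or API.

```python
from collections import Counter


def largest_class_prevalence(labels):
    """Return (modal_class, prevalence) for a sequence of class labels.

    Hypothetical helper for illustration only; prevalence is the share
    of examples that belong to the most frequent class.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    # most_common(1) returns a list with one (label, count) pair
    modal_class, modal_count = counts.most_common(1)[0]
    return modal_class, modal_count / len(labels)


# Made-up data matching the two examples in the post
binary = ["yes"] * 9 + ["no"] * 91
three = ["A"] * 30 + ["B"] * 40 + ["C"] * 30

binary_result = largest_class_prevalence(binary)
three_result = largest_class_prevalence(three)

print(binary_result)  # modal class "no", prevalence 0.91
print(three_result)   # modal class "B", prevalence 0.4
```

A model's accuracy can then be compared directly against the second value of the returned pair.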

My request is that the Performance (Classification) and Performance (Binominal Classification) operators add this as one of the criteria they can output. I am not sure, but I think the formal name for this measure is "prevalence of largest class" (cf. https://en.wikipedia.org/wiki/Prevalence and https://en.wikipedia.org/wiki/Confusion_matrix#Table_of_confusion). Because the calculation is so simple, I hope it would be easy to implement. Having it available as an output option would be far more convenient than pulling out a calculator each time, which is what I have to do now.



