Show prevalence of largest class in Performance (Classification) and similar operators

New Altair Community Member

Nov 9, 2020

Updated Nov 5, 2024 by Jocelyn

When doing classification tasks, I normally use the prevalence (frequency) of the largest (modal) class as the naïve benchmark for judging whether a model is useful. For example, if my label is binary (yes/no), with yes comprising 9% of the dataset and no comprising 91%, then I would expect a useful model's accuracy to exceed 91%. If it does not, the model is no better than naively assigning every prediction to the larger class. The same logic applies to labels with more than two classes. For example, if there were three classes A, B and C distributed 30%, 40% and 30%, then the prevalence of the largest class (B) would be 40%.
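To make the calculation concrete, here is a minimal Python sketch of what I currently work out by hand. It is only an illustration, not AI Studio code: the function name `largest_class_prevalence` and the sample label lists are made up for this example.

```python
from collections import Counter


def largest_class_prevalence(labels):
    """Return (modal_class, prevalence) for a list of class labels.

    Hypothetical helper for illustration only; not part of any operator.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    # most_common(1) gives the single most frequent label and its count
    modal_class, modal_count = counts.most_common(1)[0]
    return modal_class, modal_count / len(labels)


# Binary example from above: 9% yes, 91% no
binary_labels = ["yes"] * 9 + ["no"] * 91
# Three-class example from above: A 30%, B 40%, C 30%
three_class_labels = ["A"] * 30 + ["B"] * 40 + ["C"] * 30

print(largest_class_prevalence(binary_labels))
print(largest_class_prevalence(three_class_labels))
```

A model's accuracy can then be compared directly against the returned prevalence to see whether it beats the naïve benchmark.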

My request is that the Performance (Classification) and Performance (Binominal Classification) operators offer this as one of the optional criteria they output. I am not sure, but I think the formal name for this measure is "prevalence of largest class" (cf. https://en.wikipedia.org/wiki/Prevalence and https://en.wikipedia.org/wiki/Confusion_matrix#Table_of_confusion). Because the calculation is so simple, I hope it would be easy to implement. Having it available as an output option would be far more convenient than pulling out a calculator each time, which is what I have to do now.
