Hi!
I am experimenting with text categorization on the Reuters dataset. In this dataset, each text may belong to several categories or to none at all, so I built a binary classifier for each category that predicts whether a text belongs to that category (positive case) or not (negative case).
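Conceptually this is just a one-vs-rest setup; outside RapidMiner it could be sketched in Python roughly like this (the toy texts, labels and variable names below are only illustrative, not my actual data):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: each text carries zero or more category labels.
texts = ["wheat prices rose sharply",
         "barley harvest report released",
         "unrelated news item"]
labels = [["wheat"], ["barley"], []]

X = TfidfVectorizer().fit_transform(texts)
Y = MultiLabelBinarizer().fit_transform(labels)  # one 0/1 column per category

# One independent binary classifier per category (one-vs-rest).
per_category_models = [LogisticRegression().fit(X, Y[:, j])
                       for j in range(Y.shape[1])]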
The problem is that some categories contain very few positive cases, so when I measure a classifier's performance there may be only a few, or even no, positive examples in the test set. Currently I use BinominalClassificationPerformance to measure performance, and for some categories I get "unknown" values, for example:
precision: unknown (positive class: barley_pos)
ConfusionMatrix:
True: barley_neg barley_pos
barley_neg: 458 1
barley_pos: 0 0
recall: 0.00% (positive class: barley_pos)
ConfusionMatrix:
True: barley_neg barley_pos
barley_neg: 458 1
barley_pos: 0 0
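As far as I understand it, precision comes out as "unknown" because nothing was predicted as barley_pos, so its denominator is zero. A small Python sketch of how I read the matrix (the counts are taken from the output above; the function names are just mine):

# Counts read off the confusion matrix (positive class: barley_pos).
tp, fp = 0, 0    # no text was predicted as barley_pos
fn, tn = 1, 458  # the single true positive was predicted as barley_neg

def precision(tp, fp):
    # Undefined when the classifier never predicts the positive class,
    # which is what gets reported as "unknown".
    return tp / (tp + fp) if (tp + fp) > 0 else None

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) > 0 else None

print(precision(tp, fp))  # None -> "unknown"
print(recall(tp, fn))     # 0.0  -> 0.00%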
If I look at the confusion matrix above, I see that all the negative cases were correctly predicted as negative, so I am not sure I can agree with results that show poor or unknown performance. The question is: how do I correctly measure performance in such cases?