Binary text classification - Help in process needed.

New Altair Community Member

Dec 1, 2016

Updated Nov 5, 2024 by Jocelyn

Hey guys,

We want to run a binary classification on a text data set with a class distribution of 80% negative and 20% positive. To get a statistically reliable performance estimate, we want to use 10-fold cross-validation.

If we model this within RapidMiner, it fails: the process doesn't output any performance metrics (precision, recall, etc.):

Bildschirmfoto 2016-12-01 um 12.14.37.png

We found a workaround that works, but it doesn't make sense from an ML perspective: if we FIRST split the data into training and test sets and THEN use 10-fold X-Validation, we get results. But the training/test split should itself be part of the cross-validation (9 training folds, 1 test fold, 10 iterations). Did we model it the right way, or did we miss something?

Bildschirmfoto 2016-12-01 um 12.14.37.png
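
To make clear what we expect the cross-validation to do, here is a minimal pure-Python sketch of the procedure described above: every example lands in exactly one test fold, the 80/20 class ratio is kept in each fold, and precision and recall are pooled over all 10 held-out folds. This is not RapidMiner code. The function names, the toy texts, and the keyword "classifier" are hypothetical and only there for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign every example index to one of k folds, preserving the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validate(texts, labels, train_fn, predict_fn, k=10, seed=0):
    """Return (precision, recall) for class 1, pooled over all held-out folds."""
    folds = stratified_folds(labels, k, seed)
    tp = fp = fn = 0
    for test_idx in folds:
        held_out = set(test_idx)
        # The other k-1 folds form the training set for this iteration.
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train_fn([texts[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in test_idx:
            pred = predict_fn(model, texts[i])
            if pred == 1 and labels[i] == 1:
                tp += 1
            elif pred == 1 and labels[i] == 0:
                fp += 1
            elif pred == 0 and labels[i] == 1:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical stand-in for a real text classifier: it remembers the words
# that occur only in positive training texts.
def train_keywords(train_texts, train_labels):
    pos_words, neg_words = set(), set()
    for text, label in zip(train_texts, train_labels):
        (pos_words if label == 1 else neg_words).update(text.split())
    return pos_words - neg_words


def predict_keywords(model, text):
    return 1 if any(word in model for word in text.split()) else 0


# Toy data with the same 80% negative / 20% positive distribution.
texts = ["item works fine thanks"] * 80 + ["item arrived broken refund please"] * 20
labels = [0] * 80 + [1] * 20

precision, recall = cross_validate(texts, labels, train_keywords, predict_keywords)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sketch there is no separate split before the loop: the 9-training-folds / 1-test-fold split happens inside each of the 10 iterations, which is the behaviour we would expect from X-Validation as well.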