
"What does cross-validation do with models on each subset?"

User: "DrGary"
New Altair Community Member
Updated by Jocelyn

Cross-validation is primarily a technique for performance estimation: each example in the training set also serves, in turn, as held-out test data for a model that was not trained on it. A left-out set can also be used to prevent overfitting, by stopping training when the performance on the left-out set begins to degrade.
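To make the early-stopping idea concrete, here is a minimal sketch in Python. It is not RapidMiner code: the function name `early_stopping_epoch`, the `patience` parameter and the error values are all made up for illustration, and I am assuming the learner trains in epochs and reports one left-out error per epoch.

```python
def early_stopping_epoch(val_errors, patience=1):
    """Return the index of the epoch whose model should be kept.

    val_errors: left-out-set error after each training epoch (hypothetical values).
    patience:   number of consecutive non-improving epochs tolerated before stopping.
    """
    best_error = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            # Performance on the left-out set improved: remember this epoch.
            best_error, best_epoch, bad_epochs = error, epoch, 0
        else:
            # Performance on the left-out set got worse (or stalled).
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch


# Hypothetical left-out errors: they fall, then start to rise (overfitting).
stop_at = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.7])
print(stop_at)
```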

How does the XValidation operator work in RapidMiner with respect to the models? Is a new, independent model trained for each subset? Or is it assumed that models used with the XValidation operator allow incremental training, so that each new iteration updates the same model?

If the former, then the resulting model is not trained on the whole dataset, but only on the training data of one XVal iteration, i.e. on n-1 of the n subsets.
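Here is a minimal Python sketch of what I mean by "the former". It only illustrates the scenario, it is not a description of how the XValidation operator is actually implemented. `MeanModel`, `split_into_folds` and `cross_validate_independent` are hypothetical names, and the toy learner just predicts the mean of its training values.

```python
def split_into_folds(data, n_folds):
    """Deal the examples into n_folds disjoint subsets (round-robin)."""
    return [data[i::n_folds] for i in range(n_folds)]


class MeanModel:
    """Toy stand-in for a learner: predicts the mean of its training values."""

    def fit(self, values):
        self.mean = sum(values) / len(values)
        self.n_seen = len(values)  # how many examples this model was trained on
        return self


def cross_validate_independent(data, n_folds):
    """Train a brand-new model in every iteration; nothing is carried over."""
    folds = split_into_folds(data, n_folds)
    results = []
    for test_idx in range(n_folds):
        train = [v for i, fold in enumerate(folds) if i != test_idx for v in fold]
        test = folds[test_idx]
        model = MeanModel().fit(train)  # fresh model, trained on n-1 subsets only
        mse = sum((v - model.mean) ** 2 for v in test) / len(test)
        results.append((model.n_seen, mse))
    return results


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
results = cross_validate_independent(data, n_folds=3)
for n_seen, mse in results:
    print(f"trained on {n_seen} of {len(data)} examples, test MSE = {mse:.3f}")
```

In this sketch every model sees only 4 of the 6 examples, which is exactly the concern: none of the per-iteration models has been trained on the whole dataset.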

If the latter, then the model is retrained on n-1 duplicates of every datapoint. To see this, consider a 3-fold cross-validation:

              subset 1    subset 2    subset 3
iteration 1:  test        train       train
iteration 2:  train       test        train
iteration 3:  train       train       test


So the model would see subset 1 twice, subset 2 twice and subset 3 twice.
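The count generalizes to any number of folds. As a quick sanity check, here is a small Python sketch (the function name `kfold_train_counts` is my own) that tallies how often each subset lands in the training role across all iterations:

```python
def kfold_train_counts(n_subsets):
    """Count how often each subset is used for training across all iterations."""
    counts = [0] * n_subsets
    for test_idx in range(n_subsets):        # one iteration per held-out subset
        for subset_idx in range(n_subsets):
            if subset_idx != test_idx:       # every other subset is training data
                counts[subset_idx] += 1
    return counts


print(kfold_train_counts(3))  # [2, 2, 2]
```

So under incremental training, an n-fold run would present every datapoint to the same model n-1 times.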

Finally, I haven't seen any documentation saying that XValidation can be used to prevent overfitting. Can someone confirm whether it can?

Thanks,
Gary
