"What does cross-validation do with models on each subset?"
Cross-validation is primarily a technique for performance estimation: it lets the training data double as an independent test set. It can also be used to prevent overfitting, by stopping training once performance on the left-out subset begins to degrade.
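To make the stopping idea concrete, here is a minimal sketch of early stopping against a single left-out subset. Everything in it is an illustrative assumption, not RapidMiner behaviour: a one-parameter linear model `y = w * x`, plain gradient descent, and hypothetical names and parameters (`fit_with_early_stopping`, `lr`, `patience`). Training stops once the held-out error has not improved for `patience` steps, and the best weights seen so far are returned.

```python
import random

def mse(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

def fit_with_early_stopping(train, held_out, lr=0.01, max_epochs=500, patience=10):
    """Hypothetical helper: gradient descent on `train`, monitored on `held_out`."""
    w = 0.0
    best_w, best_loss, best_epoch = w, mse(w, held_out), 0
    stale = 0  # consecutive epochs without held-out improvement
    for epoch in range(1, max_epochs + 1):
        # One full-batch gradient step using the training subsets only.
        grad = sum(-2 * x * (y - w * x) for x, y in train) / len(train)
        w -= lr * grad
        loss = mse(w, held_out)
        if loss < best_loss:
            best_w, best_loss, best_epoch = w, loss, epoch
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # held-out performance has stopped improving
    return best_w, best_loss, best_epoch

# Synthetic data (assumption): y = 3x plus Gaussian noise.
random.seed(0)
data = [(i / 10, 3 * (i / 10) + random.gauss(0, 0.5)) for i in range(30)]
held_out = data[::3]                                         # the left-out subset
train = [d for i, d in enumerate(data) if i % 3 != 0]        # the remaining subsets
w, loss, epoch = fit_with_early_stopping(train, held_out)
print(f"stopped at epoch {epoch}: w={w:.3f}, held-out MSE={loss:.3f}")
```

Note that this uses one fixed held-out subset, as in a plain validation split; it does not rotate the held-out subset the way full cross-validation does.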
How does the XValidation operator work in RapidMiner with respect to the models? Is a new, independent model trained for each subset? Or is it assumed that models used with the XValidation operator allow incremental training, so that each new iteration updates the same model?
If the former, then the resulting model is not trained on the whole dataset, but only on the data from one of the XVal iterations, i.e. n-1 subsets.
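For comparison, here is a minimal sketch of the "former" case as it is commonly implemented in general-purpose tools: a brand-new model is fitted for every fold, the fold errors are averaged into a performance estimate, and a separate model is then fitted on all of the data to be returned. Whether RapidMiner's XValidation follows exactly this convention is an assumption to check against its documentation. The learner here (`fit_mean`, which just predicts the mean training label) and the helper names are hypothetical stand-ins for a real operator.

```python
def fit_mean(train):
    """Hypothetical learner: predicts the mean label of its training data."""
    labels = [y for _, y in train]
    return sum(labels) / len(labels)

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, nearly equal folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, k):
    errors = []
    for test_idx in k_fold_indices(len(data), k):
        test_set = set(test_idx)
        train = [d for i, d in enumerate(data) if i not in test_set]
        test = [data[i] for i in test_idx]
        model = fit_mean(train)  # a fresh, independent model for every fold
        errors.append(sum((y - model) ** 2 for _, y in test) / len(test))
    estimate = sum(errors) / k   # performance estimate from the k held-out folds
    final_model = fit_mean(data) # separate fit on ALL the data
    return estimate, errors, final_model

data = [(i, float(i)) for i in range(6)]
estimate, errors, final_model = cross_validate(data, k=3)
print(errors, estimate, final_model)  # [9.25, 0.25, 9.25] 6.25 2.5
```

Under this convention the per-fold models exist only to produce the estimate; the returned model has seen every datapoint exactly once.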
If the latter, then the model is effectively trained on n-1 copies of every datapoint. To see this, consider a 3-fold cross-validation:
             subset 1   subset 2   subset 3
iteration 1: test       train      train
iteration 2: train      test       train
iteration 3: train      train      test
So the model would see subset 1 twice, subset 2 twice and subset 3 twice, i.e. each subset n-1 = 2 times.
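The counting argument generalises to any number of folds. This small sketch (the function name `training_exposures` is illustrative) rebuilds the test/train schedule for k folds and counts how often each subset lands on the training side, which is how many times an incrementally updated model would see it.

```python
def training_exposures(k):
    """Build the k-fold test/train schedule and count training uses per subset."""
    counts = [0] * k
    schedule = []
    for iteration in range(k):
        row = ["test" if s == iteration else "train" for s in range(k)]
        schedule.append(row)
        for s in range(k):
            if s != iteration:
                counts[s] += 1  # subset s is trained on in this iteration
    return schedule, counts

schedule, counts = training_exposures(3)
for i, row in enumerate(schedule, start=1):
    print(f"iteration {i}: " + " ".join(f"{c:<5}" for c in row))
print(counts)  # [2, 2, 2]
```

Every subset is held out exactly once and used for training in the other k-1 iterations, so the count is always k-1.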
Finally, I haven't seen any documentation stating that XValidation is used to prevent overfitting. Can someone confirm?
Thanks,
Gary