Cross Validation operator model output

varunm1
New Altair Community Member
Hello,
I'm trying to understand the model output of the Cross Validation operator. For example, in a 5-fold cross validation, 5 models are trained and tested, so does the Cross Validation operator output the last fold's model or the best of the 5 models?
@mschmitz @lionelderkrikor
Thanks,
Varun
Best Answers
-
Hi @varunm1,
The model is built on the complete input data. This is just a convenience feature. There is simply no "best" model: the whole point of cross validation is to estimate how well a model trained on the full data will perform (that is, validation, not model selection). Search the community if you want to learn more; there have been a couple of discussions on this in the past, e.g. this one: https://community.rapidminer.com/discussion/53798/cross-validation
Hope that helps,
Ingo
-
Hi @varunm1,
If I remember correctly, for an N-fold CV, RapidMiner performs N+1 iterations.
So in practice, in your case, for a 5-fold CV, 5 models are trained and tested to obtain the average performance.
Then a 6th iteration is performed and a model is built from the whole training set. This is the model delivered at the model output port of the CV operator, and it is the model associated with the confusion matrix of the Performance operator.
To convince yourself, you can set a breakpoint after the model operator inside the CV operator.
Hope this helps,
Regards,
Lionel
NB: Experts, please correct me if I'm wrong in my explanation.
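The N+1 behaviour described above can be sketched in plain Python. This is not RapidMiner code and makes no claim about how the operator is implemented internally; `train_majority`, `cross_validate`, and the toy label list are hypothetical names and data chosen only for illustration. The learner is deliberately trivial (it always predicts the most frequent training label) so that the control flow stays visible: N fold models are trained only to produce scores, and one extra model is trained on all the data and returned.

```python
def train_majority(labels):
    """Toy learner (hypothetical): the 'model' is the most frequent label."""
    return max(sorted(set(labels)), key=labels.count)


def cross_validate(labels, n_folds=5):
    """Return (average fold accuracy, final model, number of training runs)."""
    fold_accuracies = []
    training_runs = 0

    for fold in range(n_folds):
        # Hold out every n_folds-th example, starting at index `fold`.
        test = labels[fold::n_folds]
        train = [y for i, y in enumerate(labels) if i % n_folds != fold]

        fold_model = train_majority(train)  # iterations 1..N
        training_runs += 1

        correct = sum(1 for y in test if y == fold_model)
        fold_accuracies.append(correct / len(test))
        # fold_model is not kept; it only served to produce a score.

    final_model = train_majority(labels)  # iteration N+1, on all the data
    training_runs += 1

    average_accuracy = sum(fold_accuracies) / len(fold_accuracies)
    return average_accuracy, final_model, training_runs


# Assumed toy data: 14 "yes" examples followed by 6 "no" examples.
labels = ["yes"] * 14 + ["no"] * 6
avg_accuracy, model, runs = cross_validate(labels, n_folds=5)
print(runs)  # 6 training runs for a 5-fold CV
```

In this sketch the returned `model` is neither the last fold's model nor the best one; the averaged fold accuracy is reported as an estimate of how that final model will perform.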
Answers
-
Thanks @lionelderkrikor. I tried that and got it.