Cross Validation operator model output

varunm1
varunm1 New Altair Community Member
edited November 2024 in Community Q&A
Hello,

I'm trying to understand the model output of the Cross Validation operator. For example, in a 5-fold cross-validation, 5 models are trained and tested, so does the Cross Validation operator output the last fold's model or the best of the 5 models?

@mschmitz @lionelderkrikor

Thanks,
Varun

Best Answers

  • IngoRM
    IngoRM New Altair Community Member
    Answer ✓
    The model is built on the complete input data. This is just a convenience feature. There is simply no best model, and the whole point of cross-validation is to estimate how well a model trained on the full data will perform (so it is about validating that model, not about model selection). Search here on the community if you want to learn more about it; there have been a couple of discussions in the past, e.g. this one: https://community.rapidminer.com/discussion/53798/cross-validation
    Hope that helps,
    Ingo
  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Answer ✓
    Hi @varunm1,

    If I remember correctly, for an N-fold CV, RapidMiner performs N+1 iterations.
    So in practice, in your case, for a 5-fold CV, 5 models are trained and tested to obtain the average performance.
    Then a 6th iteration is performed and a model is built from the whole training set. It is this model that is delivered at the model output port of the CV operator, and the confusion matrix of the Performance operator, obtained from the 5 test folds, is the performance estimate for this model.
    To convince yourself, you can set a breakpoint after the model operator inside the CV operator.


    Hope this helps,

    Regards,

    Lionel

    NB: Experts, please correct me if my explanation is wrong.
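
The N+1 iterations described in the answers above can be sketched in plain Python. This is not RapidMiner code and makes no claim about how the operator is implemented internally; the majority-label "learner", the interleaved fold split, and all function names are hypothetical and chosen only to keep the example self-contained.

```python
from statistics import mean

def train_majority(labels):
    """Toy 'learner' (hypothetical): the model is the most frequent label."""
    return max(sorted(set(labels)), key=labels.count)

def accuracy(model, labels):
    """Fraction of labels that the constant prediction gets right."""
    return sum(1 for y in labels if y == model) / len(labels)

def cross_validate(labels, k=5):
    """Return (mean fold accuracy, model trained on all data)."""
    folds = [labels[i::k] for i in range(k)]  # simple interleaved split
    scores = []
    for i in range(k):
        test = folds[i]
        train = [y for j, fold in enumerate(folds) if j != i for y in fold]
        fold_model = train_majority(train)  # iterations 1..k: used only for scoring
        scores.append(accuracy(fold_model, test))
    final_model = train_majority(labels)    # iteration k+1: trained on everything
    return mean(scores), final_model

labels = ["yes"] * 7 + ["no"] * 3
estimate, model = cross_validate(labels, k=5)
print(estimate, model)
```

The k fold models are thrown away after scoring; only the averaged score and the model from the extra full-data iteration are returned, which mirrors the behaviour the answers describe.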
     

Answers

  • varunm1
    varunm1 New Altair Community Member
    Thanks @IngoRM this clarifies my question.
  • varunm1
    varunm1 New Altair Community Member
    Thanks @lionelderkrikor. I tried that and got it.