Order of performing nested k-fold cross-validation

thomas_gadd7 · New Altair Community Member
edited November 2024 in Community Q&A

I have been looking at the following tutorial on correct model validation:

I'm looking at the section on contamination through feature selection when doing k-fold cross-validation. In the Accidental Contamination section, example 3) near the bottom suggests using nested k-fold validation to search for features, in the same way that example 2) suggests doing for the choice of hyperparameters.

My question is: is there any best practice around whether to do the nested k-fold validation for feature selection first and then use the selected features for the nested validation on the hyperparameters, or vice versa? I imagine it would be too computationally expensive to nest all three techniques within one another.


Can anyone advise on this?

Thank you

Answers

  • kypexin · New Altair Community Member

    That's a pretty great question. I would also like to see an example of a proper multi-level nested validation process for the case where all steps are needed at once (rough sketch after the list):

    • normalization
    • feature selection
    • parameter optimization
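
    Something like this is what I have in mind, sketched in scikit-learn terms since that is easiest to write down (illustrative only: made-up data and an arbitrary parameter grid; in RapidMiner it would be the corresponding operators nested inside two Cross Validation operators):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=50, random_state=0)

# All three steps live in one pipeline, so each is refit on every fold:
pipe = Pipeline([
    ("scale", StandardScaler()),         # normalization
    ("select", SelectKBest(f_classif)),  # feature selection
    ("clf", SVC()),                      # model
])

# Inner loop: parameter optimization (number of features kept, and C)
# with 5-fold cross-validation.
inner = GridSearchCV(
    pipe, {"select__k": [5, 10, 20], "clf__C": [0.1, 1, 10]}, cv=5
)

# Outer loop: 5-fold cross-validation around the whole search for an
# uncontaminated performance estimate. Cost: 5 outer folds x 9 grid
# points x 5 inner folds = 225 model fits (plus refits), which is why
# this gets expensive so fast.
print(cross_val_score(inner, X, y, cv=5).mean())
```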


    @mschmitz ? :)

  • Telcontar120 · New Altair Community Member

    In practice, I don't think many people are putting parameter optimization inside cross-validation. It's just too time-consuming. I'd be quite comfortable with a setup where normalization and feature selection occurred within cross-validation, and the results of that process were then fed to an optimization process where cross-validation for model training occurred inside the parameter optimization operator.
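
    For anyone who wants to see the shape of that, here is a rough scikit-learn equivalent (my own sketch with made-up data, not an official recipe; in RapidMiner you would wire up the corresponding operators):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=50, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),               # normalization
    ("select", SelectKBest(f_classif, k=10)),  # feature selection
    ("clf", SVC()),
])

# Step 1: honest performance estimate, with normalization and feature
# selection refit inside each fold so nothing leaks across folds.
print(cross_val_score(pipe, X, y, cv=10).mean())

# Step 2: parameter optimization with cross-validation for model
# training inside the optimization loop; only one level of nesting,
# so it stays affordable.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=10)
search.fit(X, y)
print(search.best_params_)
```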

  • Thomas_Ott · New Altair Community Member

    This is a great question, and I remember we had this discussion elsewhere in the threads here. I agree with what @Telcontar120 says.

  • kypexin · New Altair Community Member

    Thanks @Telcontar120 @Thomas_Ott 

    Though I have one really stupid question at this point, as I am a bit dumb today :)

    If we normalize or perform feature selection within k-fold x-Validation, this is done k+1 times in total, if I remember correctly from Martin's explanation somewhere else: k times (once for each fold) plus one more time for the full dataset, right? At the same time, logic tells me that on each fold we might get slightly different normalization or feature selection?

    So then, how do we pull the preprocessing model out of x-Validation in this case? Just by taking the latest one? My concern is that the same preprocessing model should also be applied to the test set, and propagated to the production process (if there is one).

  • Telcontar120 · New Altair Community Member

    Correct, with k-fold cross-validation there are k+1 runs, where the final run is on the entire dataset, and that is the model that is returned. But conceptually, cross-validation is simply a way to estimate the reliability of your results on unseen data (to avoid overfitting), and as Ingo's post has shown, when you do things like normalization and other preprocessing inside the cross-validation, you get a more realistic view of what your eventual performance will be. But when you actually go to construct your normalization model or other preprocessing models, that should be done using the entire dataset.

    Feature selection is similar, except there is no preprocessing model returned, just a smaller set of attributes that will be used in the final model. And of course the predictive model itself is returned directly from the cross-validation output (once again, the one built on the entire dataset, not any of the individual k folds).
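
    If it helps, here is the same idea sketched in scikit-learn terms (an illustration with made-up data, not RapidMiner itself): the cross-validation scores are the k estimation runs, and the final fit on the entire dataset is the "+1" run whose preprocessing and model you would carry forward to a test set or production.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=50, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),               # preprocessing model
    ("select", SelectKBest(f_classif, k=10)),  # feature selection
    ("clf", SVC()),                            # predictive model
])

# The k estimation runs: preprocessing is refit inside every fold, so
# each fold can indeed normalize and select slightly differently; that
# is fine, because these runs only estimate performance on unseen data.
scores = cross_val_score(pipe, X, y, cv=10)
print("estimated accuracy:", scores.mean())

# The "+1" run: refit everything on the entire dataset. This is the
# preprocessing + model combination you apply to new data in production.
final_model = pipe.fit(X, y)
print("attributes kept:", final_model.named_steps["select"].get_support().sum())
```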

    I hope this clarifies!

  • kypexin · New Altair Community Member

    Thanks @Telcontar120, you have restored my sanity :) This is the way I actually do it; I just decided to double-check because it's a dumb day, that's why :)

  • Thomas_Ott · New Altair Community Member

    @kypexin it's a complex rabbit hole, but it's exactly what @Telcontar120 said when it comes to k+1: the returned model is built on the entire dataset, and the performance is the average over each of the k folds.


    When I used to teach the RM training course, this topic (e.g. normalizing inside the X-Val) would cause my students' heads to smoke.