Cross-validation Features
JohnNash2000
New Altair Community Member
Hello, I am currently performing cross-validation (CV), and within this process, "Forward Selection" is performed during training. How can I output the chosen features once CV has completed? I've tried countless solutions including using the "Weights to Data" and "Data to Weights" operators, but neither of these output the chosen features. Does anyone know how I can extract the chosen features from the "Cross Validation" process?
Thank you
Best Answer
-
Hello @JohnNash2000
When CV runs, each fold may select different features, because the training data changes from one iteration to the next. So, do you want to store the attributes selected in each iteration, or only the final set, from the step of the Cross Validation process where the model is trained on all the data?
If you want the features for each iteration of CV, why not use a Store operator with the %{execution_count} macro in the name of the stored object?
If you are looking for something different, please let us know and we will try to help. Please also attach your process so that we can take a look.
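For readers who want to see the idea outside RapidMiner, here is a minimal plain-Python sketch of the same pattern: forward selection runs on the training part of each fold, and the chosen features are recorded under the fold number (the role the Store operator with %{execution_count} plays above). Everything in it is an assumption made for illustration: the synthetic data, the nearest-centroid scorer, and the helper names `centroid_accuracy` and `forward_selection` are hypothetical and are not part of any RapidMiner API.

```python
import random


def centroid_accuracy(rows, labels, features, eval_rows, eval_labels):
    """Accuracy of a nearest-centroid classifier that only sees `features`."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, l in zip(rows, labels) if l == cls]
        centroids[cls] = [sum(r[f] for r in members) / len(members) for f in features]
    correct = 0
    for row, label in zip(eval_rows, eval_labels):
        predicted = min(
            centroids,
            key=lambda c: sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c])),
        )
        correct += predicted == label
    return correct / len(eval_rows)


def forward_selection(rows, labels, max_features=2):
    """Greedy forward selection, scored only on the rows passed in."""
    selected = []
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        # Add the single feature that improves the score the most.
        best = max(
            remaining,
            key=lambda f: centroid_accuracy(rows, labels, selected + [f], rows, labels),
        )
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)


# Hypothetical data: features 0 and 2 carry signal, 1 and 3 are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(60)]
rows = [
    [rng.gauss(2.0 * y, 1.0), rng.gauss(0, 1), rng.gauss(-2.0 * y, 1.0), rng.gauss(0, 1)]
    for y in labels
]

# Simple 5-fold split over shuffled row indices.
k = 5
indices = list(range(len(rows)))
rng.shuffle(indices)
folds = [indices[i::k] for i in range(k)]

# One feature set per fold, keyed by fold number.
features_per_fold = {}
for fold_no, test_idx in enumerate(folds, start=1):
    held_out = set(test_idx)
    train_rows = [rows[i] for i in indices if i not in held_out]
    train_labels = [labels[i] for i in indices if i not in held_out]
    features_per_fold[fold_no] = forward_selection(train_rows, train_labels)

# The "final" feature set comes from a separate run on all the data.
final_features = forward_selection(rows, labels)

for fold_no, features in features_per_fold.items():
    print(f"fold {fold_no}: {features}")
print(f"all data: {final_features}")
```

The per-fold sets can differ from one another and from the set chosen on all the data, which is why there are two different things one might want to store.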
Answers
-
Hello @varunm1
You are 100% correct: there is no single final set of features, since each iteration of CV will have its own feature set. You see, I recently read the blog post about contamination ("Avoiding Accidental Contamination of Data [3 Examples]"), and so I moved my feature selection process from outside of CV to inside. When the feature selection process was outside, I had a chosen set of features based on the entire training data. This is what I was looking for, and I became so fixated on finding out how to do this that I never stopped to think about why.
Thank you
-
That's true, @JohnNash2000. When we are validating a model, preprocessing steps such as sampling and feature selection should be applied only on the training side. If we apply them to the whole data set, the validation is biased and sometimes overestimates the performance.
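As a rough illustration of keeping feature selection on the training side, here is a sketch in Python with scikit-learn (not RapidMiner). It assumes scikit-learn 0.24 or later is installed; the synthetic data set and the parameter values are arbitrary choices for the example. Because the selector is a step of the pipeline, it is refit on the training portion of every fold and never sees the held-out rows.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Hypothetical data, only for illustration.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)

pipeline = Pipeline(
    [
        # Forward selection is a pipeline step, so it is fit on training folds only.
        (
            "select",
            SequentialFeatureSelector(
                LogisticRegression(max_iter=1000),
                n_features_to_select=3,
                direction="forward",
                cv=3,
            ),
        ),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)

# Each of the 5 outer folds reruns selection and model fitting from scratch.
scores = cross_val_score(pipeline, X, y, cv=5)
print(scores)
```

Running the selector once on all of `X` and then cross-validating only the model would be the contaminated setup discussed above.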