Feature Selection within CV: Which features are finally selected?
Dear All,
I'm coming back to a topic that has been discussed here before, but for which I never got a clear answer. Let's say we have 20 features A1, A2, A3, ..., A20 and we perform LASSO (optimizing lambda, with alpha=1) with a LogReg model. Following the suggested best practice for reducing accidental exposure of the labels, we do this within a K-fold CV operator. The selection is therefore run K+1 times: K times, once per fold, and one more time on the total data set (so in that last case there is no train/test split). Let's also assume that the features with non-zero coefficients differ between runs (A1, A3 and A5 for fold 1; A2, A3 and A20 for fold 2; ...; A5, A12 and A15 for the whole data set).

Does the final model use the features that were selected on the total data set? If so, the performance of this model does not correspond to the output of the CV operator, which averages the performance across all folds. Is that correct?
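To make the setup concrete, here is a minimal Python/scikit-learn sketch of the procedure as I understand it. It is not the CV operator's actual implementation. The data set is synthetic, the helper names (`fit_lasso`, `selected`) are my own, and the regularization strength `C` is fixed rather than optimized, just to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the 20 features A1..A20 (assumption, not real data).
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)
names = [f"A{i + 1}" for i in range(X.shape[1])]


def fit_lasso(X_train, y_train):
    # L1-penalized logistic regression. C is fixed here for brevity;
    # in the real setup lambda would be optimized inside each run.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
    return model.fit(X_train, y_train)


def selected(model):
    # Names of the features whose coefficient is non-zero.
    return [n for n, c in zip(names, model.coef_[0]) if c != 0.0]


K = 5
cv = StratifiedKFold(n_splits=K, shuffle=True, random_state=0)

fold_features, fold_scores = [], []
for train_idx, test_idx in cv.split(X, y):
    # Selection and fitting see only the training part of the fold.
    m = fit_lasso(X[train_idx], y[train_idx])
    fold_features.append(selected(m))
    fold_scores.append(accuracy_score(y[test_idx], m.predict(X[test_idx])))

# Run K+1: same procedure on the total data set, no held-out part.
final_model = fit_lasso(X, y)
final_features = selected(final_model)

# The CV output: average performance of the K fold models.
cv_estimate = float(np.mean(fold_scores))

for i, feats in enumerate(fold_features, start=1):
    print(f"fold {i}: {feats}")
print("total data set:", final_features)
print("CV estimate:", cv_estimate)
```

In this sketch the per-fold feature sets can differ from `final_features`, and `cv_estimate` is computed only from the K fold models, never from `final_model` itself. That is exactly the situation my question is about.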
Many thanks in advance,
Nikos