Best method for validating results from a feature selection?
Hi all,
So here's my question. I have run forward and backward feature selection algorithms to strip a dataset of 27,580 attributes down to ~100 that look to be able to classify my data into 2 categories very well. These selections were wrapped in a WrapperXValidation operator, so I have an estimate of their performance. I now want to test the predictive power of these features...but I do not have a test set at my disposal to do so. I have been creating a table containing only the ~100 features chosen by the selection processes and running a simple XValidation on that data with a leave-one-out strategy. A statistician told me I should do a 70/30 split on my data and cross-validate that way, but that really limits the number of samples I can use for the training and test sets (I have only 40 samples). What is the best strategy for cross-validating a predictive signature without a true test set?
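To make the two options concrete, here is roughly what I mean, sketched in scikit-learn terms purely for illustration (I'm actually working with RapidMiner operators, and load_reduced_table() is just a stand-in for my ~100-feature example set):

```python
# Illustrative comparison of the two validation options on the reduced table.
# load_reduced_table() is a placeholder for my ~100-feature, 40-sample data.
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_reduced_table()

# Option A: what I have been doing -- leave-one-out cross-validation
loo_scores = cross_val_score(SVC(kernel="linear"), X, y, cv=LeaveOneOut())
print("LOO accuracy:", loo_scores.mean())

# Option B: the statistician's suggestion -- a single 70/30 hold-out split,
# which with 40 samples leaves only ~28 for training and ~12 for testing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y)
model = SVC(kernel="linear").fit(X_tr, y_tr)
print("Hold-out accuracy:", model.score(X_te, y_te))
```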
Here's the basic methodology I went through (a rough code sketch follows the list):
1) Extract features from the dataset using forward selection within a WrapperXValidation (leave-one-out strategy).
2) Create a new example set based on the features selected in step 1, then run a backward selection on that subtable, wrapped within a WrapperXValidation (leave-one-out strategy).
3) Create the final example set based on the features selected in steps 1 and 2, then run an SVM wrapped in an XValidation operator (leave-one-out strategy).
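For reference, a rough scikit-learn sketch of those three steps (again, just to show the structure; the real process is built from RapidMiner operators, load_my_data() is a placeholder, and the n_features_to_select values are illustrative, not what the selection actually settled on):

```python
# Illustrative sketch of the three steps; not the actual RapidMiner process.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = load_my_data()            # placeholder: ~40 samples x 27,580 attributes
loo = LeaveOneOut()
base = SVC(kernel="linear")

# Step 1: forward selection, scored with leave-one-out cross-validation
forward = SequentialFeatureSelector(base, direction="forward",
                                    n_features_to_select=100, cv=loo)
X_fwd = forward.fit_transform(X, y)

# Step 2: backward elimination on the reduced table, again scored with LOO CV
backward = SequentialFeatureSelector(base, direction="backward",
                                     n_features_to_select=50, cv=loo)
X_final = backward.fit_transform(X_fwd, y)

# Step 3: leave-one-out cross-validation of an SVM on the final feature set
scores = cross_val_score(SVC(kernel="linear"), X_final, y, cv=loo)
print("LOO accuracy on selected features:", scores.mean())
```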
Thanks,
Roberto