Best method for validating results from a feature selection?

User: "Roberto"
Hi all,

So here's my question. I ran forward and backward feature selection algorithms to strip a dataset of 27,580 attributes down to ~100 that look to be able to classify my data into 2 categories very well. These selections were wrapped in a WrapperXValidation operator, so I have an estimate of their performance. I now want to test the predictive power of these features, but I don't have a test set at my disposal to do so. I have been creating a table with only the ~100 features chosen by the selection processes and running a simple XValidation on that data with a leave-one-out strategy. A statistician told me I should instead do a 70/30 split on my data and validate that way, but that really limits the number of samples available for the training and test sets (I only have 40 samples). What is the best strategy for cross-validating a predictive signature without a true test set?
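To make the trade-off concrete, here's a rough scikit-learn sketch of the two evaluation options (leave-one-out vs. a single 70/30 holdout). This is just an illustration with synthetic stand-in data of roughly my dimensions, not my actual dataset or RapidMiner process:

```python
# Sketch: leave-one-out CV vs. a single 70/30 holdout on a small sample.
# Synthetic stand-in data (~40 samples, ~100 features, 2 classes),
# mimicking the reduced feature table -- NOT the real dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=40, n_features=100, n_informative=10,
                           n_classes=2, random_state=0)

clf = SVC(kernel="linear")

# Leave-one-out: every sample is the test set exactly once,
# so all 40 samples contribute to both training and testing.
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("LOOCV accuracy:", loo_scores.mean())

# 70/30 split: a single held-out test set of only ~12 samples.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
print("70/30 holdout accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))
```

With only 40 samples, the 30% holdout is ~12 examples, which is why a single split feels so limiting compared to leave-one-out.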

Here's the basic methodology I went through:

1)  Extract features from the dataset using forward selection within a WrapperXValidation (leave-one-out strategy).
2)  Create a new example set based on the features selected in step 1, then run a backward selection on that subtable, wrapped within a WrapperXValidation (leave-one-out strategy).
3)  Create the final example set based on the features selected in steps 1 and 2, then run an SVM wrapped in an XValidation operator (leave-one-out strategy).
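In scikit-learn terms, the three steps above would look roughly like the sketch below (SequentialFeatureSelector standing in for the wrapped RapidMiner selection operators, and small synthetic data instead of the real 27,580 attributes, so it runs quickly):

```python
# Hedged sketch of the 3-step methodology with scikit-learn stand-ins.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Small synthetic stand-in: 40 samples, 30 features, 2 classes.
X, y = make_classification(n_samples=40, n_features=30, n_informative=8,
                           n_classes=2, random_state=0)
clf = SVC(kernel="linear")

# Step 1: forward selection, performance estimated by internal CV
# (analogous to forward selection inside a WrapperXValidation).
fwd = SequentialFeatureSelector(clf, n_features_to_select=10,
                                direction="forward", cv=5)
X_fwd = fwd.fit_transform(X, y)

# Step 2: backward selection on the reduced subtable from step 1.
bwd = SequentialFeatureSelector(clf, n_features_to_select=5,
                                direction="backward", cv=5)
X_final = bwd.fit_transform(X_fwd, y)

# Step 3: estimate SVM performance on the final feature set
# with leave-one-out cross-validation (the plain XValidation step).
scores = cross_val_score(clf, X_final, y, cv=LeaveOneOut())
print("LOOCV accuracy on selected features:", scores.mean())
```

Note that in this sketch, as in my process, the final leave-one-out estimate is computed on data that was already used to select the features, which is exactly the part I'm unsure about.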

Thanks,
Roberto
