
How to plot Stability and/or Accuracy versus number of features?

User: "meloamaury"
New Altair Community Member

Hi all,

I would like to plot the stability of a feature selection operator as a function of the number of features (I would like to reproduce Fig. 6 of the attached .pdf, which I believe is useful for the community). For instance, I can use the "Feature Selection Stability Validation" operator that comes with the Feature Selection Extension. Inside this operator, I could use any other feature selection operator, e.g., "MRMR-FS" or "SVM-RFE". Then I would like to plot the stability of the feature selection against the number of features. I believe this would give me a better sense of how many features to keep for further processing and modelling.
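For anyone who wants to prototype this idea outside of RapidMiner first, below is a minimal Python/scikit-learn sketch: for each candidate number of features k, the selection is repeated on bootstrap resamples and stability is measured as the mean pairwise Jaccard overlap of the selected subsets. The selector (SelectKBest with an F-score), the resampling scheme, and the stability measure are illustrative assumptions, not the Feature Selection Extension's implementation.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.utils import resample

# Placeholder data standing in for the real example set.
X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=0)

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def stability_for_k(X, y, k, n_resamples=20):
    """Mean pairwise Jaccard overlap of the subsets selected on resamples."""
    subsets = []
    for i in range(n_resamples):
        idx = resample(np.arange(len(y)), random_state=i)  # bootstrap sample
        sel = SelectKBest(f_classif, k=k).fit(X[idx], y[idx])
        subsets.append(np.flatnonzero(sel.get_support()))
    return np.mean([jaccard(a, b) for a, b in combinations(subsets, 2)])

ks = [5, 10, 20, 40, 80]
stability = [stability_for_k(X, y, k) for k in ks]

# Stability vs. number of selected features, analogous to the figure in the paper.
plt.plot(ks, stability, marker="o")
plt.xlabel("number of selected features")
plt.ylabel("mean pairwise Jaccard stability")
plt.show()
```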

The same idea could be used to plot any performance metric, runtime, etc., against the number of features: a sort of "Learning curve", but instead of the number of examples, we vary the number of features.
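The performance side of that curve can be sketched the same way; here a placeholder Naive Bayes classifier is cross-validated once per candidate number of features, with the selection step inside the pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=0)

for k in [5, 10, 20, 40, 80]:
    # Selection sits inside the pipeline so each CV fold selects features on
    # its own training split, keeping the accuracy estimate honest.
    pipe = make_pipeline(SelectKBest(f_classif, k=k), GaussianNB())
    score = cross_val_score(pipe, X, y, cv=10).mean()
    print(f"k={k:>3d}  mean accuracy={score:.3f}")
```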


I hope the question is clear enough and I thank you all for your input.

Merci,

Amaury

    User: "IngoRM"
    New Altair Community Member
    Accepted Answer

    Hi Amaury,


    In there you have used the Sonar data set and the NB classifier. From some basic tests, I see that the results for the Pareto Front will depend on which classifier is used inside the Validation operator.


    That is correct.  I actually think this is a positive: the feature weighting / importance, and the decision whether a feature should be used at all, are then tailored to the model itself, which typically leads to better accuracies.  This is called the "wrapper approach", by the way.  If you filter attributes out without taking the specific model into account, we call this the "filter approach".  The wrapper approach generally delivers better results but needs longer runtimes for model building and validation.
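    For readers outside RapidMiner, here is a small, hedged illustration of the filter vs. wrapper distinction; the dataset, selector, and estimator are placeholders, and RFE merely stands in for a generic wrapper:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=50, n_informative=8,
                           random_state=0)

# Filter approach: univariate F-scores, model-agnostic and fast.
filter_sel = SelectKBest(f_classif, k=10).fit(X, y)

# Wrapper approach: recursive elimination driven by the model's own
# coefficients; usually better accuracy, but many model retrainings.
wrapper_sel = RFE(LogisticRegression(max_iter=5000),
                  n_features_to_select=10).fit(X, y)

print("filter keeps: ", sorted(filter_sel.get_support(indices=True)))
print("wrapper keeps:", sorted(wrapper_sel.get_support(indices=True)))
```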


    My problem consists of around 800 examples and 2000 attributes. I have built a process where I use a "Select Subprocess" operator, and inside it I have different "Optimize Grid" operators containing different classifiers (e.g., LogReg, Random Forest, SVM, etc.). After this long run, I compare the ROCs for the different classifiers obtained with the best set of parameters found by the "Optimize Grid" operators.


    That makes sense.  You could in theory wrap the whole model building and validation process into the MO feature selection, but this might run for a long time.  An alternative is to do the model selection and parameter optimization on all features beforehand and then only use the best model so far inside the MO feature selection.  Or you could first filter some features out (filter approach), then optimize the model / parameters, and then run the MO FS.  There is really no right or wrong here in my opinion.  I personally use an iterative approach most of the time: filter some features out, find some good model candidates, optimize parameters a little bit, run a feature selection, then optimize parameters further, and so on...
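    As a rough, non-authoritative sketch of that iterative loop in scikit-learn terms (all thresholds, models, and grids below are placeholders, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Placeholder data of roughly the size described above.
X, y = make_classification(n_samples=800, n_features=2000, n_informative=20,
                           random_state=0)

# 1) Filter approach: drop most features cheaply before any model is involved.
X_f = SelectKBest(f_classif, k=200).fit_transform(X, y)

# 2) Coarse parameter search on the filtered data to find a good candidate.
coarse = GridSearchCV(LogisticRegression(max_iter=5000),
                      {"C": [0.01, 0.1, 1, 10]}, cv=5).fit(X_f, y)

# 3) Wrapper-style feature selection driven by the best model found so far.
rfe = RFE(coarse.best_estimator_, n_features_to_select=30).fit(X_f, y)
X_r = rfe.transform(X_f)

# 4) Refine parameters on the reduced feature set (and iterate if needed).
fine = GridSearchCV(LogisticRegression(max_iter=5000),
                    {"C": [0.03, 0.1, 0.3, 1, 3]}, cv=5).fit(X_r, y)
print("selected features:", rfe.n_features_, " best C:", fine.best_params_["C"])
```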


    Hope this helps,

    Ingo