[SOLVED] Using Weight by SVM for feature selection...
Hi out there.
I am working on optimizing a model that is supposed to solve a binary text classification problem. My data is highly unbalanced, with 4% positive cases. I am using a linear SVM together with the "Optimize Parameters" node and "X-Validation", and I optimize the "C" parameter with the optimization node. I have extracted around 150-220 features using n-grams, stopword removal, and Porter stemming, and my term weighting is "Binary".
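In case it makes the setup clearer, here is roughly what it would look like in scikit-learn terms (just a sketch I put together for illustration, not my actual RapidMiner process; the corpus, n-gram range and C grid are placeholders, and the Porter stemming step is omitted):

```python
# Rough scikit-learn sketch of the setup above: binary n-gram features,
# a linear SVM, and cross-validated tuning of C.
# The corpus, n-gram range, and C grid are placeholders (not my real settings),
# and the Porter stemming step is left out to keep it short.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold

docs = ["a positive example ...", "a negative example ..."]  # placeholder corpus
labels = [1, 0]                                               # 1 = positive class (~4% in my real data)

pipe = Pipeline([
    # binary term weighting: a term is either present (1) or absent (0)
    ("vect", CountVectorizer(ngram_range=(1, 2), stop_words="english", binary=True)),
    ("svm", LinearSVC()),
])

# roughly what "Optimize Parameters" + "X-Validation" do in RapidMiner:
# grid search over C with stratified cross-validation
grid = GridSearchCV(
    pipe,
    param_grid={"svm__C": [0.01, 0.1, 1, 10, 100]},
    cv=StratifiedKFold(n_splits=5),
    scoring="f1",  # plain accuracy is misleading with 4% positives
)
# grid.fit(docs, labels)  # run on the real corpus, not the two placeholder docs
```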
Until yesterday I did not do anything to reduce dimensionality, as I had read in a couple of places that this is not really necessary for SVMs. However, I had earlier improved performance with PCA and ICA, which led me to think I should at least try reducing the number of features. Of course I removed names and domain-specific terms, as it makes no sense to keep those in my case.
I can't really figure out how to make PCA and ICA work properly in RapidMiner, and then I stumbled upon "Weight by SVM" and thought to myself that it makes good sense to kick out features with low weights. This should at least make my algorithm faster and hopefully also improve my model performance. So I attached the "Weight by SVM" node to my "Select Attributes" node (the one kicking out names and domain-specific terms), followed by a "Select by Weights" node, and did a couple of runs. At first I filtered out the 50% lowest-weighted features, which immediately gave me far better results than I had ever had, and when I adjusted the filter to kick out the 80% lowest-weighted attributes, I got better results than I would ever have dared to dream of.
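In scikit-learn terms, what I am doing corresponds roughly to the sketch below (again illustrative only, with made-up data; feature_matrix just stands in for my document vectors):

```python
# Sketch of the "Weight by SVM" + "Select by Weights" idea outside RapidMiner:
# train a linear SVM, rank features by the absolute value of their weights,
# and keep only the highest-weighted fraction. The data here is made up.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
feature_matrix = rng.randint(0, 2, size=(200, 180))  # binary term vectors (placeholder)
labels = np.zeros(200, dtype=int)
labels[:8] = 1                                        # ~4% positive class (placeholder)

svm = LinearSVC(C=1.0).fit(feature_matrix, labels)
weights = np.abs(svm.coef_).ravel()                   # one weight per feature

keep_fraction = 0.2                                   # i.e. drop the 80% lowest-weighted features
n_keep = int(keep_fraction * weights.size)
top_idx = np.argsort(weights)[-n_keep:]               # indices of the highest-weighted features

reduced_matrix = feature_matrix[:, top_idx]           # this is what the "real" SVM would now see
```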
To be precise, my model achieved a class recall(1) of 75%, a class recall(0) of 98.96%, a class precision(1) of 75%, a class precision(0) of 98.96%, and an overall accuracy of 98%. I consider these results fairly good for my training set...?
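For reference, this is how I read the per-class figures off a confusion matrix (the counts in the sketch are invented just to show the formulas; they roughly reproduce the percentages above but are not my real numbers):

```python
# How class recall / precision / accuracy relate to the confusion matrix.
# The counts below are invented for illustration only.
tp, fn = 75, 25      # class 1: correctly / incorrectly classified positives
tn, fp = 2379, 25    # class 0: correctly / incorrectly classified negatives

recall_1 = tp / (tp + fn)                   # class recall(1)    -> 0.75
recall_0 = tn / (tn + fp)                   # class recall(0)    -> ~0.9896
precision_1 = tp / (tp + fp)                # class precision(1) -> 0.75
precision_0 = tn / (tn + fn)                # class precision(0) -> ~0.9896
accuracy = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy   -> ~0.98
```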
As I have a relatively low number of positive cases, there is a high chance that my model assigns weight to domain-specific terms, making it unable to generalize well. Therefore I have started kicking out variables that I do not think are good generalizers for my problem, which of course reduces the performance of my model. However, I am willing to live with that, and it might actually prove to be the best solution in the end.
My questions to you guys are then... (in prioritized order)
1) Did I do something criminal by using SVM weights for feature selection?
2) How does it make sense to use SVM weights before running an actual SVM algorithm?
Best
Kasper