[SOLVED] Using Weight by SVM for feature selection...
kasper2304
New Altair Community Member
Hi out there.
I am working on optimizing a model that is supposed to solve a binary text classification problem. My data is highly imbalanced, with 4% positive cases. I am using a linear SVM together with the "Optimize Parameters" node and "X-Validation". I optimize the C parameter with the optimization node. I have extracted around 150-220 features using n-grams, stopword removal, and Porter stemming, and my term weighting is binary.
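For anyone who wants to reproduce the rough idea outside RapidMiner, here is a minimal scikit-learn sketch of an equivalent setup (the toy corpus, the C grid, and the fold count are placeholders, and the Porter stemming step is omitted, so this is an illustration rather than my actual process):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# placeholder toy corpus; the real data is ~4% positive
docs = [
    "cheap pills buy now", "meeting moved to friday",
    "buy cheap pills today", "lunch plans for tomorrow",
    "win cheap pills now", "quarterly report attached",
    "pills discount buy now", "see you at the meeting",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

pipe = Pipeline([
    # binary term weighting over unigrams and bigrams with stopword
    # removal (Porter stemming omitted; it would need e.g. NLTK)
    ("vec", CountVectorizer(binary=True, ngram_range=(1, 2),
                            stop_words="english")),
    ("svm", LinearSVC()),
])

# cross-validated search over C, like Optimize Parameters + X-Validation
search = GridSearchCV(pipe, {"svm__C": [0.01, 0.1, 1.0, 10.0, 100.0]}, cv=2)
search.fit(docs, labels)
print(search.best_params_)
```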
Until yesterday I did not do anything to reduce dimensionality/features, as I read in a couple of places that this is not really necessary for SVMs. However, I did earlier improve performance with PCA and ICA, leading me to think I should at least try reducing the number of features. Of course I removed names and domain-specific terms, as these do not make any sense to keep in my case.
I can't really figure out how to make PCA and ICA work properly in RapidMiner, and then I stumbled upon "Weight by SVM" and thought to myself that it makes good sense to kick out features with low weights. This should at least make my algorithm faster and hopefully also improve my model performance. So I attached the "Weight by SVM" node to my "Select Attributes" node (the one kicking out names and domain-specific terms), then attached a "Select by Weights" node and did a couple of runs. At first I filtered out the 50% lowest weighted features, which immediately gave me far better results than I ever had, and when I adjusted the filter to kick out the 80% lowest weighted attributes, I had better results than I would ever have dared to dream about.
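The weighting step itself boils down to ranking terms by the absolute coefficients of a fitted linear SVM and keeping only the top fraction. A minimal sketch of that idea, again in scikit-learn rather than my actual RapidMiner process:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# same placeholder corpus as in the previous sketch
docs = ["cheap pills buy now", "meeting moved to friday",
        "buy cheap pills today", "lunch plans for tomorrow",
        "win cheap pills now", "quarterly report attached",
        "pills discount buy now", "see you at the meeting"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs)

# "Weight by SVM": absolute coefficients of a linear SVM, one per term
weights = np.abs(LinearSVC().fit(X, labels).coef_.ravel())

# "Select by Weights": keep the top 20%, i.e. drop the 80% lowest weighted
k = max(1, int(0.2 * X.shape[1]))
top = np.argsort(weights)[::-1][:k]
X_reduced = X[:, top]
print(np.array(vec.get_feature_names_out())[top])
```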
To be precise, my model achieved a class recall(1) of 75%, a class recall(0) of 98.96%, a class precision(1) of 75%, a class precision(0) of 98.96%, and an overall accuracy of 98%. These results I consider fairly good for my training set...?
As I have a relatively low number of positive cases, there is a high chance that my model assigns weight to domain-specific terms, making it bad at generalizing. Therefore I have started kicking out variables that I do not think generalize my problem well, which of course reduces the performance of my model. However, I am willing to live with that, and it might actually prove to be the best solution in the end.
My questions to you guys are then... (in prioritized order)
1) Did I do something criminal by using SVM weights for feature selection?
2) How does it make sense to use SVM weights before running an actual SVM algorithm?
Best
Kasper
Answers
(ANSWERING MY OWN QUESTION)
At first I could not find any literature about this method, making me think that I had overlooked something. Since then I have found literature doing exactly this, with fairly good results. I did a quick feature selection by information gain and information gain ratio, and feature selection by linear SVM is still by far the best, both when it comes to classification accuracy and robustness. In the range of 35% to 50% top weighted features it performs equally well. I assessed it visually by SOM, which gave me a good intuition about what it did.
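For what it's worth, the comparison can be sketched outside RapidMiner too. A rough illustration, with scikit-learn's mutual information standing in for information gain and the same placeholder corpus as above (not my actual process):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import LinearSVC

docs = ["cheap pills buy now", "meeting moved to friday",
        "buy cheap pills today", "lunch plans for tomorrow",
        "win cheap pills now", "quarterly report attached",
        "pills discount buy now", "see you at the meeting"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs)

def top_fraction(scores, frac):
    """Indices of the top `frac` fraction of features by score."""
    k = max(1, int(frac * len(scores)))
    return set(np.argsort(scores)[::-1][:k])

# information-gain-style ranking vs. linear-SVM weight ranking
ig = mutual_info_classif(X, labels, discrete_features=True, random_state=0)
svm = np.abs(LinearSVC().fit(X, labels).coef_.ravel())

# how much the two top-35% feature sets agree
common = top_fraction(ig, 0.35) & top_fraction(svm, 0.35)
print(len(common), "features selected by both rankings")
```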