"How to get the attribute relevance using SVM"

nharste
Hello,
I have an SVM model (LibSVM, because it is a multiclass problem) included in an X-Validation, and I want to know which attributes are most relevant. I already did a grid search to find the best parameter combination of gamma and C.
Can I get the attribute relevance with an RBF kernel, or do I have to use a linear kernel?
How can I get this information, and where would I have to put an additional operator?
The process code is attached. Thanks for your help
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" width="90" x="45" y="30">
<parameter key="repository_entry" value="../../Data/Master Excelliste_Gefügebezeichnung_3 klassen"/>
</operator>
<operator activated="true" class="normalize" compatibility="5.3.015" expanded="true" height="94" name="Normalize" width="90" x="179" y="30">
<parameter key="method" value="range transformation"/>
<parameter key="min" value="-1.0"/>
</operator>
<operator activated="true" class="split_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="380" y="30">
<parameter key="split_ratio" value="0.8"/>
<parameter key="sampling_type" value="stratified sampling"/>
<process expanded="true">
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.015" expanded="true" height="76" name="SVM" width="90" x="150" y="30">
<parameter key="gamma" value="0.03087"/>
<parameter key="C" value="898910.0"/>
<list key="class_weights"/>
</operator>
<connect from_port="training" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="248" y="30"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" from_port="output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_port="result 1"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Answers
For multiclass problems, LibSVM internally trains several binary models (it uses a one-vs-one scheme, one model per pair of classes) and afterwards combines them.
What kind of relevance information are you looking for? The mean weights?
Thanks for the response!
I want to get a table where I can see the relevance of each attribute for that specific classification (for example, scaled between 0 and 1).
I want to discuss which attributes (in my case the attributes represent different measurement methods) have a great influence on the classification.
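One way to turn the per-class weights of a linear one-vs-all SVM into a single table scaled between 0 and 1 is to average the absolute weights per attribute and divide by the largest average. The sketch below assumes the weights have already been exported as one table per binary model; the function name `relevance_table` and all numbers are hypothetical.

```python
def relevance_table(weight_tables):
    """Combine per-class weight tables into one relevance score per attribute.

    weight_tables: {model_name: {attribute: weight}}, e.g. the "1 vs all",
    "2 vs all" and "3 vs all" tables of a one-vs-all linear SVM.
    Relevance = mean absolute weight across the tables, divided by the largest
    mean, so the most relevant attribute gets 1.0 and an all-zero weight stays 0.0.
    """
    attributes = sorted({a for table in weight_tables.values() for a in table})
    n_tables = len(weight_tables)
    mean_abs = {
        a: sum(abs(table.get(a, 0.0)) for table in weight_tables.values()) / n_tables
        for a in attributes
    }
    top = max(mean_abs.values())
    if top == 0:
        return {a: 0.0 for a in attributes}
    return {a: v / top for a, v in mean_abs.items()}


# Hypothetical weights for three measurement methods (made-up numbers).
tables = {
    "1 vs all": {"method_a": 0.9, "method_b": -0.3, "method_c": 0.0},
    "2 vs all": {"method_a": -0.6, "method_b": 0.6, "method_c": 0.3},
    "3 vs all": {"method_a": 0.0, "method_b": -0.3, "method_c": 0.0},
}

# Print the attributes from most to least relevant.
for attribute, score in sorted(relevance_table(tables).items(), key=lambda kv: -kv[1]):
    print(f"{attribute}: {score:.2f}")
```

Taking the absolute value discards the sign (which class an attribute pushes towards), and the weights are only comparable across attributes because the process normalizes every attribute to the same range first.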
I am not aware of a way to do it with a radial SVM.
For a linear SVM you can use the Polynominal by Binominal Classification operator and determine the weights for each class separately.
Furthermore, there are several operators that compute attribute weights with an SVM:
1. Weight by SVM in RapidMiner Core
2. W-SVMAttributeEval in WEKA
3. Select by Recursive Feature Elimination with SVM (part of the feature selection extension)
However, all of them use linear SVMs.
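The one-vs-all idea behind Polynominal by Binominal Classification can be sketched outside RapidMiner to show why a linear model gives one explicit weight per attribute and class. This is a minimal illustration under assumptions: it swaps in a simple Pegasos-style subgradient solver for the linear SVM (not RapidMiner's or LibSVM's solver), the names `train_linear_svm` and `one_vs_all_weights` are hypothetical, and the toy data are made up.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos-style subgradient descent for a linear SVM (hinge loss + L2 penalty).

    X: list of feature vectors, y: labels in {-1, +1}.
    A constant 1 is appended to every example so the last weight acts as the
    bias (so the bias is regularized too, a simplification). Returns (weights, bias).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 term
            if margin < 1.0:  # hinge loss active: step towards this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w[:-1], w[-1]


def one_vs_all_weights(X, labels, **svm_args):
    """Train one binary linear SVM per class ("class vs. all") and return its weights."""
    return {
        c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **svm_args)[0]
        for c in sorted(set(labels))
    }


# Toy data: attribute 0 separates the classes, attribute 1 is symmetric noise.
noise = [-1.0, -0.5, 0.0, 0.5, 1.0]
X = [[x0, x1] for x0 in (-1.0, 0.0, 1.0) for x1 in noise]
labels = ["a"] * 5 + ["b"] * 5 + ["c"] * 5

for cls, w in one_vs_all_weights(X, labels).items():
    print(cls, [round(v, 3) for v in w])
```

Classes a and c can each be separated from the rest by attribute 0 alone, so their weight vectors put far more weight on attribute 0 than on the noise attribute. Class b sits between the others and cannot be separated from the rest by one linear boundary, so its weights are harder to interpret.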
Alternatively, you can do a Forward Selection with your SVM inside. That way you can produce a ranking of your attributes.
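The Forward Selection idea can be sketched as a greedy loop: start with no attributes, repeatedly add the attribute that improves the score most, and record the order. In this minimal sketch the score function is a hypothetical stand-in; in the real process it would be the validated performance of the SVM on that attribute subset. The names are illustrative, not RapidMiner's API.

```python
def forward_selection_ranking(attributes, score):
    """Greedy forward selection that ranks every attribute.

    score(subset) -> float, higher is better (assumed to be e.g. the
    cross-validated accuracy of the SVM trained on that subset).
    Returns [(attribute, score_after_adding_it), ...] in the order picked.
    """
    remaining = list(attributes)
    selected, ranking = [], []
    while remaining:
        # Try each remaining attribute and keep the one that helps most.
        best = max(remaining, key=lambda a: score(selected + [a]))
        selected.append(best)
        remaining.remove(best)
        ranking.append((best, score(selected)))
    return ranking


# Hypothetical stand-in for "validated accuracy of an SVM on this subset".
GAIN = {"method_a": 0.30, "method_b": 0.10, "method_c": 0.05}


def toy_score(subset):
    return 0.5 + sum(GAIN[a] for a in subset)


for attribute, acc in forward_selection_ranking(["method_c", "method_a", "method_b"], toy_score):
    print(f"{attribute}: {acc:.2f}")
```

This version keeps going until every attribute is ranked; a common variant stops as soon as adding an attribute no longer improves the score. Each step evaluates one model per remaining attribute, so n attributes need n(n+1)/2 evaluations in total.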
In your case, an additional idea would be to look at the "Weight by ..." operators. If you want to rank your measurement methods, you might also have a look at tree-based importance or similar.
Edit: An additional idea came to my mind. You could train n machines on n-1 attributes each (every machine is trained on all attributes but one) and look at the decrease of your performance value (accuracy, AUC, ...). Then you can use this decrease as a measure of feature (un)importance.
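The "all attributes but one" idea (often called leave-one-attribute-out or drop-column importance) can be sketched as below. Assumptions: `score` is a hypothetical function returning the validated performance of a model trained on the given attributes, and the toy scores are made up to illustrate one pitfall.

```python
def leave_one_out_importance(attributes, score):
    """Importance of each attribute = baseline score minus score without it.

    score(subset) -> float, higher is better (assumed to be e.g. the
    cross-validated accuracy or AUC of the SVM trained on that subset).
    """
    attributes = list(attributes)
    baseline = score(attributes)
    return {
        a: baseline - score([b for b in attributes if b != a])
        for a in attributes
    }


def toy_score(subset):
    # Hypothetical stand-in: method_a and method_a_copy carry the same information.
    s = set(subset)
    acc = 0.5
    if "method_a" in s or "method_a_copy" in s:
        acc += 0.3
    if "method_b" in s:
        acc += 0.1
    return acc


print(leave_one_out_importance(["method_a", "method_a_copy", "method_b"], toy_score))
```

In the toy example, removing either method_a or method_a_copy costs nothing because the other one compensates, so both look unimportant. Redundant (highly correlated) measurement methods can hide each other this way, so a near-zero drop does not necessarily mean the information is useless.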
How would I combine the Polynominal by Binominal Classification operator with the X-Validation?
- Poly by ...
- X-Validation
- SVM
- Apply Model, Performance
Is that right?
As a result, I get 3 weight tables (one per class): 1 vs. all, 2 vs. all and 3 vs. all.
Each contains the attributes and their weights. If I understand it correctly, the weights represent the orientation of the hyperplane (its normal vector) and tell you how important an attribute is relative to the others. Does this mean that the greater the value, positive or negative, the greater the influence?
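For a linear kernel this reading of the weights can be made precise. For binary model k (e.g. "1 vs. all") with weight vector w_k and bias b_k, the decision value is a weighted sum of the attributes, so changing attribute j by Δ (all others fixed) shifts it by w_{k,j}Δ. Assuming, as in the Normalize step of the process, that every attribute was range-transformed to [-1, 1], the largest possible shift caused by attribute j is 2|w_{k,j}|:

```latex
\begin{align*}
f_k(\mathbf{x}) &= \mathbf{w}_k^\top \mathbf{x} + b_k = \sum_{j=1}^{d} w_{k,j}\,x_j + b_k \\
f_k(\mathbf{x} + \Delta\,\mathbf{e}_j) - f_k(\mathbf{x}) &= w_{k,j}\,\Delta \\
\max_{x_j,\,x_j' \in [-1,\,1]} \bigl| w_{k,j}\,(x_j' - x_j) \bigr| &= 2\,\lvert w_{k,j} \rvert
\end{align*}
```

So the magnitude |w_{k,j}| measures how strongly attribute j can move the decision value of model k, and the sign says whether larger values of the attribute push towards class k (positive) or towards the rest (negative). The comparison across attributes only holds because all attributes share a common range.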
Edit: When I use the Polynominal by Binominal Classification operator, I can only get the model as an output, but not the performance from the X-Validation (see code).
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" width="90" x="45" y="30">
<parameter key="repository_entry" value="../../Data/Master Excelliste_Gefügebezeichnung_3 klassen"/>
</operator>
<operator activated="true" class="normalize" compatibility="5.3.015" expanded="true" height="94" name="Normalize" width="90" x="179" y="30">
<parameter key="method" value="range transformation"/>
<parameter key="min" value="-1.0"/>
</operator>
<operator activated="true" class="polynomial_by_binomial_classification" compatibility="5.3.015" expanded="true" height="76" name="Polynominal by Binominal Classification" width="90" x="380" y="75">
<process expanded="true">
<operator activated="true" class="split_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="380" y="30">
<parameter key="split_ratio" value="0.8"/>
<parameter key="sampling_type" value="stratified sampling"/>
<process expanded="true">
<operator activated="true" class="support_vector_machine" compatibility="5.3.015" expanded="true" height="112" name="SVM (2)" width="90" x="45" y="30">
<parameter key="kernel_type" value="radial"/>
<parameter key="kernel_gamma" value="0.03"/>
<parameter key="C" value="380000.0"/>
</operator>
<connect from_port="training" to_op="SVM (2)" to_port="training set"/>
<connect from_op="SVM (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="248" y="30"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_port="training set" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" from_port="output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Polynominal by Binominal Classification" to_port="training set"/>
<connect from_op="Polynominal by Binominal Classification" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Hi again,
In my eyes, the one-vs-all operator should be inside the cross-validation, because I am building a "meta model" which needs to be validated.
I built a process which collects the individual weight vectors on Iris; this should work on your data as well. The XML is attached.
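Keeping the one-vs-all step inside the validation, and collecting one weight vector per class and fold, can be sketched in Python. Assumptions: `train_binary` is any function that fits a linear binary model and returns its weights; here a hypothetical class-mean-difference stand-in replaces the SVM to keep the example short, and folds are assigned round-robin. This is not the RapidMiner process itself.

```python
def collect_fold_weights(X, labels, k, train_binary):
    """Cross-validation with the one-vs-all step inside each fold.

    For every fold, one binary model per class is trained on the training part
    only and its weight vector is stored. Returns {class: [w_fold_1, ...]}.
    train_binary(X, y) -> list of weights, with y in {-1, +1} (assumption).
    """
    classes = sorted(set(labels))
    collected = {c: [] for c in classes}
    for fold in range(k):
        # Round-robin fold assignment; the held-out part (i % k == fold) would
        # be used to compute the performance, which is omitted here.
        train_idx = [i for i in range(len(X)) if i % k != fold]
        X_train = [X[i] for i in train_idx]
        for c in classes:
            y_train = [1 if labels[i] == c else -1 for i in train_idx]
            collected[c].append(train_binary(X_train, y_train))
    return collected


def mean_weights(collected):
    """Average each class's weight vectors over the folds."""
    return {
        c: [sum(ws) / len(ws) for ws in zip(*vectors)]
        for c, vectors in collected.items()
    }


def mean_difference(X, y):
    # Hypothetical stand-in for a linear SVM: difference of the class means.
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    return [sum(cp) / len(pos) - sum(cn) / len(neg)
            for cp, cn in zip(zip(*pos), zip(*neg))]


X = [[x0, x1] for x0 in (-1.0, 0.0, 1.0) for x1 in (0.2, -0.2, 0.4, -0.4)]
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
per_fold = collect_fold_weights(X, labels, k=2, train_binary=mean_difference)
print(mean_weights(per_fold))
```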
After doing so, I realized two things:
1. You are doing a split validation with 0.8 as the split ratio. That means your SVM is trained on 80% of the data and evaluated on a single 20% hold-out. I would definitely switch to an X-Validation (maybe with 2 folds).
2. For a linear SVM, the weight vectors are the same as the vectors in the model. If you click on "Model description", you get the weights. This works for a radial SVM as well. That should solve your problem.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Generate Empty Weight vector">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="6.1.000" expanded="true" height="76" name="Subprocess" width="90" x="45" y="300">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="6.1.000" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30">
<parameter key="number_examples" value="1"/>
<parameter key="number_of_attributes" value="1"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="6.1.000" expanded="true" height="76" name="Select Attributes" width="90" x="246" y="30">
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="weight_by_user_specification" compatibility="6.1.000" expanded="true" height="76" name="Weight by User Specification" width="90" x="380" y="30">
<list key="name_regex_to_weights"/>
</operator>
<operator activated="true" class="collect" compatibility="6.1.000" expanded="true" height="76" name="Collect (2)" width="90" x="514" y="30"/>
<connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Weight by User Specification" to_port="example set"/>
<connect from_op="Weight by User Specification" from_port="weights" to_op="Collect (2)" to_port="input 1"/>
<connect from_op="Collect (2)" from_port="collection" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember" width="90" x="179" y="300">
<parameter key="name" value="Weights"/>
<parameter key="io_object" value="IOObjectCollection"/>
</operator>
<operator activated="false" class="retrieve" compatibility="6.1.000" expanded="true" height="60" name="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" width="90" x="45" y="120">
<parameter key="repository_entry" value="../../Data/Master Excelliste_Gefügebezeichnung_3 klassen"/>
</operator>
<operator activated="true" class="retrieve" compatibility="6.1.000" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="normalize" compatibility="6.1.000" expanded="true" height="94" name="Normalize" width="90" x="246" y="30">
<parameter key="method" value="range transformation"/>
<parameter key="min" value="-1.0"/>
</operator>
<operator activated="true" class="split_validation" compatibility="6.1.000" expanded="true" height="112" name="Validation" width="90" x="581" y="30">
<parameter key="split_ratio" value="0.8"/>
<parameter key="sampling_type" value="stratified sampling"/>
<process expanded="true">
<operator activated="true" class="polynomial_by_binomial_classification" compatibility="6.1.000" expanded="true" height="76" name="Polynominal by Binominal Classification" width="90" x="112" y="30">
<process expanded="true">
<operator activated="true" class="support_vector_machine" compatibility="6.1.000" expanded="true" height="112" name="SVM (2)" width="90" x="246" y="30">
<parameter key="kernel_gamma" value="0.03"/>
<parameter key="C" value="380000.0"/>
</operator>
<operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall" width="90" x="112" y="255">
<parameter key="name" value="Weights"/>
<parameter key="io_object" value="IOObjectCollection"/>
</operator>
<operator activated="true" class="flatten_collection" compatibility="6.1.000" expanded="true" height="60" name="Flatten Collection" width="90" x="246" y="255"/>
<operator activated="true" class="collect" compatibility="6.1.000" expanded="true" height="94" name="Collect" width="90" x="447" y="165"/>
<operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember (2)" width="90" x="581" y="165">
<parameter key="name" value="Weights"/>
<parameter key="io_object" value="IOObjectCollection"/>
</operator>
<connect from_port="training set" to_op="SVM (2)" to_port="training set"/>
<connect from_op="SVM (2)" from_port="model" to_port="model"/>
<connect from_op="SVM (2)" from_port="weights" to_op="Collect" to_port="input 2"/>
<connect from_op="Recall" from_port="result" to_op="Flatten Collection" to_port="collection"/>
<connect from_op="Flatten Collection" from_port="flat" to_op="Collect" to_port="input 1"/>
<connect from_op="Collect" from_port="collection" to_op="Remember (2)" to_port="store"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
</process>
</operator>
<connect from_port="training" to_op="Polynominal by Binominal Classification" to_port="training set"/>
<connect from_op="Polynominal by Binominal Classification" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="6.1.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="6.1.000" expanded="true" height="76" name="Performance" width="90" x="248" y="30"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall (2)" width="90" x="715" y="165">
<parameter key="name" value="Weights"/>
<parameter key="io_object" value="IOObjectCollection"/>
</operator>
<operator activated="true" class="flatten_collection" compatibility="6.1.000" expanded="true" height="60" name="Flatten Collection (2)" width="90" x="849" y="165"/>
<connect from_op="Subprocess" from_port="out 1" to_op="Remember" to_port="store"/>
<connect from_op="Retrieve Iris" from_port="output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_port="result 1"/>
<connect from_op="Recall (2)" from_port="result" to_op="Flatten Collection (2)" to_port="collection"/>
<connect from_op="Flatten Collection (2)" from_port="flat" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
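Regarding the suggested switch to an X-Validation: stratified k-fold cross-validation puts every example into exactly one test fold while keeping the class proportions in each fold, so every example is used for testing once and for training k - 1 times. A minimal sketch of the fold assignment; the function names are hypothetical, and RapidMiner's X-Validation does this splitting itself.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split example indices into k folds that keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[(offset + j) % k].append(i)
        offset += len(indices)  # continue the round-robin across classes
    return folds


def cross_validation_splits(labels, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test


labels = ["a"] * 6 + ["b"] * 4
for train, test in cross_validation_splits(labels, k=2):
    print("test classes:", sorted(labels[i] for i in test))
```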