"Problem with Feature Selection"
vitalimario
Hi,
I am trying to perform Feature Selection, then apply the reduced subset of features to a J48 tree, and finally deliver the tree as output.
As you will see, I have the 'create_complete_model' option checked.
Unfortunately, *all* tutorials on feature selection in RM use either SVM or NearestNeighbours models, neither of which returns an output!
Here is my setup:
<operator name="Root" class="Process" expanded="yes">
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value="c::\MyDocuments\score.csv"/>
<parameter key="label_name" value="class"/>
</operator>
<operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
<parameter key="selection_direction" value="backward"/>
<operator name="SimpleValidation" class="SimpleValidation" expanded="yes">
<parameter key="create_complete_model" value="true"/>
<operator name="W-J48" class="W-J48">
</operator>
<operator name="ApplierChain" class="OperatorChain" expanded="yes">
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="Performance" class="Performance">
</operator>
</operator>
</operator>
<operator name="ProcessLog" class="ProcessLog">
<list key="log">
<parameter key="Generation" value="operator.LibSVMLearner.value.applycount"/>
</list>
</operator>
</operator>
</operator>
What am I doing wrong?
Answers
Hello,
I may have misunderstood you, but:
The decision tree you create is used only within the "performance measurement" process to determine the (sub-)optimal set of features. If you want to create a decision tree with the best feature subset, you must add something like this at the top level, after the FeatureSelection operator:
<operator name="AttributeWeightSelection" class="AttributeWeightSelection">
</operator>
<operator name="W-J48 (2)" class="W-J48">
</operator>
Hope this was helpful,
Steffen
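To make the placement concrete, here is a rough, untested sketch that combines the original poster's first process with the two operators suggested above. It only reuses operator names and classes that appear in this thread. Three things are assumptions on my part: the file path is borrowed from the later post, the ProcessLog operator is left out because it refers to a LibSVMLearner that is not part of this process, and create_complete_model is dropped because the final tree is now built by the second W-J48.

```xml
<operator name="Root" class="Process" expanded="yes">
  <operator name="CSVExampleSource" class="CSVExampleSource">
    <!-- assumed path, taken from the later post in this thread -->
    <parameter key="filename" value="C:\test.csv"/>
    <parameter key="label_name" value="class"/>
  </operator>
  <!-- wrapper: the inner W-J48 is only used to score candidate feature subsets -->
  <operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
    <parameter key="selection_direction" value="backward"/>
    <operator name="SimpleValidation" class="SimpleValidation" expanded="yes">
      <operator name="W-J48" class="W-J48">
      </operator>
      <operator name="ApplierChain" class="OperatorChain" expanded="yes">
        <operator name="ModelApplier" class="ModelApplier">
          <list key="application_parameters">
          </list>
        </operator>
        <operator name="Performance" class="Performance">
        </operator>
      </operator>
    </operator>
  </operator>
  <!-- top level: keep only the selected attributes, then learn the final tree on them -->
  <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
  </operator>
  <operator name="W-J48 (2)" class="W-J48">
  </operator>
</operator>
```

The idea is that the tree delivered at the end comes from "W-J48 (2)", which sees only the attributes that survived the selection, while the inner "W-J48" never leaves the validation loop.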
Hi Steffen,
Let me be more specific.
Originally, I wanted to know whether the setup in the RM tutorial 12_WrapperValidation.xml can use a J48 decision tree (instead of JMySVMLearner and a regression problem) and deliver the result (i.e. the decision tree) as output. I changed the setup, but I am not able to get the final tree (after feature selection) as output.
Here is the final setup:
<operator name="Root" class="Process" expanded="yes">
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value="C:\test.csv"/>
<parameter key="label_name" value="class"/>
</operator>
<operator name="WrapperXValidation" class="WrapperXValidation" expanded="yes">
<parameter key="sampling_type" value="shuffled sampling"/>
<operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
<operator name="FSXValidation" class="XValidation" expanded="yes">
<parameter key="sampling_type" value="shuffled sampling"/>
<operator name="W-J48" class="W-J48">
</operator>
<operator name="FSOperatorChain" class="OperatorChain" expanded="yes">
<operator name="FSModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="ClassificationPerformance" class="ClassificationPerformance">
<parameter key="accuracy" value="true"/>
<list key="class_weights">
</list>
<parameter key="classification_error" value="true"/>
<parameter key="kappa" value="true"/>
<parameter key="spearman_rho" value="true"/>
<parameter key="weighted_mean_precision" value="true"/>
<parameter key="weighted_mean_recall" value="true"/>
</operator>
<operator name="FSMinMaxWrapper" class="MinMaxWrapper">
<parameter key="minimum_weight" value="0.5"/>
</operator>
</operator>
</operator>
</operator>
<operator name="W-J48 (2)" class="W-J48">
</operator>
<operator name="OperatorChain" class="OperatorChain" expanded="yes">
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="ClassificationPerformance (2)" class="ClassificationPerformance">
<parameter key="accuracy" value="true"/>
<list key="class_weights">
</list>
<parameter key="classification_error" value="true"/>
<parameter key="kappa" value="true"/>
<parameter key="weighted_mean_recall" value="true"/>
</operator>
</operator>
</operator>
</operator>
Thanks!
Hello again
Hm, OK, I think I've got it now. Since there is no option to build a complete model here, you cannot do it "in the same process".
You just have to repeat this part, because this is exactly what "build complete model" does/would do:
<operator name="Root" class="Process" expanded="yes">
<operator name="FeatureSelection" class="FeatureSelection" expanded="no">
<operator name="FSXValidation" class="XValidation" expanded="no">
<parameter key="sampling_type" value="shuffled sampling"/>
<operator name="W-J48" class="W-J48">
</operator>
<operator name="FSOperatorChain" class="OperatorChain" expanded="yes">
<operator name="FSModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="ClassificationPerformance" class="ClassificationPerformance">
<parameter key="accuracy" value="true"/>
<list key="class_weights">
</list>
<parameter key="classification_error" value="true"/>
<parameter key="kappa" value="true"/>
<parameter key="spearman_rho" value="true"/>
<parameter key="weighted_mean_precision" value="true"/>
<parameter key="weighted_mean_recall" value="true"/>
</operator>
<operator name="FSMinMaxWrapper" class="MinMaxWrapper">
<parameter key="minimum_weight" value="0.5"/>
</operator>
</operator>
</operator>
</operator>
<operator name="AttributeWeightSelection" class="AttributeWeightSelection">
</operator>
<operator name="W-J48 (2)" class="W-J48">
</operator>
</operator>
hope this was helpful,
Steffen
Dear Steffen,
It worked GREAT! Thanks again.