"Weka vs RapidMiner Feature Selection"
hgwelec
New Altair Community Member
Hello,
I was wondering how one can use RM to perform cross-validated feature selection *without* using a learning method to evaluate the worth of the attribute subset. For example, in WEKA one can use a cross-validated GainRatio Attribute Evaluator with a Ranker search method, without any classifier. Is this setup possible in RM?
Many Thanks,
Harry
Answers
Hi,
Probably yes, but I am not sure I fully understand what such a process would do. If you could explain the complete validation process in detail, we may be able to explain how to achieve it with RapidMiner (if it is possible).
Cheers,
Ingo
Hello Ingo,
With Weka one can choose a subset attribute evaluator (say CfsSubsetEval) together with a Best-First search method, and the attribute selection can then be made by
1) using the whole training set
2) using cross-validation
I don't know if I made it clear enough; if you have WEKA available, you can check this setting on the "Select Attributes" tab of the Weka Explorer.
Could you tell me if such a setup can be implemented in RapidMiner? I think not, because all Validation operators seem to expect a Model as one of their inputs.
Thanks Again,
Harry
Hello again Ingo!
I just found this post:
http://lifeanalytics.blogspot.com/2008/10/sowhats-important.html
It shows feature selection performed with 10-fold cross-validation. One of the figures shows the "goodness" of each attribute as the number of folds in which it is chosen.
Harry
Hi,
Unfortunately, I do not quite understand what specifically you intend to do, so I cannot say whether it is possible in RapidMiner. However, the general feature selection procedure works as follows. Each feature selection operator needs inner operators that are given an example set and must return a performance vector. How this performance vector is created (by cross-validation, by attribute set evaluators such as [tt]CFSFeatureSetEvaluator[/tt], etc.) does not matter for the feature selection. The search method is specified by choosing the appropriate feature selection operator (e.g. [tt]FeatureSelection[/tt] for forward/backward selection, [tt]GeneticAlgorithm[/tt] for an evolutionary search, [tt]BruteForce[/tt] for an exhaustive search, etc.).
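The split between search method and evaluator described above can be sketched in a few lines of Python (this is not RapidMiner code). Here `forward_selection` plays the role of the search operator and `toy_merit` the role of the inner evaluator that returns a performance value. Both names, the relevance numbers, and the constant redundancy of 0.5 are made up for illustration; the merit formula only follows the general shape of the CFS merit (reward relevance, penalize redundancy).

```python
from math import sqrt
from typing import Callable, FrozenSet, List

# An evaluator maps a feature subset to a single score (higher is better).
Evaluator = Callable[[FrozenSet[int]], float]


def forward_selection(n_features: int, evaluate: Evaluator) -> List[int]:
    """Greedy forward search: keep adding the feature that improves the score most."""
    selected: FrozenSet[int] = frozenset()
    best_score = float("-inf")
    while len(selected) < n_features:
        # Score every subset obtained by adding one unused feature.
        candidates = [selected | {f} for f in range(n_features) if f not in selected]
        top_score, top_subset = max((evaluate(c), sorted(c)) for c in candidates)
        if top_score <= best_score:
            break  # no candidate improves on the current subset
        selected, best_score = frozenset(top_subset), top_score
    return sorted(selected)


# Hypothetical per-feature relevance values and a constant feature-feature
# redundancy; these numbers are invented, not measured on any dataset.
RELEVANCE = [0.2, 0.1, 0.9, 0.8]
REDUNDANCY = 0.5


def toy_merit(subset: FrozenSet[int]) -> float:
    """CFS-shaped merit: total relevance divided by a redundancy penalty."""
    k = len(subset)
    return sum(RELEVANCE[f] for f in subset) / sqrt(k + k * (k - 1) * REDUNDANCY)


chosen = forward_selection(len(RELEVANCE), toy_merit)
print(chosen)
```

With these made-up numbers the search picks features 2 and 3 and then stops, because adding any third feature lowers the merit. Swapping `toy_merit` for a different evaluator changes what counts as "good" without touching the search.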
Hope that helps. Otherwise please explain exactly what you intend to do.
Regards,
Tobias
Hello Tobias,
First of all, I am quite new to data mining, so I apologize if my questions seem vague. I will do my best to explain what I am after.
To move on to the problem: I am a WEKA user who found RM more versatile to work with. However, I am now trying to do tasks (such as feature selection) that I used to do with WEKA.
Weka performs feature selection through wrapper approaches (using a classifier to evaluate the worth of a feature subset) or filter approaches. I am interested in the filter approaches, not the wrapper methods.
Weka is able to cross-validate (e.g. using 10-fold cross-validation) a feature subset found by CfsSubsetEval with Best-First forward selection. The results for the iris dataset are as follows:
=== Run information ===

Evaluator:        weka.attributeSelection.CfsSubsetEval
Search:           weka.attributeSelection.BestFirst -D 1 -N 5
Relation:         iris
Instances:        150
Attributes:       5
                  sepallength
                  sepalwidth
                  petallength
                  petalwidth
                  class
Evaluation mode:  10-fold cross-validation

=== Attribute selection 10 fold cross-validation (stratified), seed: 1 ===

number of folds (%)  attribute
          0(  0 %)   1 sepallength
          0(  0 %)   2 sepalwidth
         10(100 %)   3 petallength
         10(100 %)   4 petalwidth
(The WEKA experiment setup is attached.)
Please notice that the results show that:
1) The evaluation mode was 10-fold cross-validation (the type of evaluation that I want to do in RapidMiner)
2) petallength and petalwidth are selected in all 10 folds (which, I think, means that those 2 features have more predictive value than sepallength and sepalwidth)
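The kind of report shown above (how many folds select each attribute) can be sketched in plain Python. This is only an illustration under stated assumptions: it swaps CfsSubsetEval with Best-First search for a much simpler filter (keep the top `k` attributes by absolute Pearson correlation with the label), it runs on small synthetic data rather than iris, and all function names are hypothetical. No classifier is involved at any point.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def select_top_k(rows, labels, k):
    """Filter step: rank attributes by |correlation with label|, keep the best k."""
    n_features = len(rows[0])
    scores = [abs(pearson([r[f] for r in rows], labels)) for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return ranked[:k]


def stratified_folds(labels, n_folds):
    """Assign each row a fold index, cycling through folds within each class."""
    seen = Counter()
    fold_of = []
    for y in labels:
        fold_of.append(seen[y] % n_folds)
        seen[y] += 1
    return fold_of


def cv_selection_counts(rows, labels, n_folds=10, k=2):
    """Run the filter on each fold's training part; count selections per attribute."""
    fold_of = stratified_folds(labels, n_folds)
    counts = Counter({f: 0 for f in range(len(rows[0]))})
    for fold in range(n_folds):
        train = [i for i in range(len(rows)) if fold_of[i] != fold]
        chosen = select_top_k([rows[i] for i in train], [labels[i] for i in train], k)
        counts.update(chosen)
    return dict(counts)


# Synthetic data (an assumption, not iris): attributes 0 and 1 are unrelated
# to the label, attributes 2 and 3 are the label plus a small perturbation.
labels = [i % 2 for i in range(60)]
rows = [
    [(7 * i % 11) / 10, (3 * i % 5) / 4, 2 * y + 0.1 * (i % 3), 3 * y + 0.1 * (i % 5)]
    for i, y in enumerate(labels)
]

counts = cv_selection_counts(rows, labels)
for f in sorted(counts):
    print(f"{counts[f]:3d} folds  attribute {f}")
```

On this synthetic data attributes 2 and 3 are selected in all 10 folds and attributes 0 and 1 in none, which has the same shape as the WEKA table. Any other filter can be dropped in by replacing `select_top_k`.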
So, can RM perform such an analysis? To be more specific, can we evaluate the following FS process setup with cross-validation?
<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename" value="D:\MyDocuments\DataMining\TrainingFiles\mydata.csv"/>
        <parameter key="label_name" value="class"/>
    </operator>
    <operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
        <operator name="CFSFeatureSetEvaluator" class="CFSFeatureSetEvaluator">
            <parameter key="keep_example_set" value="true"/>
        </operator>
    </operator>
</operator>
I hope my description is clearer now. Again, many thanks for your help!
Harry
[attachment deleted by admin]