Variable importance measure
Hi,
In general, the "random forests" learner provides an algorithm to
measure the importance of the predictor variables. It works as
follows:
"Variable importance: This is a difficult concept to
define in general, because the importance of a
variable may be due to its (possibly complex)
interaction with other variables. The random
forest algorithm estimates the importance of a
variable by looking at how much prediction error
increases when OOB (out-of-bag) data for that variable
is permuted while all others are left unchanged.
The necessary calculations are carried
out tree by tree as the random forest is
constructed."
It's described in a paper [1] by Breiman, the developer of random
forests, and is implemented, for example, in the randomForest
package for GNU R. That package also reports for each variable the
mean decrease in the Gini index, which indicates how important that
variable is for the classification. It's a very nice feature, and
the results can be drawn as a bar chart.
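To make clear what I mean by the permutation idea quoted above, here
is a minimal Python sketch. It is only an illustration under some
assumptions: the names error_rate, permutation_importance and predict
are made up for this example, predict is a stand-in for an already
trained forest, and one fixed evaluation set is used instead of the
per-tree OOB samples that Breiman's procedure works with.

```python
import random


def error_rate(predict, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(rows)


def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Increase in error when one column is shuffled and the others are kept.

    Simplification (assumption): a single evaluation set replaces the
    per-tree out-of-bag samples used by the original algorithm.
    """
    rng = random.Random(seed)
    baseline = error_rate(predict, rows, labels)
    importances = []
    for j in range(n_features):
        # Shuffle only column j; all other columns stay unchanged.
        column = [row[j] for row in rows]
        rng.shuffle(column)
        permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        importances.append(error_rate(predict, permuted, labels) - baseline)
    return importances


# Toy data: the label depends only on feature 0, feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [int(row[0] > 0.5) for row in rows]


def predict(row):
    """Hypothetical stand-in for a trained model; it only looks at feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(predict, rows, labels, n_features=2)
print(importances)
```

In this toy setup, shuffling feature 0 should raise the error
clearly, while shuffling feature 1 leaves it unchanged because the
stand-in model never looks at it.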
Is this "variable importance measure" also possible in
RapidMiner's RandomForest? I couldn't find it anywhere.
Or can the variable importance be estimated in a different
way using RapidMiner?
Thank you for your help.
Best regards,
Paul
[1] L. Breiman. Manual on setting up, using and understanding
random forests.