dummy coded variables
Lara
New Altair Community Member
Dear Data Mining and RapidMiner Experts,
I would like to analyse my dataset, which contains categorical (polynominal) predictor variables, with Logistic Regression and SVM.
So far I have used other data mining/statistics software that transformed my categorical predictor variables automatically by dummy coding: one group is used as the reference group, which yields k-1 new binominal dummy-coded variables (for an attribute with k groups).
How can I do this in RapidMiner?
If I transform an attribute into k-1 binominal variables manually, how will the Logistic Regression or SVM operator know that these are my dummy-coded variables? Or do I just have to create k new binominal attributes for modelling?
Thank you very much.
Lara
Answers
Hi Lara,
you are right: RapidMiner does not transform the data automatically; the user has to define which data should be transformed in which way. The reason is that we believe the user should be aware of what is happening, rather than having some preprocessing applied silently that might introduce a lot of bias. So we go for the "manual" way and combine it with assistants, like the new quick fixes introduced in RapidMiner 5, to support the user in standard tasks.
You describe a standard preprocessing subprocess that takes nominal (categorical) attributes and introduces binominal dummy attributes, which are then transformed to numerical attributes that can be used by learning schemes like SVM or Logistic Regression. I have uploaded this process with our new Community Extension (available from our Update and Installation Server in the Help menu of RapidMiner). After installing this extension, you can download and apply the process "Convert Nominal to Binominal to Numerical" (website: http://www.myexperiment.org/workflows/1275) with a few clicks.
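To make the k-1 versus k question concrete, here is a minimal plain-Python sketch of what dummy coding produces. It is only an illustration and not RapidMiner's operator or API: the function name `dummy_code`, its parameters, and the sample `colour` attribute are all made up for this example. The resulting 0/1 columns are ordinary numerical attributes, so a learner does not need to be told that they are dummy variables.

```python
def dummy_code(values, name, reference=None, drop_reference=True):
    """Dummy-code one nominal attribute (hypothetical helper, for illustration).

    Returns (column_names, rows), where each row is a list of 0/1 values.
    With drop_reference=True the reference level gets no column (k-1 columns);
    with drop_reference=False every level gets a column (k columns).
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: first level in sorted order
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")

    kept = [lv for lv in levels if not (drop_reference and lv == reference)]
    column_names = [f"{name} = {lv}" for lv in kept]
    rows = [[1 if value == lv else 0 for lv in kept] for value in values]
    return column_names, rows


# Made-up sample data: one nominal attribute with k = 3 levels.
colours = ["red", "green", "blue", "green"]

names, rows = dummy_code(colours, "colour")
print(names)
for value, row in zip(colours, rows):
    print(value, row)
```

In this sketch the reference level ("blue", the first level in sorted order) is represented by a row of zeros. Calling `dummy_code(colours, "colour", drop_reference=False)` gives k columns instead; for a Logistic Regression with an intercept those k columns are linearly dependent, which is the usual reason for dropping one level.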
Cheers,
Ingo