Hi Chaitra,
I guess you can do it by hand, but I would rather run a correspondence analysis using R or Python. Most of the time this matters little for a prediction task, since you can bypass the problem with wrapper feature selection techniques (stepwise or evolutionary).
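In case it helps, here is a minimal sketch of a simple correspondence analysis in Python using only NumPy. The contingency table is made up for illustration, and the variable names are my own; treat it as one possible way to do it rather than a reference implementation.

```python
import numpy as np

# Hypothetical contingency table: rows are levels of one nominal attribute,
# columns are levels of another, cells are co-occurrence counts.
counts = np.array([
    [20.0, 5.0, 5.0],
    [4.0, 18.0, 8.0],
    [6.0, 7.0, 27.0],
])

n = counts.sum()
P = counts / n            # correspondence matrix (relative frequencies)
r = P.sum(axis=1)         # row masses
c = P.sum(axis=0)         # column masses

# Standardized residuals: deviation from independence, scaled by sqrt(r_i * c_j).
expected = np.outer(r, c)
S = (P - expected) / np.sqrt(expected)

# The SVD of the residual matrix gives the principal axes.
U, sing, Vt = np.linalg.svd(S, full_matrices=False)

principal_inertias = sing ** 2
total_inertia = principal_inertias.sum()   # equals chi-squared / n

# Principal coordinates of the row and column categories.
row_coords = (U * sing) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]

print("share of inertia per axis:", principal_inertias / total_inertia)
print("row coordinates (first two axes):")
print(row_coords[:, :2])
```

Categories that sit close together on the first axes have similar profiles, and the total inertia is the chi-squared statistic of the table divided by the number of observations.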
Regards,
Sebastian
You should be very careful when doing this type of analysis. There are operators that you can use to accomplish this task in RapidMiner easily (Nominal to Numerical and then Correlate), but whether the result is meaningful depends on what type of categorical data you actually have and on how you do the conversion.
For example, if the data is actually nominal in nature, meaning it is not inherently ordered (think of things like colors or names), then a simple numerical replacement (where each nominal category is given a successive integer value) is very misleading, because the resulting correlation depends on an arbitrary ordering of the categories. That type of numerical conversion is only appropriate when the categories correspond to some kind of ordered scale (similar to a Likert scale). For other nominal data, you would want to do dummy coding conversion, which takes each nominal value and turns it into a zero/one variable (called a dummy code) and then you can run a correlation analysis on those attributes.
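To make the difference concrete, here is a small sketch in plain Python (outside RapidMiner). The data, the attribute names, and the helper functions `pearson`, `dummy_code` and `integer_code` are all made up for illustration. It dummy codes a nominal attribute and correlates each dummy with a numeric attribute, then shows how integer coding gives a different answer depending on the arbitrary order chosen for the categories.

```python
from math import sqrt

# Hypothetical toy data: a nominal attribute and a numeric attribute.
colors = ["red", "green", "blue", "red", "blue", "green", "red", "blue"]
spend = [10.0, 30.0, 20.0, 12.0, 22.0, 28.0, 11.0, 19.0]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def dummy_code(values):
    """One zero/one column per category: {category: [0.0 or 1.0, ...]}."""
    return {cat: [1.0 if v == cat else 0.0 for v in values]
            for cat in sorted(set(values))}

def integer_code(values, order):
    """Replace each category by its position in the given ordering."""
    position = {cat: float(i) for i, cat in enumerate(order)}
    return [position[v] for v in values]

# Dummy coding: one correlation per category, independent of any ordering.
dummies = dummy_code(colors)
dummy_corr = {cat: pearson(col, spend) for cat, col in dummies.items()}

# Integer coding: the result changes with the (arbitrary) category order.
corr_alpha = pearson(integer_code(colors, ["blue", "green", "red"]), spend)
corr_other = pearson(integer_code(colors, ["red", "blue", "green"]), spend)

print("dummy-coded correlations:", dummy_corr)
print("integer-coded, alphabetical order:", corr_alpha)
print("integer-coded, another order:", corr_other)
```

On this toy data the two integer codings even disagree on the sign of the correlation, while the dummy-coded correlations stay the same however the categories are listed. If you work in pandas, `pandas.get_dummies` followed by `DataFrame.corr` does the same job.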
Hi,
For other nominal data, you would want to do dummy coding conversion,
which takes each nominal value and turns it into a zero/one variable
(called a dummy code) and then you can run a correlation analysis on
those attributes.
BTW, this is what the correlation matrix in RapidMiner's Auto Model does. You can open the process and see how it is done on your data. #noblackboxes
Best,
Ingo