Can we encode categorical data as numerical data and then find the correlation in RapidMiner?
Chaitra
New Altair Community Member
Can we encode categorical data as numerical data and then find the correlation in RapidMiner? If so, please let me know the process.
Answers
Hi Chaitra,
I guess you could do it by hand, but I would rather run a correspondence analysis using R or Python. Most of the time this is not so important for a prediction task, as you can bypass the problem using wrapper feature selection techniques (stepwise or evolutionary).
Regards,
Sebastian
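Sebastian doesn't name a specific package, so here is a minimal NumPy sketch of simple correspondence analysis on two nominal attributes. It builds a contingency table and takes the SVD of the standardized residuals. The example data and the helper names `crosstab` and `correspondence_analysis` are hypothetical. In practice you would probably use an established R or Python package rather than this hand-rolled version.

```python
import numpy as np


def crosstab(a, b):
    """Count co-occurrences of two nominal attributes (hypothetical helper)."""
    a_levels, a_idx = np.unique(a, return_inverse=True)
    b_levels, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((len(a_levels), len(b_levels)))
    np.add.at(table, (a_idx, b_idx), 1)  # increment one cell per row of data
    return a_levels, b_levels, table


def correspondence_analysis(table):
    """Simple correspondence analysis via SVD of standardized residuals.

    Returns row and column principal coordinates and the principal
    inertias (squared singular values).
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    expected = np.outer(r, c)
    # Elementwise form of D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = U * sv / np.sqrt(r)[:, None]
    col_coords = Vt.T * sv / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2


# Made-up example data: two nominal attributes observed on the same rows.
colors = ["red", "red", "blue", "green", "blue", "red", "green", "blue"]
sizes = ["S", "M", "L", "S", "L", "S", "M", "L"]

row_levels, col_levels, table = crosstab(colors, sizes)
F, G, inertia = correspondence_analysis(table)
print(row_levels, col_levels)
print("principal inertias:", inertia.round(3))
# Square root of the first principal inertia: the largest correlation
# obtainable by assigning numeric scores to both sets of categories.
print("first canonical correlation:", np.sqrt(inertia[0]).round(3))
```

In standard correspondence analysis, the principal inertias sum to chi-squared divided by n. The square root of the first one is the canonical correlation between the two attributes, which is one way to read "the correlation" between two nominal variables.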
You should be very careful with this type of analysis. RapidMiner has operators that make the task easy (Nominal to Numerical and then Correlate), but whether the result is meaningful depends on what type of categorical data you actually have and how you do the conversion.
For example, if the data is nominal in nature, meaning it is not inherently ordered (think of colors or names), then a simple numerical replacement (where each nominal category is given a successive integer value) is very misleading. The integers impose an order and spacing that do not exist, and a different but equally arbitrary assignment of codes would produce a different correlation. That type of conversion is only appropriate when the nominal categories correspond to some kind of ordered scale (similar to a Likert scale). For other nominal data, you would want to use dummy coding. This turns each nominal value into its own zero/one variable (called a dummy code), and you can then run a correlation analysis on those attributes.
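To illustrate the warning above, here is a small NumPy sketch on made-up data. It encodes the same unordered attribute with two equally arbitrary integer orderings and gets two different correlations with a numeric attribute. It then applies the same integer mapping to an ordered, Likert-style attribute, where the order is meaningful. The data and the `integer_encode` helper are hypothetical and do not reproduce RapidMiner's Nominal to Numerical operator.

```python
import numpy as np

# Hypothetical example data: a nominal attribute and a numeric attribute.
colors = ["red", "green", "blue", "red", "blue", "green", "red", "blue"]
price = np.array([10.0, 12.0, 20.0, 11.0, 19.0, 13.0, 9.0, 21.0])


def integer_encode(values, order):
    """Replace each category by its position in `order` (0, 1, 2, ...)."""
    mapping = {cat: i for i, cat in enumerate(order)}
    return np.array([mapping[v] for v in values], dtype=float)


# Two equally arbitrary orderings of the same unordered categories.
r1 = np.corrcoef(integer_encode(colors, ["red", "green", "blue"]), price)[0, 1]
r2 = np.corrcoef(integer_encode(colors, ["green", "red", "blue"]), price)[0, 1]
print(f"arbitrary order 1: {r1:.3f}, arbitrary order 2: {r2:.3f}")  # the two values differ

# For an ordered (Likert-style) attribute the order itself carries meaning,
# so mapping levels to successive integers is a defensible choice.
likert_order = ["disagree", "neutral", "agree"]
answers = ["agree", "neutral", "disagree", "agree", "agree", "neutral", "disagree", "agree"]
scores = integer_encode(answers, likert_order)
r_likert = np.corrcoef(scores, price)[0, 1]
print(f"Likert scores vs price: {r_likert:.3f}")
```

Even in the Likert case, equal spacing between levels is an assumption. A rank-based correlation is sometimes used to avoid relying on it.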
Hi,
> For other nominal data, you would want to do dummy coding conversion, which takes each nominal value and turns it into a zero/one variable (called a dummy code) and then you can run a correlation analysis on those attributes.
This is, by the way, what the correlation matrix in RapidMiner's Auto Model does. You can open the process and see how it is done on your data. #noblackboxes
Best,
Ingo
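As a rough Python analogue of dummy coding followed by a correlation matrix, here is a sketch using pandas `get_dummies` and `DataFrame.corr` on made-up data. It is not claimed to match what Auto Model does internally. To see that, open the generated process as Ingo suggests.

```python
import pandas as pd

# Hypothetical example data with one nominal and one numeric attribute.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "red", "blue", "green", "red", "blue"],
    "price": [10.0, 12.0, 20.0, 11.0, 19.0, 13.0, 9.0, 21.0],
})

# Dummy coding: one 0/1 column per nominal value
# (color_blue, color_green, color_red), appended after "price".
dummies = pd.get_dummies(df, columns=["color"], dtype=int)

# Pearson correlation matrix over the numeric and dummy-coded attributes.
corr = dummies.corr()
print(corr.round(3))
```

Each row has exactly one dummy set to 1, so dummies of the same attribute are always negatively correlated with each other. Those entries reflect the encoding rather than a finding about the data. The informative entries are the ones pairing a dummy with another attribute, such as `price`.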