Dummy Encoding in RapidMiner

Adi1215
New Altair Community Member
Hi, I am new to RapidMiner and building my first predictive model. During feature engineering I applied dummy encoding to one of the categorical columns, and it generated one new column per category present in that column. As I understand it, it should generate n-1 columns; otherwise it introduces perfect multicollinearity. Is there a trick to get rid of this issue? Do I need to manually delete one of the generated columns after applying dummy encoding?
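To make the concern concrete, here is a minimal Python sketch (not RapidMiner code) of full dummy encoding versus n-1 encoding. The `dummy_encode` helper and the `colors` data are hypothetical, for illustration only. With one column per category, every row sums to 1, so the dummy columns add up exactly to a model's intercept column (the so-called dummy variable trap). Dropping one reference category removes that exact linear dependence.

```python
# Minimal sketch: full dummy (one-hot) encoding vs. n-1 encoding.
# `dummy_encode` and the sample data are hypothetical, for illustration only.

def dummy_encode(values, drop=None):
    """Return (column_names, rows) with one 0/1 column per category.

    If `drop` names a category, its column is omitted (n-1 encoding),
    so that category is represented by a row of all zeros.
    """
    categories = sorted(set(values))
    if drop is not None:
        if drop not in categories:
            raise ValueError(f"unknown category: {drop!r}")
        categories.remove(drop)
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


colors = ["red", "green", "blue", "green", "red"]

# Full dummy coding: n = 3 columns.
names, full = dummy_encode(colors)
print(names)                    # ['blue', 'green', 'red']
# Every row sums to exactly 1: the dummies reproduce the intercept column,
# which is the perfect multicollinearity the question is about.
print([sum(r) for r in full])   # [1, 1, 1, 1, 1]

# n-1 coding: drop 'blue' as the reference category.
names_n1, reduced = dummy_encode(colors, drop="blue")
print(names_n1)                 # ['green', 'red']
print(reduced)                  # [[0, 1], [1, 0], [0, 0], [1, 0], [0, 1]]
```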
Guys, please share your thoughts.
Regards,
Best Answer
Most of the modern ML algorithms implemented in RapidMiner include adjustments for perfect multicollinearity where needed, so dummy coding is actually fine. That said, the Nominal To Numerical operator also supports the n-1 encoding approach: select "effect coding" instead of "dummy coding" in the coding type parameter, then specify the omitted category for each attribute in the resulting "comparison groups" dialog box. This is tedious for a large number of attributes, though, so if you can use dummy coding, that is preferable.
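As a rough illustration of what the effect coding option produces, here is a minimal Python sketch assuming the textbook definition of effect (deviation) coding: each non-comparison category gets its own column, and rows from the comparison group are coded -1 in every column rather than 0 as in n-1 dummy coding. This is not RapidMiner's implementation; the `effect_encode` helper and the data are hypothetical, so check the operator's documentation for the exact values it generates.

```python
# Minimal sketch of effect (deviation) coding with a comparison group.
# Not RapidMiner's implementation; `effect_encode` and the data are hypothetical.

def effect_encode(values, comparison):
    """Return (column_names, rows) with n-1 columns.

    Each non-comparison category gets its own column (1 for that category,
    0 otherwise); rows in the comparison group get -1 in every column.
    """
    categories = sorted(set(values))
    if comparison not in categories:
        raise ValueError(f"unknown comparison group: {comparison!r}")
    columns = [c for c in categories if c != comparison]
    rows = []
    for v in values:
        if v == comparison:
            rows.append([-1] * len(columns))
        else:
            rows.append([1 if v == c else 0 for c in columns])
    return columns, rows


colors = ["red", "green", "blue", "green", "red"]
names, coded = effect_encode(colors, comparison="blue")
print(names)   # ['green', 'red']
print(coded)   # [[0, 1], [1, 0], [-1, -1], [1, 0], [0, 1]]
```

Either way the result has n-1 columns, so the generated attributes no longer add up to a constant and the intercept can still be estimated. The -1 coding mainly changes how the coefficients are read (as deviations from the unweighted average of the category means rather than from the comparison group), not whether the model can be fit.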
Answers
Thanks for this. I'll try this out and let you know.