Convert categorical variables into dummy variables
aisyahwahyuna
New Altair Community Member
Hi, I want to perform a regression task to predict a continuous response. I have 4 categorical variables; the others are numerical.
Categorical variables are:
age=(≤20, 21-35, 36-50, ≥51)
gender=(Female, Male)
income level=(1 = insufficient, 2 = sufficient)
BMI range=(1 = <25, 2 = >25)
*Income level & BMI are keyed in as numerical codes in my dataset
Let's say I want to perform SVM, RF, Decision Tree, MLR, and KNN;
1. Should I convert all categorical variables into dummy variables?
2. If using numerical coding is more suitable, should I change the data type to nominal (binominal/polynominal) or retain it as integer?
Answers
Hi @aisyahwahyuna, unfortunately it depends on which model you're using! Some models can handle categorical variables directly, either because of the way they're formulated or because they do an internal conversion (e.g. Decision Tree and GLM respectively). Any operator which can't will usually show you an error saying so when you try to run it.
If you do want to use a model that can't support categorical variables, I'd personally be very careful with numerical coding and recommend dummy encoding as the preferred method; here the Nominal to Numerical operator should work well. Numerical coding can be appropriate in some instances, especially when the variable is binominal, but I use it sparingly because it imposes an order and spacing on the categories that can bias the output of your model. Hope this helps!
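To make the difference between the two codings concrete, here is a minimal sketch in plain Python (outside RapidMiner, so it is not what the Nominal to Numerical operator does internally). The helper `dummy_encode` and the sample values are hypothetical, and the age labels are written as ASCII stand-ins for the levels in the question.

```python
def dummy_encode(values, categories, drop_first=False):
    """Return one row of 0/1 indicators per value, one column per category.

    Hypothetical helper for illustration only. With drop_first=True the
    first category becomes the baseline and gets no column of its own.
    """
    cols = categories[1:] if drop_first else categories
    return [[1 if v == c else 0 for c in cols] for v in values]


# Four-level variable from the question (ASCII labels are an assumption).
age_levels = ["<=20", "21-35", "36-50", ">=51"]
ages = ["21-35", ">=51", "<=20"]  # made-up sample rows

# Numerical coding: a distance-based model would treat ">=51" (3) as three
# equal steps above "<=20" (0), which the data may not justify.
age_codes = [age_levels.index(a) for a in ages]

# Dummy coding: each level gets its own 0/1 column, so no spacing is implied.
age_dummies = dummy_encode(ages, age_levels)

# Two-level variable: a single 0/1 code carries the same information as
# dummy coding with the baseline level dropped.
gender_levels = ["Female", "Male"]
genders = ["Male", "Female", "Male"]  # made-up sample rows
gender_codes = [gender_levels.index(g) for g in genders]
gender_dummies = dummy_encode(genders, gender_levels, drop_first=True)

print(age_codes)
print(age_dummies)
print(gender_codes, gender_dummies)
```

For the two-level variables (gender, income level, BMI range) the single 0/1 column and the dummy column are identical, which is why numerical coding is usually harmless there. For age, the integer code adds an equal-spacing assumption that dummy coding avoids.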