Convert categorical variables into dummy variables

User: "aisyahwahyuna"
New Altair Community Member
Updated by Jocelyn
Hi, I want to perform a regression task to predict a continuous response. I have 4 categorical variables; the others are numerical.

Categorical variables are:
age=(≤20, 21-35, 36-50, ≥51)
gender=(Female, Male)
income level=(1=insufficient, 2=sufficient)
BMI range=(1=<25, 2=>25)
*Income level & BMI are keyed in as numerical code in my dataset

Let's say I want to perform SVM, RF, Decision Tree, MLR, and KNN;

1. Should I convert all categorical variables into dummy variables? 
2. If using numerical coding is more suitable, should I change the data type to nominal (binominal/polynominal) or retain it as integer?

    Hi @aisyahwahyuna, unfortunately it depends on which model you're using! Some models can handle categorical variables, either through the way they're formulated or by converting them internally - e.g. Decision Tree and GLM respectively. Any operator that can't will usually show you an error to that effect.

    Where you do want to use a model that can't support categorical variables, I'd personally be very careful with numerical coding and recommend dummy encoding as the preferred method - here the Nominal to Numerical operator should work well. Numerical coding can be appropriate in some instances, especially when the variable is binominal, but I use it sparingly: the model treats the codes as ordered, evenly spaced quantities, which can bias its output. Hope this helps!
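
To make the difference between numerical coding and dummy encoding concrete, here is a minimal sketch in plain Python. It is not how the Nominal to Numerical operator is implemented; `dummy_encode` is a hypothetical helper written for illustration, and the age bands from the question are treated as unordered labels, which is an assumption.

```python
from math import dist

def dummy_encode(values, categories, drop_first=False):
    """Return one row of 0/1 indicators per value, one column per category.

    Hypothetical helper for illustration only. With drop_first=True the
    first category becomes the reference level and gets no column.
    """
    kept = categories[1:] if drop_first else categories
    return [[1 if v == c else 0 for c in kept] for v in values]

# Age bands from the question, treated here as unordered labels (assumption).
age_levels = ["<=20", "21-35", "36-50", ">=51"]
ages = ["<=20", "21-35", ">=51"]

# Dummy encoding: one indicator column per age band.
encoded = dummy_encode(ages, age_levels)

# Numerical coding: the bands become the integers 1..4.
codes = [age_levels.index(a) + 1 for a in ages]

# With integer codes, "<=20" looks three times further from ">=51"
# than from "21-35", which a distance-based model such as KNN will use.
gap_near = abs(codes[0] - codes[1])
gap_far = abs(codes[0] - codes[2])

# With dummy columns, any two different bands are equally far apart.
d_near = dist(encoded[0], encoded[1])
d_far = dist(encoded[0], encoded[2])

print(codes, gap_near, gap_far)
print(encoded, d_near, d_far)
```

For a two-level variable such as gender, `dummy_encode(values, ["Female", "Male"], drop_first=True)` yields a single 0/1 column, which is why numerical coding is usually harmless in the binominal case. Dropping one level is also the common choice for linear regression, since a full set of dummy columns is perfectly collinear with the intercept.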