Convert categorical variables into dummy variables

User: "aisyahwahyuna"
New Altair Community Member
Updated by Jocelyn
Hi, I want to perform a regression task to predict a continuous response. I have 4 categorical variables; the others are numerical.

Categorical variables are:
age=(≤20, 21-35, 36-50, ≥51)
gender=(Female, Male)
income level=(1: insufficient, 2: sufficient)
BMI range=(1: <25, 2: >25)
*Income level & BMI are keyed in as numerical codes in my dataset

Let's say I want to use SVM, RF, Decision Tree, MLR, and KNN:

1. Should I convert all categorical variables into dummy variables? 
2. If using numerical coding is more suitable, should I change the data type to nominal (binominal/polynominal) or retain it as integer?

    Hi @aisyahwahyuna, unfortunately the answer depends on which model you're using! Some models can handle categorical variables directly, either because of how they're formulated (e.g. Decision Tree) or because they convert them internally (e.g. GLM). Any operator that can't will usually show you an error that reads something like this:

    Where you do want to use a model that can't support categorical variables, I'd personally be very careful about using numerical coding and would recommend dummy encoding as the preferred method; here the Nominal to Numerical operator should work well. Numerical coding can be appropriate in some instances, especially when the variable is binominal, but I use it sparingly because it imposes an order and spacing on the categories, which can bias the output of your model. Hope this helps!
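
    For anyone who wants to see what dummy encoding does to the data, here is a minimal sketch in plain Python. It is not the Nominal to Numerical operator itself, only an illustration of the idea; the function name `dummy_encode`, the `drop_first` option, and the sample rows (including the `height_cm` column) are made up for this example.

```python
from typing import Dict, List


def dummy_encode(rows: List[Dict[str, object]], column: str,
                 drop_first: bool = False) -> List[Dict[str, object]]:
    """Replace `column` with one 0/1 indicator column per category."""
    # Collect categories in order of first appearance so the output is deterministic.
    categories = list(dict.fromkeys(row[column] for row in rows))
    if drop_first:
        # Drop one reference level; for linear regression this avoids
        # indicator columns that always sum to 1 (perfect collinearity).
        categories = categories[1:]
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows modelled on the age groups in the question.
rows = [
    {"age": "<=20", "height_cm": 160.0},
    {"age": "21-35", "height_cm": 172.5},
    {"age": "36-50", "height_cm": 168.0},
    {"age": ">=51", "height_cm": 165.0},
]

encoded = dummy_encode(rows, "age")
for row in encoded:
    print(row)
```

    With integer codes 1 to 4, a distance-based model such as KNN would see the first and last age groups as three times further apart than two neighbouring groups. With the indicator columns above, any two different groups differ in exactly two columns, so no spacing is implied. Setting `drop_first=True` keeps one group as the reference level, which is the usual choice for MLR.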