Convert categorical variables into dummy variables

aisyahwahyuna New Altair Community Member
edited November 2024 in Community Q&A
Hi, I want to perform a regression task to predict a continuous response. I have 4 categorical variables; the others are numerical.

Categorical variables are:
age=(≤20, 21-35, 36-50, ≥51)
gender=(Female, Male)
income level=(1=insufficient, 2=sufficient)
BMI range=(1=<25, 2=>25)
*Income level and BMI are stored as numerical codes in my dataset.

Let's say I want to fit SVM, RF, Decision Tree, MLR, and KNN models:

1. Should I convert all categorical variables into dummy variables? 
2. If numerical coding is more suitable, should I change the data type to nominal (binominal/polynominal) or keep it as integer?
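
For question 1, here is a minimal sketch of dummy (k-1 indicator) encoding in plain Python, applied to the four variables listed above. The level lists come from the question; the example rows, the `score` column, and the helper names `dummy_encode`/`encode_all` are hypothetical, and the first level of each variable is assumed to be the reference level.

```python
# Levels taken from the question; the first level of each list is the reference level.
LEVELS = {
    "age": ["≤20", "21-35", "36-50", "≥51"],
    "gender": ["Female", "Male"],
    "income": [1, 2],  # 1 = insufficient, 2 = sufficient
    "bmi": [1, 2],     # 1 = <25, 2 = >25
}


def dummy_encode(rows, column, levels):
    """Replace `column` with one 0/1 indicator per non-reference level (k levels -> k-1 columns)."""
    others = levels[1:]
    encoded = []
    for row in rows:
        value = row[column]
        if value not in levels:
            raise ValueError(f"unexpected level {value!r} in column {column!r}")
        new_row = {key: v for key, v in row.items() if key != column}
        for level in others:
            # The reference level is represented by all of its indicators being 0.
            new_row[f"{column}={level}"] = 1 if value == level else 0
        encoded.append(new_row)
    return encoded


def encode_all(rows):
    """Dummy-encode every categorical column listed in LEVELS; other columns pass through."""
    for column, levels in LEVELS.items():
        rows = dummy_encode(rows, column, levels)
    return rows


# Hypothetical example rows; "score" stands in for one of the numerical predictors.
rows = [
    {"age": "36-50", "gender": "Male", "income": 2, "bmi": 1, "score": 3.5},
    {"age": "≤20", "gender": "Female", "income": 1, "bmi": 2, "score": 4.0},
]
for row in encode_all(rows):
    print(row)
```

Dropping the reference level avoids perfect collinearity with the intercept in MLR (the "dummy variable trap"); for tree-based models or KNN, a full one-hot encoding with k columns per variable is also commonly used.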

Answers

  • RolandJones
    Altair Employee
    Hi @aisyahwahyuna, unfortunately this is a case of "it depends on which model you're using"! Some models can handle categorical variables either through the way they're formulated or by doing an internal conversion (for example, Decision Tree and GLM, respectively). Any operator that can't will usually show you an error telling you it can't handle the nominal attributes in your data.
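
To illustrate a model whose formulation handles categories directly, here is a minimal sketch of how a regression tree could split on a nominal attribute with no encoding at all: it tries each "attribute = level versus the rest" split and keeps the one with the lowest total sum of squared errors. This is one simple splitting scheme chosen for illustration, not a description of how the Decision Tree operator is actually implemented; the data and function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the group mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_one_vs_rest_split(categories, targets):
    """Try each split `category == level` vs. the rest; return (level, total SSE) of the best one."""
    best = None
    for level in sorted(set(categories)):
        left = [t for c, t in zip(categories, targets) if c == level]
        right = [t for c, t in zip(categories, targets) if c != level]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (level, total)
    return best


# Hypothetical data: the age groups from the question used as raw labels (no encoding).
age = ["≤20", "≤20", "21-35", "21-35", "36-50", "36-50", "≥51", "≥51"]
response = [10, 10, 20, 20, 12, 12, 8, 8]

level, total_sse = best_one_vs_rest_split(age, response)
print(level, total_sse)  # 21-35 16.0
```

The split criterion only compares labels for equality, so the categories never need a numeric value.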

    If you do want to use a model that can't handle categorical variables, I'd personally be very careful about using numerical coding and would recommend dummy encoding instead; the Nominal to Numerical operator should work well for this. Numerical coding can be appropriate in some cases, especially for binominal attributes, but I use it sparingly because it imposes an order and equal spacing on the categories, which can bias your model's output. Hope this helps!
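
The biasing effect of numerical coding is easy to see in a linear model such as MLR. The sketch below (plain Python; hypothetical data and helper names) fits ordinary least squares twice: once with age group coded 1 to 4 as a single integer column, which forces equal steps between consecutive groups, and once with dummy columns, which let each group have its own effect. It also checks the binominal case, where a 1/2 code and a 0/1 dummy give the same fit.

```python
def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_sse(design, y):
    """Fit ordinary least squares via the normal equations; return the residual sum of squares."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2 for row, t in zip(design, y))


def integer_design(codes):
    """Intercept plus the raw integer code: one slope shared by every step between levels."""
    return [[1.0, float(c)] for c in codes]


def dummy_design(codes):
    """Intercept plus one 0/1 column per level except the lowest (reference) level."""
    others = sorted(set(codes))[1:]
    return [[1.0] + [1.0 if c == level else 0.0 for level in others] for c in codes]


# Hypothetical data: age group coded 1..4 with a non-monotonic effect on the response.
age_codes = [1, 1, 2, 2, 3, 3, 4, 4]
response = [10, 10, 20, 20, 12, 12, 8, 8]
sse_integer = ols_sse(integer_design(age_codes), response)
sse_dummy = ols_sse(dummy_design(age_codes), response)

# Hypothetical binary attribute coded 1/2, like the income level in the question.
income_codes = [1, 1, 1, 1, 2, 2, 2, 2]
sse_income_integer = ols_sse(integer_design(income_codes), response)
sse_income_dummy = ols_sse(dummy_design(income_codes), response)

print(round(sse_integer, 6), round(sse_dummy, 6))                # 146.4 0.0
print(round(sse_income_integer, 6), round(sse_income_dummy, 6))  # 116.0 116.0
```

With a non-monotonic effect, the integer code cannot reproduce the group means, while the dummy version fits them exactly. For a two-level attribute, the 1/2 code and the 0/1 dummy span the same columns together with the intercept, which is one reason numerical coding is more defensible in the binominal case.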
