Classification with ordinal data
MaltePetersen
New Altair Community Member
I am new to data science and RapidMiner. I made a prediction with Auto Model on a dataset that consists of nominal and ordinal data. Online I read that classification is normally done only with nominal data. So this raises the question: can my classification be accurate? And which method would be the right one for my use case?
Answers
-
Hello @MaltePetersen
Good question. I am not sure whether RapidMiner Auto Model can detect ordinal data automatically (I don't think it can). My preference is to treat ordinal data as nominal data. Some papers suggest converting ordinal to numeric, but numeric data is treated as continuous and equally spaced, which might not be true in the ordinal case. There are pros and cons for both.
@IngoRM might provide more information.
-
@MaltePetersen just to be clear, you are asking about ordinal data (e.g. 1st, 2nd, 3rd, etc.) rather than numerical data (1, 2, 3, etc.)?
-
First of all: there are only a few ML methods out there which can deal with ordinal values out of the box. All of them focus more on ordinal labels / target columns though, not really on ordinal attributes. In my opinion and experience, treating ordinal attributes simply as nominals is the way to go and much better than treating them as numericals, exactly for the reason Varun has mentioned.
If you treat them as nominal, then some ML algos will either handle each category on its own (like decision trees) or transform them, typically with one-hot encoding, which essentially leads to treating each category on its own again. While you are not using the specific relationships between the values this way, all ML algos are powerful enough to assign importance to the values accordingly, and I have yet to see a case where this did not work.
Having ordinal labels (aka target columns) is a different story though! Here, there would be some benefit if the algorithm made use of the relationships between the ordinal values. However, in 20 years now I have only gotten a handful of requests for this, so most people seem to do just fine with treating this as a regular classification problem.
Hope this perspective helps,
Ingo
-
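To make the difference between the two treatments concrete, here is a minimal Python sketch (not RapidMiner code, and not necessarily what Auto Model does internally). The `ratings` list and the helper names `as_numeric` and `one_hot` are hypothetical and only serve as illustration. It shows that the numeric encoding keeps the order but assumes equal spacing, while one-hot encoding gives every category its own column and drops the order.

```python
# Hypothetical ordinal attribute: ratings on a 0-10 scale (made-up values).
ratings = [7, 3, 10, 7, 0]


def as_numeric(values):
    """Treat the ordinal codes as numbers: keeps order, assumes equal spacing."""
    return [float(v) for v in values]


def one_hot(values, categories):
    """Treat the codes as nominal: one 0/1 column per category, order is discarded."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one column is set per example
        rows.append(row)
    return rows


categories = list(range(11))  # the 11 possible ratings, 0..10
numeric = as_numeric(ratings)
nominal = one_hot(ratings, categories)

print(numeric)     # one number per example
print(nominal[0])  # 11 columns for the first example, a single 1 at position 7
```

With the one-hot version a model can assign a separate weight to each rating, which is the "each category on its own" behaviour described above.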
With ordinal data, I would add my vote to those who recommend generally treating these as nominal rather than as numerical data. At the very least, you are not likely to do any damage this way, although you may lose some potentially useful relationships.
One other caution, though, is that you should probably look at the number of distinct categories that you have. If you have very many categories, and the relationship is fairly linear, then that might be an argument for treating the data as numerical. Otherwise, you may need to consider binning or other combinations of values to get the most stability out of the model. Having an attribute with too many nominal values (whether as a predictor or, even worse, as the label) can definitely cause complications, instability, or deterioration in performance.
-
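The binning idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the attribute with 21 levels, the cut points in `edges`, the bin names in `labels`, and the helper `bin_value` are all hypothetical, and sensible cut points depend on the actual data.

```python
def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value."""
    for upper, label in zip(edges, labels):
        if value <= upper:
            return label
    raise ValueError(f"{value} is above the last bin edge {edges[-1]}")


# Hypothetical ordinal attribute with 21 distinct levels (0..20).
levels = list(range(21))
edges = [6, 13, 20]                  # assumed cut points, for illustration only
labels = ["low", "medium", "high"]   # assumed bin names

binned = [bin_value(v, edges, labels) for v in levels]

distinct_before = len(set(levels))
distinct_after = len(set(binned))
print(distinct_before, "->", distinct_after)
```

Fewer, larger categories mean more examples per category, which is where the extra stability would come from.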
First of all, I am really sorry that I took so long to answer. I did not expect any answers at all, let alone such good ones!
So, to further specify my request: the dataset I am analysing is about speed dating. My ordinal data mostly describes how the participants rate their partner, for example a rating of the partner's appearance or humour between 0 and 10. With this data we try to find out which attributes carry the most weight and try to predict new data.
-
Hello @MaltePetersen
My preference is to treat them as nominal, as mentioned earlier, since 10 is not a huge number of categories. Please feel free to ask anything you need; we are happy to help.
-
Hey, one more follow-up question: how can I transform my ordinal data to nominal data? I tried to do it in Turbo Prep, but if I click on Change Type it does not give me any options to change my type.
-
Hello @MaltePetersen
How did RapidMiner read your data? Is it in numerical form or nominal form?
How to check this: in Turbo Prep, you can see the data type under the attribute name.
-
Partially numbers and partially categories, but all numbers should actually be categories/ordinal data to begin with. If they are number attributes I cannot change the type at all, and if they are categories I can only change them to numbers or dates.
-
Hello @MaltePetersen
You can select the attributes with the "number" type, then choose "Transform" and "Change Type" to category. Here the category data type means nominal.
-
@varunm1
I tried that, but Turbo Prep is not giving me that option.
-
Yep, you have the option "Change to category", right? That is the one that converts your number columns to category (which is also called nominal).
Sorry if I got confused. Just to clarify: you are trying to convert the "number" format to the "nominal" format, right?
-
@varunm1 Yeah, now I see it, sorry. Yes, I am trying that. So categories do not have a natural order and are therefore nominal data, right?
-
Yep, that is correct.
-
@varunm1 So right now I am only using nominal attributes. I should only use two models for my paper. Is there a model in Auto Model that is especially fitting for my use case?
Or would the better approach be to look at which model has the lowest classification error and then decide on that model?
-
You can look at model performance after running, but generally, for completely nominal data, I first focus on Logistic Regression and Naive Bayes (it is easy for Naive Bayes to deal with nominal data), plus a Decision Tree for better understanding as well.
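As a rough illustration of choosing by lowest classification error, the sketch below computes the error of two models on the same held-out labels and picks the smaller one. It is plain Python rather than RapidMiner, and everything in it is assumed for the example: the labels in `y_true`, the predictions, and the helper `classification_error` are made up and are not results from the dataset discussed here.

```python
def classification_error(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical held-out labels and predictions (made-up values, illustration only).
y_true = ["yes", "no", "no", "yes", "no", "yes", "no", "no"]
predictions = {
    "Logistic Regression": ["yes", "no", "no", "yes", "no", "no", "no", "no"],
    "Naive Bayes": ["yes", "yes", "no", "yes", "no", "no", "no", "no"],
}

errors = {name: classification_error(y_true, pred)
          for name, pred in predictions.items()}
best = min(errors, key=errors.get)  # model with the lowest error

for name, err in errors.items():
    print(f"{name}: {err:.3f}")
print("lowest error:", best)
```

The comparison is only meaningful if both models are evaluated on the same examples, and on examples that were not used for training.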