[SOLVED] Nominal To Number Operator
maccten
Hi all,
I'm trying to prepare a dataset so that I can run a clustering algorithm on it.
The dataset has many nominal attributes, and I want to convert them to numbers so that the clustering algorithm works correctly.
I have used the Nominal to Numerical operator, but I am having problems: it replaces the nominal values with dummy values rather than with a number per value.
What I would like is something like the table below, where each number represents one nominal value.
I can't get this working at present. Can anyone help me out? It's a bit of a show stopper.
Old Value    Converted Value
CH           1
IE           2
CH           1
DE           3
IL           4
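For readers who want to see the requested mapping spelled out, here is a minimal Python sketch. It is not RapidMiner's implementation: the function name unique_integer_coding is hypothetical, and numbering from 1 in order of first appearance is an assumption taken from the table above (the operator's actual numbering may differ).

```python
def unique_integer_coding(values):
    """Assign each distinct value an integer, numbered from 1
    in order of first appearance (an assumption for this sketch)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    codes = [mapping[value] for value in values]
    return codes, mapping


codes, mapping = unique_integer_coding(["CH", "IE", "CH", "DE", "IL"])
print(codes)    # [1, 2, 1, 3, 4]
print(mapping)  # {'CH': 1, 'IE': 2, 'DE': 3, 'IL': 4}
```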
Answers
You should use the Nominal to Numerical operator with coding_type set to "dummy coding".
Best regards,
Marius
Hi Marius,
Thank you for the quick response.
This is what I was doing originally. I had a Read Database operator linked to a Select Attributes operator, which selected a column that had only 3 possible nominal values.
I then connected the Nominal to Numerical operator.
What I was expecting was:
Value      Converted Value
Value_1    1
Value_2    2
Value_2    2
What I got instead was:
         Value_1    Value_2
row 1    1          0
row 2    0          1
row 3    0          1
This looks more like what I would expect from a Nominal to Binominal operator.
Again, thanks for your time.
It is a very frustrating problem.
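The second table above is what dummy coding produces: one 0/1 column per nominal value. A minimal Python sketch of that idea follows. It is an illustration only, not RapidMiner's code; the function name dummy_coding is hypothetical, and the sketch ignores operator options such as comparison groups.

```python
def dummy_coding(values):
    """Return the sorted distinct categories and, for each input value,
    a row with 1 in the column of its category and 0 elsewhere."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


categories, rows = dummy_coding(["Value_1", "Value_2", "Value_2"])
print(categories)  # ['Value_1', 'Value_2']
for row in rows:
    print(row)     # [1, 0] then [0, 1] then [0, 1]
```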
Hello
I reckon the coding type should be "unique integers", as in the following process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_nominal_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Nominal Data" width="90" x="112" y="75"/>
<operator activated="true" class="nominal_to_numerical" compatibility="5.3.008" expanded="true" height="94" name="Nominal to Numerical" width="90" x="112" y="165">
<parameter key="coding_type" value="unique integers"/>
<list key="comparison_groups"/>
</operator>
<connect from_op="Generate Nominal Data" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
regards
Andrew
To get the result described above, the coding type must indeed be unique integers. However, for clustering that is usually not a good choice, because unique integers impose an ordering on the values. Imagine you have the nominal values red, green and blue. If you assign red=1, green=2 and blue=3, it implies that blue is three times as much as red, and that a "blue" instance is further away from a "red" instance than from a "green" instance. That is usually not desired.
Dummy coding overcomes this and is the method of choice if you want to apply clustering, linear regression or any other algorithm that works only on numerical values.
Best regards,
Marius
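The distance argument in the post above can be checked with a few lines of Python. This is a sketch under stated assumptions: Euclidean distance is assumed as the clustering measure, and the two code tables are written out by hand rather than produced by RapidMiner.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hand-written encodings of the colour example (illustrative only).
integer_codes = {"red": [1], "green": [2], "blue": [3]}
dummy_codes = {"red": [1, 0, 0], "green": [0, 1, 0], "blue": [0, 0, 1]}

# Unique integers: blue is twice as far from red as it is from green.
int_red_blue = euclidean(integer_codes["red"], integer_codes["blue"])
int_green_blue = euclidean(integer_codes["green"], integer_codes["blue"])

# Dummy coding: every pair of distinct colours is equally far apart.
dum_red_blue = euclidean(dummy_codes["red"], dummy_codes["blue"])
dum_green_blue = euclidean(dummy_codes["green"], dummy_codes["blue"])

print(int_red_blue, int_green_blue)
print(dum_red_blue, dum_green_blue)
```

With integer codes the two distances differ, so the arbitrary numbering leaks into the clustering; with dummy codes they are equal.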
Marius is absolutely correct of course.
regards
Andrew
Hi Marius,
What you say makes a lot of sense. It appears I was heading down a road that would certainly have given me poor output from the model.
Thank you very much for your time.