Reverse map a nominal to numerical transform

labbronx
labbronx New Altair Community Member
edited November 2024 in Community Q&A

I am using K-means to cluster my data. To do so, I converted my nominal values to numerical ones with the Nominal to Numerical operator, with the coding type parameter set to "unique integers." How do I reverse this transformation so that the output shows what these values were in the clusters before they were transformed? For example, if "sandwich" gets mapped to 0, I would like to map 0 back to "sandwich".

Best Answer

  • FBT
    FBT New Altair Community Member
    Answer ✓

    It may not be the most elegant solution, but what you could do is the following:

    Multiply your example set prior to the type conversion. Connect the first output of the Multiply operator to your current process, after which you add a Join operator and connect the resulting example set to the left port. Connect the second output of Multiply to the right port of the Join.

    You will need an id on which to make the join, and you may want to do some pre-processing (renaming attributes, etc.).
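
For readers who think more easily in code, the same idea can be sketched in plain Python. This is only an analogy under stated assumptions, not a RapidMiner process: the rows, the attribute names (`food`, `food_original`), the first-seen integer coding, and the stand-in cluster labels are all made up for illustration.

```python
# Sketch of the "keep a copy, then join back on id" idea in plain Python.
# All data and names here are hypothetical.

rows = [
    {"id": 1, "food": "sandwich"},
    {"id": 2, "food": "bread"},
    {"id": 3, "food": "sandwich"},
    {"id": 4, "food": "butter"},
]

# "Multiply": keep an untouched copy of the nominal values, keyed by id.
# The copy gets a new name so it cannot collide with the encoded column.
originals = {row["id"]: {"food_original": row["food"]} for row in rows}

# "Unique integers" coding. Codes are assigned in first-seen order here,
# which is an assumption; the real operator may assign them differently.
codes = {}
encoded = [
    {"id": row["id"], "food": codes.setdefault(row["food"], len(codes))}
    for row in rows
]

# Stand-in for the clustering step: these labels are arbitrary,
# not real k-means output.
clustered = [dict(row, cluster="cluster_" + str(row["food"] % 2)) for row in encoded]

# "Join" on id: each clustered row gets its original nominal value back.
joined = [{**row, **originals[row["id"]]} for row in clustered]

# Alternative when the code table itself is available: invert it.
decode = {code: name for name, code in codes.items()}

for row in joined:
    print(row)
```

Renaming the copied attribute before the join matters in this sketch because both sides would otherwise carry a column called `food`, and one of them would overwrite the other.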

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member
    That's how I usually handled it.
  • labbronx
    labbronx New Altair Community Member

    Thanks, that works. I would never have thought of it.

  • Telcontar120
    Telcontar120 New Altair Community Member

    Be very careful with "unique integers" mapping if your nominal categories are not inherently ordinal.  For example, if you have sandwich, bread, and butter mapped as 1, 2, and 3, then k-means thinks that the distance between 1 and 3 is larger than the distance between 1 and 2 or 2 and 3.  But for non-ordered categories, this doesn't make any sense and can lead to strange and distorted results when clustering.  If your nominal categories are not ordered, you are better off with numerical dummy coding or simply using mixed Euclidean distance (which assumes a distance of 1 between all nominal values that are not the same, precisely to avoid this problem).
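
A small Python sketch can make the distortion concrete. It uses the codes from the example above (sandwich=1, bread=2, butter=3) and compares three ways of measuring the distance between two category values. The function names are hypothetical, and `nominal_distance` only mirrors the "distance of 1 between values that are not the same" rule described above; it is not the tool's actual implementation.

```python
import math

# Codes taken from the example in the post.
codes = {"sandwich": 1, "bread": 2, "butter": 3}
categories = list(codes)

def integer_distance(a, b):
    """Distance between two categories under unique-integer coding."""
    return abs(codes[a] - codes[b])

def dummy_distance(a, b):
    """Euclidean distance between the dummy-coded (one-hot) vectors."""
    va = [1.0 if c == a else 0.0 for c in categories]
    vb = [1.0 if c == b else 0.0 for c in categories]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))

def nominal_distance(a, b):
    """0 for identical values, 1 for any two different values."""
    return 0 if a == b else 1

for a, b in [("sandwich", "bread"), ("bread", "butter"), ("sandwich", "butter")]:
    print(a, b, integer_distance(a, b), round(dummy_distance(a, b), 3), nominal_distance(a, b))
```

With integer coding, sandwich and butter end up twice as far apart as sandwich and bread, purely because of the order in which codes were assigned. The other two measures treat every pair of different categories as equally distant.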

     

  • labbronx
    labbronx New Altair Community Member

    Thanks. I originally used dummy coding, but it blows up the number of attributes per record, as I have lots of unordered nominal values. I will try using mixed Euclidean distance. How does one use this?

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    You could use effect coding too, assuming you don't have too many nominal values per attribute.
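
As a rough illustration of what effect coding produces, here is a minimal Python sketch of the textbook scheme: a nominal attribute with k values becomes k-1 numerical columns, and a chosen reference value is coded as -1 in all of them. The function name and the choice of "butter" as the reference value are assumptions for the example, not settings taken from this thread.

```python
def effect_code(value, categories, reference):
    """Return the k-1 effect-coded columns for one nominal value."""
    others = [c for c in categories if c != reference]
    if value == reference:
        # The reference category is coded -1 in every column.
        return [-1] * len(others)
    # Every other category gets a 1 in its own column and 0 elsewhere.
    return [1 if c == value else 0 for c in others]

categories = ["sandwich", "bread", "butter"]
for value in categories:
    print(value, effect_code(value, categories, reference="butter"))
```

This saves only one column per attribute compared with dummy coding, so it still grows quickly when an attribute has many distinct values.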

  • labbronx
    labbronx New Altair Community Member

    Never mind, I figured out how to use mixed Euclidean distance.

  • laavila
    laavila New Altair Community Member
    I have this problem too. I tried the proposed solution with the Multiply operator, but the final result is just the example set with the unique integer values, which I find hard to interpret. I even generated an id attribute before the Multiply operator and used the Join operator at the end of the process, but I still could not get the nominal values back. Does anyone have an idea what I am doing wrong?
    Thanks!
  • sgenzer
    sgenzer
    Altair Employee
    Hi @laavila, sorry, this is an old thread. Can you please post your process XML so we can see what you're doing? Scott
  • jm_echeverria40
    jm_echeverria40 New Altair Community Member
    Hello all,

    Is there a currently accepted solution in the latest version of the program?
    How can we do this in 2020?
    Does the methodology mentioned above still work?

    If possible, please provide the diagram!