How to reconvert from numerical to nominal

jctorresp
jctorresp New Altair Community Member
edited November 5 in Community Q&A
Hi,

I am writing my thesis on data mining, so I had to convert some data from nominal to numerical. After that, I exported the data to CSV and processed it in Python. But now the data comes back in a new order and I need to convert it back to nominal values. I was looking for a way to save a map, or something similar, of the original conversion, for example:
column gender:
male->1
female->2
other->3

If I had that mapping, I could convert from numerical back to nominal, but I couldn't find a way to do that.

Note that I had to convert several columns, so I need something like a separate map for each column.

Thanks for your help
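What the question describes is essentially one dictionary per column plus its inverse. A minimal Python sketch, assuming you keep the mappings yourself (the `color` column and its codes are hypothetical, added only to show more than one column):

```python
# Hypothetical per-column mappings (label -> code), one dict per converted column.
mappings = {
    "gender": {"male": 1, "female": 2, "other": 3},
    "color": {"red": 1, "green": 2, "blue": 3},
}

# Invert each column's mapping (code -> label). This only works if the codes
# within a column are unique, i.e. the mapping is one-to-one.
inverse = {col: {code: label for label, code in m.items()} for col, m in mappings.items()}

# A row that came back from the numeric processing step.
row = {"gender": 2, "color": 3}
restored = {col: inverse[col][code] for col, code in row.items()}
print(restored)  # {'gender': 'female', 'color': 'blue'}
```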

Best Answer

  • kayman
    kayman New Altair Community Member
    edited April 2019 Answer ✓
    If you have a limited number of nominal values, you could use the Replace (Dictionary) operator. This way you control the numeric values yourself. To revert, you can use the same logic the other way around (swap the from and to attributes).

    As in the example below:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.2.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="179" y="136">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="from,to&#10;male,1&#10;female,2&#10;other,3"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="replace_dictionary" compatibility="9.2.001" expanded="true" height="103" name="Replace (Dictionary)" width="90" x="380" y="34">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="myField"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="from_attribute" value="from"/>
            <parameter key="to_attribute" value="to"/>
            <parameter key="use_regular_expressions" value="false"/>
            <parameter key="convert_to_lowercase" value="false"/>
            <parameter key="first_match_only" value="false"/>
          </operator>
          <connect from_port="input 1" to_op="Replace (Dictionary)" to_port="example set input"/>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Replace (Dictionary)" to_port="dictionary"/>
          <connect from_op="Replace (Dictionary)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
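The same from/to idea can be sketched outside RapidMiner in Python. This is only an illustration of the logic, not the operator's implementation: it reads the same `from,to` table the process builds with Create ExampleSet, replaces exact whole values only (I have not verified whether the operator matches whole values or substrings, so with entries like `male`/`female` it is worth checking its output), and reverts by swapping the key and value columns. `load_dictionary` and `replace_values` are hypothetical helper names.

```python
import csv
import io

# The same from/to table the process above creates with Create ExampleSet.
DICTIONARY_CSV = "from,to\nmale,1\nfemale,2\nother,3"

def load_dictionary(text, key="from", value="to"):
    """Read a two-column from/to table into a dict (key column -> value column)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row[value] for row in reader}

def replace_values(values, dictionary):
    """Replace exact whole values; values not in the dictionary are left unchanged."""
    return [dictionary.get(v, v) for v in values]

forward = load_dictionary(DICTIONARY_CSV)                            # from -> to
backward = load_dictionary(DICTIONARY_CSV, key="to", value="from")   # swapped

# Reverting only works if no two "from" values share the same "to" value.
assert len(set(forward.values())) == len(forward), "mapping is not one-to-one"

column = ["female", "male", "other", "male"]
coded = replace_values(column, forward)
print(coded)                            # ['2', '1', '3', '1']
print(replace_values(coded, backward))  # ['female', 'male', 'other', 'male']
```

The codes stay strings here, as they would in a nominal attribute; convert them to numbers as a separate step if needed.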
    


Answers

  • varunm1
    varunm1 New Altair Community Member
    Hi @jctorresp

    Have you looked at the Map operator in RapidMiner? It can be applied to both numerical and nominal values.


  • jctorresp
    jctorresp New Altair Community Member
    The problem is that I have 10 columns, and each column can have different values; some columns have around 7 possible values. I also need to apply the same process to another dataset, so setting up a dictionary manually for each column is very tedious. In the end, I think I will export the result of the Nominal to Numerical operator and do this process in Python.
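For many columns and a second dataset, the mappings can be built automatically, saved once, and reused. A minimal Python sketch under stated assumptions: codes are assigned 0, 1, 2, ... in sorted value order (my choice; this is not necessarily how RapidMiner's Nominal to Numerical numbers values), the mappings are stored as JSON, and a value never seen when fitting gets the hypothetical code `-1` and decodes to `None`. The data and file name are hypothetical.

```python
import json
import os
import tempfile

def fit_mappings(rows, columns):
    """One value->code dict per column; codes 0, 1, 2, ... follow sorted value order."""
    return {
        col: {value: code for code, value in enumerate(sorted({row[col] for row in rows}))}
        for col in columns
    }

def encode(rows, mappings, unknown=-1):
    """Encode mapped columns; values not seen when fitting become `unknown`."""
    return [
        {col: mappings[col].get(val, unknown) if col in mappings else val
         for col, val in row.items()}
        for row in rows
    ]

def decode(rows, mappings):
    """Turn codes back into labels; the `unknown` code decodes to None."""
    inverse = {col: {code: value for value, code in m.items()} for col, m in mappings.items()}
    return [
        {col: inverse[col].get(val) if col in inverse else val
         for col, val in row.items()}
        for row in rows
    ]

# Hypothetical data: fit on the first dataset, reuse the mappings on a second one.
first = [{"gender": "male", "color": "red"}, {"gender": "female", "color": "blue"}]
second = [{"gender": "female", "color": "red"}, {"gender": "other", "color": "blue"}]

path = os.path.join(tempfile.mkdtemp(), "mappings.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(fit_mappings(first, ["gender", "color"]), f, indent=2)

with open(path, encoding="utf-8") as f:
    mappings = json.load(f)

encoded = encode(second, mappings)
print(encoded)                    # [{'gender': 0, 'color': 1}, {'gender': -1, 'color': 0}]
print(decode(encoded, mappings))  # [{'gender': 'female', 'color': 'red'}, {'gender': None, 'color': 'blue'}]
```

Fitting on the union of all datasets instead avoids unknown values altogether, at the cost of refitting when new data arrives.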
  • Telcontar120
    Telcontar120 New Altair Community Member
    Not to throw a monkey wrench in here, but why did you need to convert nominal data to integer coding in the first place? Converting it the way you have described is usually not recommended for truly nominal data (like gender), as opposed to ordinal data, because with coefficient-based algorithms it implies numerical relationships between the categories that don't actually exist. So you should probably be using dummy coding or effect coding instead of integer coding.
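To make the two suggested alternatives concrete, here is a small Python sketch of dummy coding and effect coding for one nominal column. It assumes `male` is chosen as the reference category; which category serves as the reference is a modeling choice, not something fixed by the thread.

```python
def dummy_code(values, categories):
    """Dummy (treatment) coding: k-1 indicator columns; categories[0] is the reference (all zeros)."""
    levels = categories[1:]
    return [[1 if v == level else 0 for level in levels] for v in values]

def effect_code(values, categories):
    """Effect (deviation) coding: same as dummy coding, except the reference row is all -1."""
    levels = categories[1:]
    return [
        [-1] * len(levels) if v == categories[0] else [1 if v == level else 0 for level in levels]
        for v in values
    ]

categories = ["male", "female", "other"]  # "male" is the assumed reference category
values = ["female", "male", "other"]
print(dummy_code(values, categories))   # [[1, 0], [0, 0], [0, 1]]
print(effect_code(values, categories))  # [[1, 0], [-1, -1], [0, 1]]
```

Neither scheme puts the categories on a single number line, so no artificial ordering enters the model's coefficients.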
  • jctorresp
    jctorresp New Altair Community Member
    I am working on clustering. I need to separate the data into different clusters, but most columns are categorical, so I had to use k-modes, which is a variation of the k-means algorithm; its first step is to convert the data to numerical values to improve the process.
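For reference, the k-modes algorithm itself compares category labels directly: the dissimilarity between two records is the number of attributes on which they differ, and each cluster center is the per-column mode. A minimal Python sketch of that loop, assuming records are tuples of category labels and using a deliberately simple initialization (the first `k` distinct records; real implementations use better schemes). The data is hypothetical.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(records):
    """Most frequent category per column (ties go to the first value encountered)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(records, k, max_iter=100):
    # Simplified initialization (assumption): the first k distinct records.
    # Assumes the data contains at least k distinct records.
    modes = []
    for r in records:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    labels = None
    for _ in range(max_iter):
        # Assign each record to the nearest mode.
        new_labels = [min(range(k), key=lambda j: matching_dissimilarity(r, modes[j])) for r in records]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each mode from its members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes

records = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "oval"),
    ("blue", "large", "round"),
    ("red", "medium", "round"),
]
labels, modes = k_modes(records, k=2)
print(labels)  # [0, 1, 0, 1, 0]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

Any integer codes an implementation uses internally only serve as labels here; the distance never treats them as magnitudes.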
  • Telcontar120
    Telcontar120 New Altair Community Member
    If you convert to integer coding, you are not representing the nominal categories consistently. For example, if you have 4 nominal categories whose underlying data is not ordinal in any way (like the colors red, green, yellow, and blue), recode them as {1,2,3,4}, and then use that numerical value in any distance calculation, you are basically saying that the 1st and 4th values are much farther apart than the 2nd and 3rd values, when that isn't the case.
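The color example can be checked numerically. This Python sketch compares pairwise distances under the {1,2,3,4} integer coding with Euclidean distances between one-hot vectors, where every pair of distinct colors ends up equally far apart (√2).

```python
import math
from itertools import combinations

colors = ["red", "green", "yellow", "blue"]
integer_code = {c: i + 1 for i, c in enumerate(colors)}                 # red=1 ... blue=4
one_hot = {c: [1 if c == other else 0 for other in colors] for c in colors}

pairs = list(combinations(colors, 2))
int_dist = {(a, b): abs(integer_code[a] - integer_code[b]) for a, b in pairs}
hot_dist = {(a, b): math.dist(one_hot[a], one_hot[b]) for a, b in pairs}

print(int_dist[("red", "blue")], int_dist[("green", "yellow")])  # 3 1
print({round(d, 3) for d in hot_dist.values()})                  # {1.414}
```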
    In RapidMiner, both the k-medoids operator (I assume that is what you are referring to; RapidMiner has no k-modes operator) and the k-means operator handle nominal data just fine. Just set the measure types parameter to Mixed Measures, and also make sure you normalize your other numerical data (which you should do anyway whenever you are doing distance calculations).
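As a rough illustration of why normalization matters with a mixed measure, here is a Python sketch of a mixed Euclidean distance: squared differences for numeric attributes, a 0/1 mismatch term for nominal ones, then the square root, after min-max scaling the numeric columns. This reflects my understanding of how a mixed Euclidean measure is usually defined; it is an assumption, not RapidMiner's actual implementation. The data is hypothetical.

```python
import math

def min_max_normalize(rows, numeric_cols):
    """Rescale each numeric column to [0, 1]; a constant column becomes 0.0."""
    out = [dict(r) for r in rows]
    for col in numeric_cols:
        lo = min(r[col] for r in rows)
        hi = max(r[col] for r in rows)
        span = hi - lo
        for r in out:
            r[col] = (r[col] - lo) / span if span else 0.0
    return out

def mixed_euclidean(a, b, numeric_cols, nominal_cols):
    """Squared differences for numeric attributes plus 1 per nominal mismatch, then sqrt."""
    total = sum((a[c] - b[c]) ** 2 for c in numeric_cols)
    total += sum(0 if a[c] == b[c] else 1 for c in nominal_cols)
    return math.sqrt(total)

rows = [
    {"age": 20, "income": 30000, "color": "red"},
    {"age": 60, "income": 30000, "color": "red"},
    {"age": 20, "income": 90000, "color": "blue"},
]
numeric, nominal = ["age", "income"], ["color"]
norm = min_max_normalize(rows, numeric)
print(round(mixed_euclidean(norm[0], norm[1], numeric, nominal), 3))  # 1.0
print(round(mixed_euclidean(norm[0], norm[2], numeric, nominal), 3))  # 1.414
```

Without the scaling step, the raw income difference of 60000 would swamp both the age difference and the 0/1 color term.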