How to reconvert from numerical to nominal

User: "jctorresp"
New Altair Community Member
Updated by Jocelyn
Hi,

I am writing my thesis on data mining, so I had to convert some data from nominal to numerical. After that I exported the data to CSV and processed it in Python. Now the data is in a new order and I need to convert the values back to nominal. I have been looking for a way to save a map (or something similar) of the original conversion, for example:
column genre:
male->1
female->2
other->3

If I had that map I could convert back from numerical to nominal, but I couldn't find a way to do that.

I should mention that I had to convert several columns, so I need something like a map for each column.

Thanks for your help

    User: "varunm1"
    New Altair Community Member
    Hi @jctorresp

    Did you look at the map operator in RM? This can be applied to both numerical and nominal values.


    User: "kayman"
    New Altair Community Member
    Accepted Answer
    Updated by kayman
    If you have a limited number of nominal values, you could use the Replace (Dictionary) operator. This way you control the numeric values yourself. To revert, use the same logic the other way around (switch from and to).

    As in the example below:

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.2.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="179" y="136">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="from,to&#10;male,1&#10;female,2&#10;other,3"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="replace_dictionary" compatibility="9.2.001" expanded="true" height="103" name="Replace (Dictionary)" width="90" x="380" y="34">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="myField"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
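            <!-- Dictionary columns: values found in "from" are replaced by the values in "to". Swap the two to revert the mapping. -->
            <parameter key="from_attribute" value="from"/>
            <parameter key="to_attribute" value="to"/>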
            <parameter key="use_regular_expressions" value="false"/>
            <parameter key="convert_to_lowercase" value="false"/>
            <parameter key="first_match_only" value="false"/>
          </operator>
          <connect from_port="input 1" to_op="Replace (Dictionary)" to_port="example set input"/>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Replace (Dictionary)" to_port="dictionary"/>
          <connect from_op="Replace (Dictionary)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    


    User: "jctorresp"
    New Altair Community Member
    OP
    The problem is that I have 10 columns, and each column can have different values. Some columns have around 7 possible values. I also need to run the same process on another data set, so setting up a dictionary manually for each column is too much work. In the end I think I will export the result of the Nominal to Numerical operator and do this step in Python.
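For the Python route, one possible approach is to build the per-column maps in Python, save them next to the exported CSV, and invert them later. This is only a sketch under assumptions: the helper names (`build_mappings`, `encode`, `decode`) and the sample rows are hypothetical, and the integer codes are assigned in order of first appearance, which is not necessarily the coding RapidMiner's Nominal to Numerical operator produces.

```python
import json

def build_mappings(rows, columns):
    """Give each distinct value of each column a code 1..k, in order of first appearance."""
    mappings = {col: {} for col in columns}
    for row in rows:
        for col in columns:
            codes = mappings[col]
            if row[col] not in codes:
                codes[row[col]] = len(codes) + 1
    return mappings

def encode(rows, mappings):
    """Replace nominal values by their codes; columns without a map are left untouched."""
    return [
        {col: mappings[col][val] if col in mappings else val for col, val in row.items()}
        for row in rows
    ]

def decode(rows, mappings):
    """Invert every per-column map (code -> value) and apply it."""
    inverse = {
        col: {code: val for val, code in codes.items()}
        for col, codes in mappings.items()
    }
    return [
        {col: inverse[col][val] if col in inverse else val for col, val in row.items()}
        for row in rows
    ]

# Made-up sample data with two nominal columns.
rows = [
    {"genre": "male", "city": "Lima"},
    {"genre": "female", "city": "Quito"},
    {"genre": "other", "city": "Lima"},
]

mappings = build_mappings(rows, ["genre", "city"])
encoded = encode(rows, mappings)

# Persist the maps as JSON so they survive between runs.
saved = json.dumps(mappings)
restored = json.loads(saved)

# Row order may change during processing; decoding does not depend on it.
shuffled = list(reversed(encoded))
decoded = decode(shuffled, restored)
```

Because every column keeps its own map, the same `mappings` object can be reused to encode a second data set with identical codes, as long as it contains no unseen values.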
    User: "Telcontar120"
    New Altair Community Member
    Not to throw a monkey wrench in here, but why did you need to convert nominal data to integer coding in the first place? Integer coding as you have described it is usually not recommended for truly nominal data (like gender), as opposed to ordinal data, because it implies numerical relationships that don't actually exist in the underlying categories if you are using coefficient-based algorithms. You should probably be using dummy coding or effect coding instead.
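To make the dummy coding idea concrete, here is a minimal Python sketch. The function name `dummy_code` and its `drop_reference` flag are hypothetical, chosen only for illustration: each category becomes its own 0/1 indicator column, and one reference category can optionally be dropped, as is common for coefficient-based models.

```python
def dummy_code(values, drop_reference=False):
    """Return (categories, rows) with one 0/1 indicator column per category.

    With drop_reference=True the first category (alphabetically) is used as
    the reference level and gets no column of its own.
    """
    categories = sorted(set(values))
    if drop_reference:
        categories = categories[1:]
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return categories, rows

genders = ["male", "female", "other", "female"]

full_categories, full_rows = dummy_code(genders)
reduced_categories, reduced_rows = dummy_code(genders, drop_reference=True)
```

No category ends up "larger" than another in this representation; each one only switches its own column on or off.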
    User: "jctorresp"
    New Altair Community Member
    OP
    I am working with clustering. I need to separate the data into different clusters, but most columns of the data are categorical, so I had to use k-modes, which is a variation of the k-means algorithm. The first step there is to convert the data to numerical to improve the process.
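For context, k-modes as it is usually described compares records with a simple matching dissimilarity, that is, the number of attributes whose categories differ. Under that measure integer codes act only as labels. The sketch below uses a hypothetical helper name and made-up records, and it is not taken from any particular k-modes library.

```python
def matching_dissimilarity(a, b):
    """Count the attributes on which two records have different categories."""
    return sum(x != y for x, y in zip(a, b))

# The same two records, once with nominal labels and once with integer codes.
labels_a = ("male", "red", "student")
labels_b = ("female", "red", "teacher")
codes_a = (1, 1, 1)
codes_b = (2, 1, 2)

d_labels = matching_dissimilarity(labels_a, labels_b)
d_codes = matching_dissimilarity(codes_a, codes_b)
```

Both calls count two mismatching attributes, so with this measure the choice of codes does not change the result. That is no longer true once the codes are fed into a numerical distance such as Euclidean.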
    User: "Telcontar120"
    New Altair Community Member
    If you convert to integer coding, you are not representing the data in a way that is consistent with nominal categories. For example, if you have 4 nominal categories where the underlying data is not ordinal in any way (like the colors red, green, yellow, and blue), recode them as {1,2,3,4}, and then use that numerical value in any distance calculation, you are basically saying that the 1st and 4th values are much farther apart than the 2nd and 3rd values, when that isn't the case.
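The color example can be checked numerically. The following Python sketch (variable names are illustrative only) computes Euclidean distances between the four colors under integer coding and under 0/1 indicator coding.

```python
from itertools import combinations
import math

colors = ["red", "green", "yellow", "blue"]

# Integer coding: red=1, green=2, yellow=3, blue=4.
integer_code = {color: [i + 1] for i, color in enumerate(colors)}

# Indicator coding: one 0/1 column per color.
one_hot = {
    color: [1 if j == i else 0 for j in range(len(colors))]
    for i, color in enumerate(colors)
}

def pairwise(coding):
    """Euclidean distance for every pair of colors under the given coding."""
    return {(a, b): math.dist(coding[a], coding[b]) for a, b in combinations(colors, 2)}

int_distances = pairwise(integer_code)
hot_distances = pairwise(one_hot)
```

Under integer coding, red and blue are 3.0 apart while green and yellow are only 1.0 apart. Under indicator coding every pair has the same distance, which matches the idea that the categories have no order.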
    In RapidMiner, the k-medoids (I assume that is what you are referring to; there is no k-modes operator) and k-means operators both handle nominal data just fine. Just set the measure types parameter to Mixed Measures and also make sure you normalize your other numerical data (which you should do anyway whenever you are doing distance calculations).
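As a rough illustration of why mixed data needs normalization, here is a Python sketch of one possible mixed distance. It is an assumption for illustration and not RapidMiner's implementation: numerical attributes contribute their squared difference, nominal attributes contribute 0 when equal and 1 when different, and the function names and sample values are made up.

```python
import math

def min_max_normalize(column):
    """Rescale a numerical column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def mixed_distance(a, b):
    """Euclidean-style distance over records with numerical and nominal fields."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += (x - y) ** 2             # numerical: squared difference
        else:
            total += 0.0 if x == y else 1.0   # nominal: mismatch counts as 1
    return math.sqrt(total)

ages = min_max_normalize([20, 40, 60])
records = [
    (ages[0], "red"),
    (ages[1], "red"),
    (ages[2], "blue"),
]

d01 = mixed_distance(records[0], records[1])
d02 = mixed_distance(records[0], records[2])
```

Without the normalization step, an age difference of 40 would contribute 1600 to the sum and swamp the nominal mismatch, which can contribute at most 1.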