Converting nominal to numeric

Legacy User
Legacy User New Altair Community Member
edited November 5 in Community Q&A
I am very new to RapidMiner, and I need to convert a lot of nominal attributes to numeric. I tried the Nominal2Numerical operator in RapidMiner 4.4, hoping it would do dichotomization, but it does not. Instead it seems to map the values to equidistant real numbers, which does not make much sense, at least in my case.

Eventually, I figured out how to do dichotomization "manually" by using ConditionedFeatureGeneration, but it gets tedious since I have to create a separate operator for each value of each attribute, and there are many.

For nominals with many possible values, I figured out how to convert them to target frequencies using ValueIterator and macros, but again it seems like I might be missing a much simpler solution.

I will appreciate any advice!

Thanks,
~Alexei
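
For readers unfamiliar with the target-frequency conversion mentioned in the question, here is a minimal Python sketch of one common reading of it. It is not RapidMiner code and not necessarily what the ValueIterator process computed: it assumes "target frequency" means the fraction of examples with a given nominal value whose label equals a chosen positive class. The function name `target_frequency_encode` and the sample data are hypothetical.

```python
from collections import defaultdict


def target_frequency_encode(rows, attribute, label, positive):
    """Replace a nominal attribute by the share of positive labels per value.

    Assumption: "target frequency" = P(label == positive | attribute value),
    estimated from the given rows.
    """
    totals = defaultdict(int)  # examples seen per nominal value
    hits = defaultdict(int)    # positive examples per nominal value
    for row in rows:
        totals[row[attribute]] += 1
        if row[label] == positive:
            hits[row[attribute]] += 1
    freq = {value: hits[value] / totals[value] for value in totals}
    # Build new rows so the input data is left untouched.
    encoded = [{**row, attribute: freq[row[attribute]]} for row in rows]
    return encoded, freq


rows = [
    {"city": "A", "label": "yes"},
    {"city": "A", "label": "no"},
    {"city": "B", "label": "yes"},
]
encoded, freq = target_frequency_encode(rows, "city", "label", "yes")
```

Note that frequencies estimated on the same data used for training can leak label information, so in practice they are usually computed on a separate split.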

Answers

  • haddock
    haddock New Altair Community Member
    Greetings Alexei,

    I think you can avoid all that horrible re-typing by using variables, or as they are termed in RM "macros". Here's what I mean...

    Good luck, and good weekend!
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple polynomial classification"/>
        </operator>
        <operator name="BinDiscretization" class="BinDiscretization">
            <parameter key="number_of_bins" value="3"/>
            <parameter key="range_name_type" value="short"/>
        </operator>
        <operator name="IteratingOperatorChain" class="IteratingOperatorChain" expanded="yes">
            <parameter key="iterations" value="5"/>
            <operator name="ConditionedFeatureGeneration" class="ConditionedFeatureGeneration">
                <parameter key="attribute_name" value="att%{a}_mashed"/>
                <list key="values">
                  <parameter key="1" value="att%{a}=range1"/>
                  <parameter key="2" value="att%{a}=range2"/>
                  <parameter key="3" value="att%{a}=range3"/>
                </list>
            </operator>
        </operator>
    </operator>
  • abetin71
    abetin71 New Altair Community Member
    Thanks! Your reply started me on the iterators path, and since I'm actually trying to dichotomize my nominals, I came up with what seemed a very nice construct below.

    The problem is that it adds the correct attribute at each iteration, but does not keep it around for the next iteration. Whether "work_on_input" is true or false, at the very end of the process I end up with the original unmodified ExampleSource.

    I guess I could rename my attributes so I can apply your suggestion directly and get rid of FeatureIterator, but I am hoping that again, I may be missing something simple...

    Thanks,
    ~Alexei

        <operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
            <parameter key="filter" value=""/>
            <parameter key="type_filter" value="nominal"/>
            <parameter key="work_on_input" value="false"/>
            <operator name="ValueIteratorNominalDichotomization" class="ValueIterator" expanded="yes">
                <parameter key="attribute" value="%{loop_feature}"/>
                <operator name="ConditionedFeatureGeneration" class="ConditionedFeatureGeneration">
                    <parameter key="attribute_name" value="%{loop_feature}_%{loop_value}"/>
                    <parameter key="value_type" value="integer"/>
                    <list key="values">
                      <parameter key="1" value="%{loop_feature}=%{loop_value}"/>
                    </list>
                    <parameter key="default_value" value="0"/>
                </operator>
            </operator>
        </operator>
  • haddock
    haddock New Altair Community Member
    Hi again,

    Successive "joins" can be a problem unless you use the IteratingOperatorChain, which keeps the output. There is an example of this at work at http://rapid-i.com/rapidforum/index.php/topic,773.0.html.
  • TobiasMalbrecht
    TobiasMalbrecht New Altair Community Member
    Hi Alexei,

    RapidMiner certainly lets you find complicated solutions. Fortunately, there are often simple ones too. :) Just use the Nominal2Binominal operator followed by a Nominal2Numerical operator and you're done with the dichotomization.

    Kind regards,
    Tobias
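
To make the result of dichotomization concrete, here is a small Python sketch of the transformation itself. It does not reproduce RapidMiner's operators or their exact attribute naming: the function `dichotomize`, the `attribute_value` column names (borrowed from the naming pattern in Alexei's process above) and the sample data are illustrative assumptions.

```python
def dichotomize(rows, attribute):
    """Replace one nominal attribute with 0/1 indicator columns.

    One indicator column is created per distinct value observed in `rows`.
    """
    values = sorted({row[attribute] for row in rows})
    result = []
    for row in rows:
        # Copy every column except the nominal one being replaced.
        new_row = {k: v for k, v in row.items() if k != attribute}
        for value in values:
            new_row[f"{attribute}_{value}"] = 1 if row[attribute] == value else 0
        result.append(new_row)
    return result


rows = [
    {"color": "red", "size": 1.0},
    {"color": "green", "size": 2.5},
    {"color": "red", "size": 0.5},
]
encoded = dichotomize(rows, "color")
```

Each nominal value becomes its own numeric column, so no artificial ordering or distance between values is introduced, unlike a mapping to equidistant numbers.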
  • cantab
    cantab New Altair Community Member
    Nominal2Binominal is good, but very slow when some nominal features have too many different values. How can I filter out (eliminate) features that have more than some number (e.g. 20) of different nominal values?
  • IngoRM
    IngoRM New Altair Community Member
    Phew, that's possible, but it takes quite a complex process:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="OperatorChain" class="OperatorChain" expanded="no">
            <operator name="NominalExampleSetGenerator" class="NominalExampleSetGenerator">
                <parameter key="number_of_attributes" value="10"/>
                <parameter key="number_of_values" value="20"/>
            </operator>
            <operator name="AbsoluteSampling" class="AbsoluteSampling">
                <parameter key="sample_size" value="20"/>
            </operator>
            <operator name="GuessValueTypes" class="GuessValueTypes">
            </operator>
        </operator>
        <operator name="FeatureIterator" class="FeatureIterator" expanded="no">
            <operator name="SingleMacroDefinition" class="SingleMacroDefinition">
                <parameter key="macro" value="value_counter"/>
                <parameter key="value" value="0"/>
            </operator>
            <operator name="ValueIterator" class="ValueIterator" expanded="yes">
                <parameter key="attribute" value="%{loop_feature}"/>
                <operator name="MacroConstruction" class="MacroConstruction">
                    <list key="function_descriptions">
                      <parameter key="value_counter" value="%{value_counter} + 1"/>
                    </list>
                </operator>
            </operator>
            <operator name="Macro2Log" class="Macro2Log">
                <parameter key="macro_name" value="value_counter"/>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <list key="log">
                  <parameter key="Feature" value="operator.FeatureIterator.value.feature_name"/>
                  <parameter key="Count" value="operator.Macro2Log.value.macro_value"/>
                </list>
            </operator>
        </operator>
        <operator name="ProcessLog2ExampleSet" class="ProcessLog2ExampleSet">
            <parameter key="log_name" value="ProcessLog"/>
        </operator>
        <operator name="ClearProcessLog" class="ClearProcessLog">
            <parameter key="log_name" value="ProcessLog"/>
            <parameter key="delete_table" value="true"/>
        </operator>
        <operator name="GuessValueTypes (2)" class="GuessValueTypes">
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="Count &gt; 12"/>
        </operator>
        <operator name="IOStorer" class="IOStorer">
            <parameter key="name" value="original_data"/>
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="store_which" value="2"/>
        </operator>
        <operator name="ValueIterator (2)" class="ValueIterator" expanded="yes">
            <parameter key="attribute" value="Feature"/>
            <operator name="IORetriever" class="IORetriever">
                <parameter key="name" value="original_data"/>
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
            <operator name="FeatureNameFilter" class="FeatureNameFilter">
                <parameter key="skip_features_with_name" value="%{loop_value}"/>
            </operator>
            <operator name="IOStorer (2)" class="IOStorer">
                <parameter key="name" value="original_data"/>
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
        </operator>
        <operator name="IOConsumer" class="IOConsumer">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="IORetriever (2)" class="IORetriever">
            <parameter key="name" value="original_data"/>
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
    </operator>
    Cheers,
    Ingo
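
The logic of the process above (count the distinct values of each nominal feature, then drop the features whose count exceeds a threshold) can be sketched outside RapidMiner in a few lines of Python. This is an illustration of the idea only, not a translation of the operators: the function `drop_high_cardinality`, its `max_values` parameter and the sample data are hypothetical.

```python
def drop_high_cardinality(rows, nominal_attributes, max_values=20):
    """Drop nominal attributes that have more than `max_values` distinct values.

    Returns the filtered rows and the distinct-value count per attribute.
    """
    counts = {a: len({row[a] for row in rows}) for a in nominal_attributes}
    to_drop = {a for a, n in counts.items() if n > max_values}
    filtered = [
        {k: v for k, v in row.items() if k not in to_drop} for row in rows
    ]
    return filtered, counts


# "id" has 5 distinct values, "group" has 2; the threshold of 3 removes "id".
rows = [{"id": f"u{i}", "group": "a" if i % 2 else "b", "x": i} for i in range(5)]
filtered, counts = drop_high_cardinality(rows, ["id", "group"], max_values=3)
```

Running this filter before dichotomization keeps the number of generated indicator columns bounded, which addresses the slowdown cantab describes.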