Converting nominal to numeric

Legacy User:
I am very new to RapidMiner, and I need to convert a lot of nominal attributes to numeric. I tried the Nominal2Numerical operator in RapidMiner 4.4, hoping it would do dichotomization, but it does not. Instead, it seems to map the values to equidistant real numbers, which does not make much sense, at least in my case.

Eventually, I figured out how to do dichotomization "manually" using ConditionedFeatureGeneration, but it gets tedious, since I have to create a separate operator for each value of each attribute, and there are many.
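To make the difference concrete, here is a minimal sketch in plain Python (not RapidMiner) of the two conversions being compared: integer coding, where each value becomes a number, and dichotomization, where each value becomes its own 0/1 attribute. The function names, the sample attribute "colour", and the "<attribute>_<value>" naming are assumptions made up for this illustration.

```python
def integer_code(values):
    """Map each distinct nominal value to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values]


def dichotomize(name, values):
    """Build one 0/1 column per distinct value, named '<name>_<value>' (hypothetical naming)."""
    distinct = sorted(set(values))
    return {f"{name}_{d}": [1 if v == d else 0 for v in values] for d in distinct}


colour = ["red", "green", "blue", "green"]

# Integer coding imposes an order and a distance between values.
print(integer_code(colour))

# Dichotomization yields colour_blue, colour_green and colour_red columns instead.
for column, bits in dichotomize("colour", colour).items():
    print(column, bits)
```

The integer-coded column suggests that "blue" is twice as far from "red" as "green" is, which is usually meaningless for nominal data. The dichotomized columns avoid that at the cost of one new attribute per value.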

For nominals with many possible values, I figured out how to convert them to target frequencies using ValueIterator and macros, but again it seems like I might be missing a much simpler solution.
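As a rough illustration of what is meant by target frequencies, here is a small plain-Python sketch (again outside RapidMiner). It assumes "target frequency" means the fraction of examples sharing a nominal value whose label equals a chosen class; the function name, the `positive` parameter, and the sample data are hypothetical.

```python
from collections import defaultdict


def target_frequency(values, labels, positive):
    """Replace each nominal value by the share of its examples labelled `positive`."""
    totals = defaultdict(int)  # examples seen per nominal value
    hits = defaultdict(int)    # of those, how many carry the positive label
    for v, y in zip(values, labels):
        totals[v] += 1
        if y == positive:
            hits[v] += 1
    freq = {v: hits[v] / totals[v] for v in totals}
    return [freq[v] for v in values]


city = ["A", "B", "A", "C", "B", "A"]
label = ["yes", "no", "no", "yes", "no", "yes"]

# One numeric column results, however many distinct values the nominal has.
print(target_frequency(city, label, "yes"))
```

This keeps a single numeric attribute per nominal, which is why it suits nominals with many values. The frequencies should be computed on training data only, since they are derived from the label.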

I would appreciate any advice!

Thanks,
~Alexei

    Greetings Alexei,

    I think you can avoid all that horrible re-typing by using variables, or as they are termed in RM "macros". Here's what I mean...

    Good luck, and good weekend!
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple polynomial classification"/>
        </operator>
        <operator name="BinDiscretization" class="BinDiscretization">
            <parameter key="number_of_bins" value="3"/>
            <parameter key="range_name_type" value="short"/>
        </operator>
        <operator name="IteratingOperatorChain" class="IteratingOperatorChain" expanded="yes">
            <parameter key="iterations" value="5"/>
            <operator name="ConditionedFeatureGeneration" class="ConditionedFeatureGeneration">
                <parameter key="attribute_name" value="att%{a}_mashed"/>
                <list key="values">
                  <parameter key="1" value="att%{a}=range1"/>
                  <parameter key="2" value="att%{a}=range2"/>
                  <parameter key="3" value="att%{a}=range3"/>
                </list>
            </operator>
        </operator>
    </operator>
    Thanks! Your reply started me on the iterators path, and since I'm actually trying to dichotomize my nominals, I came up with what seemed a very nice construct below.

    The problem is that it adds the correct attribute at each iteration but does not keep it around for the next iteration. Whether "work_on_input" is true or false, at the very end of the process I end up with the original, unmodified ExampleSource.

    I guess I could rename my attributes so I can apply your suggestion directly and get rid of FeatureIterator, but I am hoping that again, I may be missing something simple...

    Thanks,
    ~Alexei

        <operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
            <parameter key="filter" value=""/>
            <parameter key="type_filter" value="nominal"/>
            <parameter key="work_on_input" value="false"/>
            <operator name="ValueIteratorNominalDichotomization" class="ValueIterator" expanded="yes">
                <parameter key="attribute" value="%{loop_feature}"/>
                <operator name="ConditionedFeatureGeneration" class="ConditionedFeatureGeneration">
                    <parameter key="attribute_name" value="%{loop_feature}_%{loop_value}"/>
                    <parameter key="value_type" value="integer"/>
                    <list key="values">
                      <parameter key="1" value="%{loop_feature}=%{loop_value}"/>
                    </list>
                    <parameter key="default_value" value="0"/>
                </operator>
            </operator>
        </operator>
    Hi again,

    Successive "joins" can be a problem unless you use the IteratingOperatorChain, which keeps the output. There is an example of this at work at http://rapid-i.com/rapidforum/index.php/topic,773.0.html.
    Hi Alexei,

    RapidMiner certainly lets you find complicated solutions. Fortunately, there are often simple ones too. :) Just use the Nominal2Binominal operator followed by a Nominal2Numerical operator and you're done with the dichotomization.

    Kind regards,
    Tobias
    cantab:
    Nominal2Binominal is good, but very slow when some nominal features have too many different values. How can I filter out (eliminate) features that have more than some number (e.g. 20) of different nominal values?
    IngoRM:
    Phew, that's possible, but it takes quite a complex process:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="OperatorChain" class="OperatorChain" expanded="no">
            <operator name="NominalExampleSetGenerator" class="NominalExampleSetGenerator">
                <parameter key="number_of_attributes" value="10"/>
                <parameter key="number_of_values" value="20"/>
            </operator>
            <operator name="AbsoluteSampling" class="AbsoluteSampling">
                <parameter key="sample_size" value="20"/>
            </operator>
            <operator name="GuessValueTypes" class="GuessValueTypes">
            </operator>
        </operator>
        <operator name="FeatureIterator" class="FeatureIterator" expanded="no">
            <operator name="SingleMacroDefinition" class="SingleMacroDefinition">
                <parameter key="macro" value="value_counter"/>
                <parameter key="value" value="0"/>
            </operator>
            <operator name="ValueIterator" class="ValueIterator" expanded="yes">
                <parameter key="attribute" value="%{loop_feature}"/>
                <operator name="MacroConstruction" class="MacroConstruction">
                    <list key="function_descriptions">
                      <parameter key="value_counter" value="%{value_counter} + 1"/>
                    </list>
                </operator>
            </operator>
            <operator name="Macro2Log" class="Macro2Log">
                <parameter key="macro_name" value="value_counter"/>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <list key="log">
                  <parameter key="Feature" value="operator.FeatureIterator.value.feature_name"/>
                  <parameter key="Count" value="operator.Macro2Log.value.macro_value"/>
                </list>
            </operator>
        </operator>
        <operator name="ProcessLog2ExampleSet" class="ProcessLog2ExampleSet">
            <parameter key="log_name" value="ProcessLog"/>
        </operator>
        <operator name="ClearProcessLog" class="ClearProcessLog">
            <parameter key="log_name" value="ProcessLog"/>
            <parameter key="delete_table" value="true"/>
        </operator>
        <operator name="GuessValueTypes (2)" class="GuessValueTypes">
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="Count &gt; 12"/>
        </operator>
        <operator name="IOStorer" class="IOStorer">
            <parameter key="name" value="original_data"/>
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="store_which" value="2"/>
        </operator>
        <operator name="ValueIterator (2)" class="ValueIterator" expanded="yes">
            <parameter key="attribute" value="Feature"/>
            <operator name="IORetriever" class="IORetriever">
                <parameter key="name" value="original_data"/>
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
            <operator name="FeatureNameFilter" class="FeatureNameFilter">
                <parameter key="skip_features_with_name" value="%{loop_value}"/>
            </operator>
            <operator name="IOStorer (2)" class="IOStorer">
                <parameter key="name" value="original_data"/>
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
        </operator>
        <operator name="IOConsumer" class="IOConsumer">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="IORetriever (2)" class="IORetriever">
            <parameter key="name" value="original_data"/>
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
    </operator>
    Cheers,
    Ingo