Converting nominal to numeric
Legacy User
New Altair Community Member
I am very new to RapidMiner, and I need to convert a lot of nominal attributes to numeric. I tried the Nominal2Numerical operator in RapidMiner 4.4, hoping it would do dichotomization, but it does not. Instead, it seems to map the values to equidistant real numbers, which does not make much sense, at least in my case.
Eventually, I figured out how to do dichotomization "manually" by using ConditionedFeatureGeneration, but it gets tedious since I have to create a separate operator for each value of each attribute, and there are many.
For nominals with many possible values, I figured out how to convert them to target frequencies using ValueIterator and macros, but again it seems like I might be missing a much simpler solution.
I would appreciate any advice!
Thanks,
~Alexei
Answers
Greetings Alexei,
I think you can avoid all that horrible re-typing by using variables, or, as they are called in RM, "macros". Here's what I mean...
Good luck, and good weekend!
<operator name="Root" class="Process" expanded="yes">
  <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
    <parameter key="target_function" value="simple polynomial classification"/>
  </operator>
  <operator name="BinDiscretization" class="BinDiscretization">
    <parameter key="number_of_bins" value="3"/>
    <parameter key="range_name_type" value="short"/>
  </operator>
  <operator name="IteratingOperatorChain" class="IteratingOperatorChain" expanded="yes">
    <parameter key="iterations" value="5"/>
    <operator name="ConditionedFeatureGeneration" class="ConditionedFeatureGeneration">
      <parameter key="attribute_name" value="att%{a}_mashed"/>
      <list key="values">
        <parameter key="1" value="att%{a}=range1"/>
        <parameter key="2" value="att%{a}=range2"/>
        <parameter key="3" value="att%{a}=range3"/>
      </list>
    </operator>
  </operator>
</operator>
Thanks! Your reply started me on the iterators path, and since I'm actually trying to dichotomize my nominals, I came up with what seemed a very nice construct below.
The problem is that it adds the correct attribute at each iteration, but does not keep it around for the next iteration. Whether "work_on_input" is true or false, at the very end of the process I end up with the original unmodified ExampleSource.
I guess I could rename my attributes so I can apply your suggestion directly and get rid of FeatureIterator, but I am hoping that again, I may be missing something simple...
Thanks,
~Alexei
<operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
  <parameter key="filter" value=""/>
  <parameter key="type_filter" value="nominal"/>
  <parameter key="work_on_input" value="false"/>
  <operator name="ValueIteratorNominalDichotomization" class="ValueIterator" expanded="yes">
    <parameter key="attribute" value="%{loop_feature}"/>
    <operator name="ConditionedFeatureGeneration" class="ConditionedFeatureGeneration">
      <parameter key="attribute_name" value="%{loop_feature}_%{loop_value}"/>
      <parameter key="value_type" value="integer"/>
      <list key="values">
        <parameter key="1" value="%{loop_feature}=%{loop_value}"/>
      </list>
      <parameter key="default_value" value="0"/>
    </operator>
  </operator>
</operator>
Hi again,
Successive "joins" can be a problem unless you use the IteratingOperatorChain, which keeps the output. There is an example of this at work at http://rapid-i.com/rapidforum/index.php/topic,773.0.html.
Hi Alexei,
RapidMiner certainly lets you find complicated solutions. Fortunately, there are often simple ones as well. Just use the Nominal2Binominal operator followed by a Nominal2Numerical operator and you're done with the dichotomization.
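As a rough sketch, that two-operator chain could look like the process below. It reuses NominalExampleSetGenerator purely as a stand-in data source (an assumption; substitute your own input), and it assumes that each operator's class attribute matches its name and that the default parameters are sufficient.

```xml
<operator name="Root" class="Process" expanded="yes">
  <!-- Stand-in data source for illustration only; replace with your own input -->
  <operator name="NominalExampleSetGenerator" class="NominalExampleSetGenerator"/>
  <!-- Step 1: turn each nominal attribute into binominal attributes -->
  <operator name="Nominal2Binominal" class="Nominal2Binominal"/>
  <!-- Step 2: map the resulting binominal attributes to numbers -->
  <operator name="Nominal2Numerical" class="Nominal2Numerical"/>
</operator>
```

Check the resulting attribute names and values in the results view before building on them, since parameter defaults may differ between versions.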
Kind regards,
Tobias
Nominal2Binominal is good, but very slow when some nominal features have too many distinct values. How can I filter out (eliminate) features that have more than some number (e.g. 20) of distinct nominal values?
Phew, that's possible, but it takes quite a complex process:
Cheers,
<operator name="Root" class="Process" expanded="yes">
  <!-- Generate sample nominal data (stand-in for the real input) -->
  <operator name="OperatorChain" class="OperatorChain" expanded="no">
    <operator name="NominalExampleSetGenerator" class="NominalExampleSetGenerator">
      <parameter key="number_of_attributes" value="10"/>
      <parameter key="number_of_values" value="20"/>
    </operator>
    <operator name="AbsoluteSampling" class="AbsoluteSampling">
      <parameter key="sample_size" value="20"/>
    </operator>
    <operator name="GuessValueTypes" class="GuessValueTypes">
    </operator>
  </operator>
  <!-- For each feature, count its values in the macro value_counter and log the result -->
  <operator name="FeatureIterator" class="FeatureIterator" expanded="no">
    <operator name="SingleMacroDefinition" class="SingleMacroDefinition">
      <parameter key="macro" value="value_counter"/>
      <parameter key="value" value="0"/>
    </operator>
    <operator name="ValueIterator" class="ValueIterator" expanded="yes">
      <parameter key="attribute" value="%{loop_feature}"/>
      <operator name="MacroConstruction" class="MacroConstruction">
        <list key="function_descriptions">
          <parameter key="value_counter" value="%{value_counter} + 1"/>
        </list>
      </operator>
    </operator>
    <operator name="Macro2Log" class="Macro2Log">
      <parameter key="macro_name" value="value_counter"/>
    </operator>
    <operator name="ProcessLog" class="ProcessLog">
      <list key="log">
        <parameter key="Feature" value="operator.FeatureIterator.value.feature_name"/>
        <parameter key="Count" value="operator.Macro2Log.value.macro_value"/>
      </list>
    </operator>
  </operator>
  <!-- Turn the log into an example set with the columns Feature and Count -->
  <operator name="ProcessLog2ExampleSet" class="ProcessLog2ExampleSet">
    <parameter key="log_name" value="ProcessLog"/>
  </operator>
  <operator name="ClearProcessLog" class="ClearProcessLog">
    <parameter key="log_name" value="ProcessLog"/>
    <parameter key="delete_table" value="true"/>
  </operator>
  <operator name="GuessValueTypes (2)" class="GuessValueTypes">
  </operator>
  <!-- Keep only the rows for features with more than 12 values; adjust the threshold as needed -->
  <operator name="ExampleFilter" class="ExampleFilter">
    <parameter key="condition_class" value="attribute_value_filter"/>
    <parameter key="parameter_string" value="Count > 12"/>
  </operator>
  <!-- Store the second ExampleSet (the data, not the Feature/Count table) as original_data -->
  <operator name="IOStorer" class="IOStorer">
    <parameter key="name" value="original_data"/>
    <parameter key="io_object" value="ExampleSet"/>
    <parameter key="store_which" value="2"/>
  </operator>
  <!-- For each feature name left in the table, remove that feature from the stored data -->
  <operator name="ValueIterator (2)" class="ValueIterator" expanded="yes">
    <parameter key="attribute" value="Feature"/>
    <operator name="IORetriever" class="IORetriever">
      <parameter key="name" value="original_data"/>
      <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
      <parameter key="skip_features_with_name" value="%{loop_value}"/>
    </operator>
    <operator name="IOStorer (2)" class="IOStorer">
      <parameter key="name" value="original_data"/>
      <parameter key="io_object" value="ExampleSet"/>
    </operator>
  </operator>
  <!-- Discard the Feature/Count table and retrieve the filtered data -->
  <operator name="IOConsumer" class="IOConsumer">
    <parameter key="io_object" value="ExampleSet"/>
  </operator>
  <operator name="IORetriever (2)" class="IORetriever">
    <parameter key="name" value="original_data"/>
    <parameter key="io_object" value="ExampleSet"/>
  </operator>
</operator>
Ingo