Hello,
My example set is similar to the one generated by this process:
<operator name="Root" class="Process" expanded="yes">
    <operator name="OperatorChain" class="OperatorChain" expanded="no">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
        </operator>
        <operator name="label is regular" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
        </operator>
        <operator name="BinDiscretization - 50" class="BinDiscretization">
            <parameter key="number_of_bins" value="50"/>
            <parameter key="range_name_type" value="short"/>
        </operator>
        <operator name="label is label" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
            <parameter key="target_role" value="label"/>
        </operator>
        <operator name="Nominal2Numerical" class="Nominal2Numerical">
        </operator>
        <operator name="BinDiscretization - 2" class="BinDiscretization">
            <parameter key="range_name_type" value="short"/>
        </operator>
        <operator name="Nominal2Numerical (2)" class="Nominal2Numerical">
        </operator>
        <operator name="Sorting" class="Sorting">
            <parameter key="attribute_name" value="label"/>
        </operator>
    </operator>
</operator>
Basically, I have something like this:
label    att1  att2  att3  att4  att5
range1   1.0   0.0   0.0   0.0   1.0
range1   0.0   0.0   1.0   1.0   0.0
range10  1.0   1.0   1.0   0.0   0.0
range11  1.0   0.0   0.0   1.0   0.0
range11  1.0   0.0   0.0   1.0   1.0
range11  1.0   0.0   0.0   1.0   1.0
....
I would like to merge all examples that share the same "rangeX" label into a single example, so that for each attribute the maximum value across the examples with that label is kept. For example, I want:
label    att1  att2  att3  att4  att5
range1   1.0   0.0   1.0   1.0   1.0
range10  1.0   1.0   1.0   0.0   0.0
range11  1.0   0.0   0.0   1.0   1.0
....
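To make the intended merge concrete, here is a minimal Python sketch of the semantics I'm after. It is only an illustration and not a RapidMiner process: the function name `merge_by_label_max` is made up for this post, and the rows are the sample values shown above.

```python
def merge_by_label_max(rows):
    """Group rows by their label (first field) and keep, for every
    attribute, the maximum value seen among rows with that label.

    Hypothetical helper for illustration only; not a RapidMiner API.
    """
    merged = {}  # label -> list of per-attribute maxima (insertion ordered)
    for label, *values in rows:
        if label not in merged:
            merged[label] = list(values)
        else:
            # Element-wise maximum of the stored row and the new row.
            merged[label] = [max(a, b) for a, b in zip(merged[label], values)]
    return [(label, *values) for label, values in merged.items()]


# Sample rows copied from the example set above: (label, att1..att5).
rows = [
    ("range1", 1.0, 0.0, 0.0, 0.0, 1.0),
    ("range1", 0.0, 0.0, 1.0, 1.0, 0.0),
    ("range10", 1.0, 1.0, 1.0, 0.0, 0.0),
    ("range11", 1.0, 0.0, 0.0, 1.0, 0.0),
    ("range11", 1.0, 0.0, 0.0, 1.0, 1.0),
    ("range11", 1.0, 0.0, 0.0, 1.0, 1.0),
]

for row in merge_by_label_max(rows):
    print(*row)
```

In other words, it is a group-by on the label with "maximum" as the aggregation function for every other attribute.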
I hope that's clear. Unfortunately, I don't have access to the data format, so I have to resort to this crazy trick. I guess I could always write my own operator for this, but I'm sure RapidMiner already has all the necessary operators!
Thanks for any pointers.

- R