The Set Role operator lets you select an attribute column and set its role to weight. Just select "weight" in the drop-down menu and RapidMiner will recognize it as a weight.
Ok, I see, so label = 0.66*att1 + 0.33*att2? Will the sum of the attributes equal 1? Did you try the Weights to Data or Data to Weights operators?
Hi Thomas,
Maybe my process description was a bit misleading. Let me try again in other words:
1) Determine the weight for each attribute.
This is done by comparing, for each example, the sign of the attribute's value with the sign of the label's value. Then the overall ratio is computed, so that I get a statement like: for 75% of the examples, label and attX have the same sign.
2) Assign weight to attribute.
The calculated values (e.g. 75% for attX) shall then be assigned as weights to the corresponding attributes.
(The final step would be to select top n attributes with "select by weights" operator.)
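To make the intended calculation concrete outside of RapidMiner, here is a minimal Python sketch of the steps above. It is only an illustration, not a RapidMiner process: the function names `sign_agreement_weights` and `top_n` and the toy data are made up, and it assumes the same test used later in the thread (label * attribute >= 0), so a zero value counts as "same sign".

```python
from typing import Dict, List, Sequence


def sign_agreement_weights(
    label: Sequence[float], attributes: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Return, per attribute, the fraction of examples whose value
    has the same sign as the label (tested as label * value >= 0)."""
    weights: Dict[str, float] = {}
    for name, values in attributes.items():
        matches = sum(1 for y, x in zip(label, values) if y * x >= 0)
        weights[name] = matches / len(label)
    return weights


def top_n(weights: Dict[str, float], n: int) -> List[str]:
    """Return the names of the n attributes with the highest weight."""
    return sorted(weights, key=weights.get, reverse=True)[:n]


# Hypothetical toy data: four examples, three attributes.
label = [1.0, -2.0, 3.0, -4.0]
attributes = {
    "att1": [0.5, -1.0, 2.0, 1.0],
    "att2": [-1.0, 1.0, -1.0, 1.0],
    "att3": [2.0, -3.0, 1.0, -0.5],
}

weights = sign_agreement_weights(label, attributes)
print(weights)            # att1 agrees with the label on 3 of 4 examples
print(top_n(weights, 2))  # the two attributes with the highest agreement
```

Here att1 ends up with a weight of 0.75, which corresponds to the "75% of the examples" statement in step 1.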
The "data to weight" operator sounded good, but it actually does nothing other than assign a weight of "1" to each attribute, and there is no way to feed in the determined weight.
Best regards
Sachs
Why not do this via the Generate Attributes operator?
Hi Brian,
Thank you for trying to help! Your post came just a second after my last one, where I tried to give a better description of what the result should be.
The generate attributes operator is indeed what I use to do the comparison on the examples (label n * att1 n >=0). But how do I accumulate the results and transform them into a weight of the ATTRIBUTE?
Kind regards
A simple Aggregate using the average function should do the trick after that, once you have the values for every example. That will give you one overall value per attribute. Then, if you want, you can transpose the resulting data to get a table of overall values per attribute (each attribute becomes one example in the transposed data), which can then be sorted so the top N can be selected.
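As a rough picture of what the aggregate, transpose, sort and top-N chain does to the data, here is a small Python sketch. It is not RapidMiner code: the helper names `aggregate_average` and `transpose_and_rank` and the indicator rows are hypothetical, and it assumes the 0/1 match indicators have already been generated for every example.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical per-example indicators: 1 = same sign as the label, 0 = not.
rows: List[Dict[str, int]] = [
    {"att1_weight": 1, "att2_weight": 0, "att3_weight": 1},
    {"att1_weight": 1, "att2_weight": 1, "att3_weight": 1},
    {"att1_weight": 0, "att2_weight": 0, "att3_weight": 1},
    {"att1_weight": 1, "att2_weight": 0, "att3_weight": 1},
]


def aggregate_average(rows: List[Dict[str, int]]) -> Dict[str, float]:
    """Collapse all examples into one average value per column."""
    return {column: mean(row[column] for row in rows) for column in rows[0]}


def transpose_and_rank(averages: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Turn the single row of averages into (attribute, value) rows,
    sort them in descending order and keep the first n."""
    table = list(averages.items())
    table.sort(key=lambda pair: pair[1], reverse=True)
    return table[:n]


averages = aggregate_average(rows)
ranked = transpose_and_rank(averages, 2)
print(ranked)  # att3_weight first, then att1_weight
```

The names in the ranked table still carry the "_weight" suffix, so they would have to be mapped back to the original attribute names before filtering the original data.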
Perhaps if you can post a small dataset with some examples and your process then it would be easier to try to work this through?
Here is a piece of code that calculates what I want to have as weights. The point where I am struggling now is how to use this information to filter the original list of attributes.
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.5.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data" compatibility="7.5.000" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
        <parameter key="number_examples" value="75"/>
        <parameter key="number_of_attributes" value="10"/>
      </operator>
      <operator activated="true" class="concurrency:loop_attributes" compatibility="7.5.000" expanded="true" height="103" name="Loop Attributes" width="90" x="179" y="34">
        <parameter key="include_special_attributes" value="true"/>
        <process expanded="true">
          <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes" width="90" x="45" y="34">
            <list key="function_descriptions">
              <parameter key="%{loop_attribute}_weight" value="if([label]*eval(%{loop_attribute})>=0,1,0)"/>
            </list>
          </operator>
          <operator activated="true" class="aggregate" compatibility="7.5.000" expanded="true" height="82" name="Aggregate" width="90" x="246" y="34">
            <list key="aggregation_attributes">
              <parameter key="%{loop_attribute}_weight" value="average"/>
            </list>
          </operator>
          <connect from_port="input 1" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="output 2"/>
          <connect from_op="Aggregate" from_port="original" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
          <portSpacing port="sink_output 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Loop Attributes" to_port="input 1"/>
      <connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>