Sign parity / calculated data to weight

Dear community,
I am looking for a hint on how to implement the following algorithm in RapidMiner:
Given an example set with a label and a few attributes:
label att1 att2
1 2 -1
3 2 -2
-1 1 -3
Next I want to calculate the sign parity (the ratio of examples in which label and attribute have the same sign):
Sign parity label / att1 => +/+ and +/+ and -/+, i.e. 2 matches divided by 3 => 0.66
Sign parity label / att2 => +/- and +/- and -/-, i.e. 1 match divided by 3 => 0.33
Finally, the results shall be assigned as weights to each attribute.
Weight att1 = 0.66
Weight att2 = 0.33
So far I have managed to calculate the sign parity with a simple expression (e.g. label * att1 >= 0), which I evaluate for every example; I then divide the number of matches by the number of examples. But how do I transfer this back to weights?
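For illustration only, the calculation described above can be sketched outside RapidMiner in plain Python. This is not a RapidMiner process; the function name `sign_parity` and the column-wise layout are assumptions made for the sketch, and the data is the three-example table from this post.

```python
def sign_parity(label, attribute):
    """Fraction of examples where label and attribute value share a sign.

    Uses the product test from the post: label * value >= 0.
    """
    if len(label) != len(attribute):
        raise ValueError("label and attribute must have the same length")
    matches = sum(1 for y, x in zip(label, attribute) if y * x >= 0)
    return matches / len(label)


# Example set from the post, stored column-wise (hypothetical layout).
label = [1, 3, -1]
attributes = {"att1": [2, 2, 1], "att2": [-1, -2, -3]}

# One weight per attribute, not one per example.
weights = {name: sign_parity(label, column) for name, column in attributes.items()}
print(weights)
```

With the table above this gives 2/3 for att1 and 1/3 for att2, matching the hand calculation.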
Best regards
Sachs
Answers
-
The Set Role operator lets you select an attribute column and set its role to weight. Just select "weight" in the drop-down menu and RapidMiner will recognize it as a weight.
0 -
Hi Thomas,
Thank you for your input. The Set Role operator allows me to define one attribute as the weight. This gives me one weight per example.
In contrast, I want one weight per attribute (based on how often the sign of the attribute's value equals the sign of the label across all examples).
Kind regards
Sachs
0 -
OK, I see. So label = 0.66 * att1 + 0.33 * att2? Will the weights of the attributes sum to 1? Did you try the Weights to Data or Data to Weights operator?
0 -
Hi Thomas,
Maybe my process description was a bit misleading. Let me try again in other words:
1) Determine the weight for each attribute.
This is done by comparing the sign of each example's value in an attribute with the sign of the label for the same example. Then the overall ratio is computed, so that I get a statement like "75% of the examples have the same sign in label and attX".
2) Assign weight to attribute.
The calculated values (e.g. 75% for attX) shall then be assigned as weights to the corresponding attributes.
(The final step would be to select top n attributes with "select by weights" operator.)
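As a rough illustration of the final step, here is a plain Python sketch of keeping the top n attributes once a weight per attribute exists. It is not the "select by weights" operator itself; `select_top_n` and the dictionary layout are hypothetical names chosen for the sketch, and the weights are the values from the first post.

```python
def select_top_n(columns, weights, n):
    """Keep the n attributes with the highest weight.

    Ties are broken by the order of the entries in `weights`.
    """
    ranked = sorted(weights, key=lambda name: weights[name], reverse=True)
    keep = set(ranked[:n])
    # Preserve the original column order of the example set.
    return {name: values for name, values in columns.items() if name in keep}


# Attributes and weights from the three-example table (hypothetical layout).
columns = {"att1": [2, 2, 1], "att2": [-1, -2, -3]}
weights = {"att1": 2 / 3, "att2": 1 / 3}

top = select_top_n(columns, weights, n=1)
print(list(top))
```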
The "data to weight" operator sounded good, but it actually does nothing other than assign a weight of "1" to each attribute, and there is no way to feed in the determined weight.
Best regards
Sachs
0 -
Why not do this via the Generate Attributes operator?
0 -
Hi Brian,
Thank you for trying to help! Your post came just a second after my last one, where I tried to give a better description of what the result should be.
The Generate Attributes operator is indeed what I use to do the comparison on the examples (label n * att1 n >= 0). But how do I accumulate the results and transform them into a weight of the ATTRIBUTE?
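One detail of the product test may be worth checking: with >= 0, an example where either value is exactly zero counts as a match. The plain Python sketch below contrasts it with a stricter comparison of signs; both helper names are hypothetical and only serve the illustration.

```python
def same_sign_product(y, x):
    """Product test as used in the post: zero on either side counts as a match."""
    return y * x >= 0


def same_sign_strict(y, x):
    """Compare signs directly: -1, 0 or +1 must be identical on both sides."""
    def sign(v):
        return (v > 0) - (v < 0)
    return sign(y) == sign(x)


# The two tests agree for non-zero values and differ when one value is zero.
pairs = [(1, 2), (-1, 1), (3, 0)]
product_results = [same_sign_product(y, x) for y, x in pairs]
strict_results = [same_sign_strict(y, x) for y, x in pairs]
print(product_results, strict_results)
```

Which behaviour is wanted depends on whether zeros occur in the data and how they should be treated.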
Kind regards
Sachs
0 -
Once you have the values for every example, a simple Aggregate using the average function should do the trick; it gives you one overall value per attribute. If you want, you can then transpose the result to get a table with one row per attribute and its overall value, which can be sorted so that the top N can be selected.
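To make the data flow concrete, the aggregate, transpose and sort steps can be sketched in plain Python. This only mimics the table shapes and is not RapidMiner code; the `_match` column names and the list-of-dicts layout are assumptions, and the 0/1 values are the comparison results for the three-example table from the first post.

```python
# Per-example comparison results: 1 if label and attribute share a sign, else 0.
rows = [
    {"att1_match": 1, "att2_match": 0},
    {"att1_match": 1, "att2_match": 0},
    {"att1_match": 0, "att2_match": 1},
]

# Aggregate: average every indicator column, giving a single-row table.
aggregated = {name: sum(row[name] for row in rows) / len(rows) for name in rows[0]}

# Transpose: turn the single row into one (attribute, weight) row per column.
transposed = [{"attribute": name, "weight": value} for name, value in aggregated.items()]

# Sort descending so the top N rows can be taken from the front.
transposed.sort(key=lambda row: row["weight"], reverse=True)
top_n = transposed[:1]
print(top_n)
```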
0 -
Hi Brian,
Thank you for your contribution. In my attempt to implement your suggestion, the Aggregate operator with the average function on a generated attribute does calculate the desired value. I can also imagine what the transpose will look like. But currently I am stuck in the middle of this process.
Aggregate now calculates the "weight" for att1, but the operator's result still needs to be moved to the last example of att1. Only when the weights of all attributes end up in the same example row can I start with the transpose.
Best regards
Sachs
0 -
Perhaps you can post a small dataset with some examples together with your process? Then it would be easier to work through this.
0 -
Hi Brian,
Here is a process that calculates what I want to have as weights. The point where I am struggling now is how to use this information to filter the original list of attributes.
Best regards
Sachs
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.5.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data" compatibility="7.5.000" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
        <parameter key="number_examples" value="75"/>
        <parameter key="number_of_attributes" value="10"/>
      </operator>
      <operator activated="true" class="concurrency:loop_attributes" compatibility="7.5.000" expanded="true" height="103" name="Loop Attributes" width="90" x="179" y="34">
        <parameter key="include_special_attributes" value="true"/>
        <process expanded="true">
          <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes" width="90" x="45" y="34">
            <list key="function_descriptions">
              <parameter key="%{loop_attribute}_weight" value="if([label]*eval(%{loop_attribute})>=0,1,0)"/>
            </list>
          </operator>
          <operator activated="true" class="aggregate" compatibility="7.5.000" expanded="true" height="82" name="Aggregate" width="90" x="246" y="34">
            <list key="aggregation_attributes">
              <parameter key="%{loop_attribute}_weight" value="average"/>
            </list>
          </operator>
          <connect from_port="input 1" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="output 2"/>
          <connect from_op="Aggregate" from_port="original" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
          <portSpacing port="sink_output 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Loop Attributes" to_port="input 1"/>
      <connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>