How do I smooth by bin means?
JamisonW
New Altair Community Member
For an assignment, I need to use smoothing by bin means: you sort the values, create bins of the same size, and replace each value with the mean of its bin. I'm having a tough time finding this feature. Discretization is the only section that discusses binning, and I didn't see anything dealing with means in the transformations section. Does RapidMiner support this?
After searching a bit, I've only seen this technique mentioned in academic papers and presentations. Is it not a common technique among professionals? If not, which smoothing approach is preferred?
Thanks,
Jamison
Answers
I didn't find such an operator either (which does not mean there isn't one), but I have found a workaround that may help you:
- Copy the attribute you want to smooth with the "Generate Attributes" operator
- Apply your preferred discretization to the copied attribute
- Apply an "Aggregate" operator with the copied attribute as the group-by attribute and the average function on the original attribute
Now you can remove the copied attribute (with the "Select Attributes" operator), and the original attribute is smoothed. 8)
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.009">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.009" expanded="true" name="Process">
<process expanded="true" height="558" width="696">
<operator activated="true" class="generate_data" compatibility="5.2.009" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="sum"/>
<parameter key="number_examples" value="10"/>
<parameter key="number_of_attributes" value="1"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.2.009" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="30">
<list key="function_descriptions">
<parameter key="att1_group" value="att1"/>
</list>
</operator>
<operator activated="true" class="discretize_by_bins" compatibility="5.2.009" expanded="true" height="94" name="Discretize" width="90" x="313" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="att1_group"/>
<parameter key="number_of_bins" value="5"/>
</operator>
<operator activated="true" class="aggregate" compatibility="5.2.009" expanded="true" height="76" name="Aggregate" width="90" x="447" y="30">
<list key="aggregation_attributes">
<parameter key="att1" value="average"/>
</list>
<parameter key="group_by_attributes" value="|att1_group"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Discretize" to_port="example set input"/>
<connect from_op="Discretize" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
<connect from_op="Aggregate" from_port="original" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
It isn't very elegant, especially if you want to smooth more than one attribute, but maybe it is sufficient for your needs. I will ask around for another way to accomplish this.
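For readers who want to check the numbers outside RapidMiner, here is a minimal plain-Python sketch of the same idea: discretize into equal-width ranges, average per bin, then put the bin mean back on every row. The function name, the sample data, and the way bin boundaries are handled are assumptions made for illustration; RapidMiner's own boundary conventions may differ.

```python
def smooth_by_equal_width_bins(values, number_of_bins):
    """Replace each value with the mean of its equal-width bin (illustrative sketch)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / number_of_bins
    if width == 0:
        # All values are identical, so there is nothing to smooth.
        return [float(v) for v in values]

    def bin_index(v):
        # Assumption: bins are [lo, lo + width), ..., and the maximum
        # value is placed in the last bin.
        return min(int((v - lo) / width), number_of_bins - 1)

    # "Aggregate" step: sum and count per bin.
    sums, counts = {}, {}
    for v in values:
        b = bin_index(v)
        sums[b] = sums.get(b, 0.0) + v
        counts[b] = counts.get(b, 0) + 1
    means = {b: sums[b] / counts[b] for b in sums}

    # "Join" step: every original value gets the mean of its bin.
    return [means[bin_index(v)] for v in values]


# Hypothetical sample data, not taken from the thread.
prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
smoothed = smooth_by_equal_width_bins(prices, 3)
print(smoothed)
```

With this sample the three ranges hold 3, 3, and 6 values, which shows that range-based bins do not guarantee equal counts per bin.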
Thanks Marcn,
That got me on the right track! I had to do one extra step to join the averages back into the original set.
My bins are still not coming out the same as in Excel, so I'll need to review. I think the difference is that in Excel I created a bin every four rows, whereas RapidMiner creates bins as value ranges. This leads to some bins having 3 items and some having 5. To resolve this, I'm looking into sorting by my value and adding a row count column (can RM do this?). The row count column will become the field I discretize.
Edit:
I found "a" solution.
1. Sort by Value
2. Generate Id (this will be a row number based on the sort)
3. Set Role of new Id to Regular
4. Discretize by Size on Id from #2
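5. Multiply (copies the example set, so one copy can be aggregated and the other kept for the join)
6. Aggregate values from #1, grouped by the discretized Id from #4
7. Join original to #6
You now have a data set in which each value is paired with its bin mean.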
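As a cross-check for the steps above, this is a rough plain-Python sketch of the same equal-size binning: rank the rows by value, cut the ranking into bins of a fixed size, and write each bin's mean back to the original row positions. The function name and the sample data are made up for illustration, and this is not how RapidMiner implements it internally.

```python
def smooth_by_bin_means(values, bin_size):
    """Equal-frequency smoothing by bin means (illustrative sketch)."""
    # Sort + Generate ID: row positions ordered by their value.
    order = sorted(range(len(values)), key=lambda i: values[i])

    smoothed = [0.0] * len(values)
    # Discretize by size: consecutive runs of `bin_size` ranked rows form a bin.
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        # Aggregate: mean of the values in this bin.
        mean = sum(values[i] for i in members) / len(members)
        # Join: write the bin mean back to each row's original position.
        for i in members:
            smoothed[i] = mean
    return smoothed


# Hypothetical sample data, not taken from the thread.
prices = [21, 4, 28, 8, 34, 15, 9, 25, 21, 24, 29, 26]
smoothed = smooth_by_bin_means(prices, 4)
print(smoothed)
```

Because bins are cut by rank rather than by value range, every bin holds exactly `bin_size` rows, except possibly the last one when the row count is not a multiple of the bin size.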
Jamison