Median calculation problem in Aggregate function

luqie
New Altair Community Member
Hi guys,
I'm using RM versions 5.3 and 6 and trying to compute a median for my data (aggregated by an attribute value).
I realized the median calculation used is not correct. Neither version seems to take the average of the two middle values when the list has an even number of values.
As an example, use the following data and calculate median for DOM (aggregate by DATE):
DOM DATE
33 537
47 537
49 537
57 537
79 537
91 537
102 537
123 537
133 537
134 537
149 537
155 537
186 537
238 537
The correct answer should be 112.5
RM gives the median as 102
Thanks!
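For reference, the expected value can be checked outside RapidMiner. The following is a minimal Python sketch (Python is not part of the original post, and `textbook_median` is a name made up for illustration). It averages the two middle values for an even-sized list, compares the result with Python's `statistics.median`, and also shows the lower of the two middle values, which matches what RM reports here.

```python
import statistics

# DOM values from the post (all belong to DATE = 537).
dom = [33, 47, 49, 57, 79, 91, 102, 123, 133, 134, 149, 155, 186, 238]

def textbook_median(values):
    """Median that averages the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Lower of the two middle values, i.e. the number RM reports in the post.
lower_middle = sorted(dom)[(len(dom) - 1) // 2]

print(textbook_median(dom))    # 112.5
print(statistics.median(dom))  # 112.5
print(lower_middle)            # 102
```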
Answers
-
Hello
Good find.
Interestingly, if you sort the examples, the answer changes, as in the attached process.
I suppose you could manually calculate the median you're after by using the two values - a bit ugly but it would work.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.0.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="6.0.008" expanded="true" height="60" name="Generate Data" width="90" x="112" y="75">
<parameter key="number_examples" value="10"/>
<parameter key="attributes_lower_bound" value="-1.0"/>
<parameter key="attributes_upper_bound" value="5.0"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="6.0.008" expanded="true" height="76" name="Generate Attributes" width="90" x="112" y="165">
<list key="function_descriptions">
<parameter key="att1" value="round(att1)"/>
<parameter key="constant" value="1"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="6.0.008" expanded="true" height="76" name="Select Attributes" width="90" x="112" y="255">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="constant|att1"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="sort" compatibility="6.0.008" expanded="true" height="76" name="Sort" width="90" x="313" y="75">
<parameter key="attribute_name" value="att1"/>
<parameter key="sorting_direction" value="decreasing"/>
</operator>
<operator activated="true" class="aggregate" compatibility="6.0.008" expanded="true" height="76" name="Aggregate" width="90" x="313" y="165">
<list key="aggregation_attributes">
<parameter key="att1" value="median"/>
</list>
<parameter key="group_by_attributes" value="constant"/>
</operator>
<operator activated="true" class="sort" compatibility="6.0.008" expanded="true" height="76" name="Sort (2)" width="90" x="313" y="255">
<parameter key="attribute_name" value="att1"/>
</operator>
<operator activated="true" class="aggregate" compatibility="6.0.008" expanded="true" height="76" name="Aggregate (2)" width="90" x="313" y="345">
<list key="aggregation_attributes">
<parameter key="att1" value="median"/>
</list>
<parameter key="group_by_attributes" value="constant"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
<connect from_op="Aggregate" from_port="original" to_op="Sort (2)" to_port="example set input"/>
<connect from_op="Sort (2)" from_port="example set output" to_op="Aggregate (2)" to_port="example set input"/>
<connect from_op="Aggregate (2)" from_port="example set output" to_port="result 2"/>
<connect from_op="Aggregate (2)" from_port="original" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
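The manual calculation from the two values can be sketched in Python rather than as RapidMiner operators. This assumes, and nothing in the thread confirms it, that the two Aggregate results (one after sorting in decreasing order, one after sorting in increasing order) are the two middle values of the group. `median_from_two_runs` is a hypothetical helper name.

```python
def median_from_two_runs(result_sorted_desc, result_sorted_asc):
    """Average the two values reported by the two Aggregate operators.

    Assumption: for an even-sized group the two runs return the two
    middle values; for an odd-sized group both return the same value,
    so averaging leaves it unchanged.
    """
    return (result_sorted_desc + result_sorted_asc) / 2

# With the two middle values of the 14 DOM examples from the first post:
print(median_from_two_runs(123, 102))  # 112.5
# Odd-sized group: both runs agree, so the average is the same value.
print(median_from_two_runs(91, 91))    # 91.0
```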
regards
Andrew
-
Thanks for the workaround, Andrew. Ugly, but it works. It gets a bit unhelpful, though, if I have loads of values to aggregate in the same columns (I only have one value for aggregation in the above example). Any suggestions for that?
Also, will other functions and charting that rely on the engine's median function be affected (e.g. k-medoids, boxplot, etc.)?
Thanks!
-
I could imagine it would turn into a complicated process with multiple aggregation groups. It's slightly more gymnastics time than I can spare at the moment but at a high level, I would use Loop Values for each aggregation group, filter the aggregated result for that value, do the ugly sorting thing and then store the result somewhere.
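The high-level loop has roughly the following shape, sketched here in Python instead of with Loop Values and a filter. The rows are illustrative only and `median_per_group` is a hypothetical helper; it assumes the data can be read as (group, value) pairs.

```python
from collections import defaultdict
from statistics import median

def median_per_group(rows):
    """Return {group: median} for an iterable of (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)  # collect the values of each group
    # statistics.median sorts internally and averages the two middle
    # values when a group has an even number of entries.
    return {group: median(values) for group, values in groups.items()}

# Illustrative data only: DATE 537 reuses four DOM values from the first
# post; DATE 538 is invented to show a second group.
rows = [(537, 102), (537, 33), (537, 123), (537, 238),
        (538, 10), (538, 30), (538, 20)]
print(median_per_group(rows))  # {537: 112.5, 538: 20}
```

Running the same function once per column would cover the case of several aggregated attributes.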
I don't know what would happen elsewhere regarding median calculations - we have to wait for one of those nice developers to say whilst we remain vigilant.
regards
Andrew