[SOLVED] Median calculation
RapidEndUser
New Altair Community Member
Hello Support,
Is there an operator to calculate median value in a column?
Thanks,
RapidEndUser.
Is there an operator to calculate median value in a column?
Thanks,
RapidEndUser.
Tagged:
0
Answers
-
Hello
You could use the Aggregate operator although you have to add a constant value for all the examples so that the operator has something to group against in order to find the median for that grouping.<?xml version="1.0" encoding="UTF-8" standalone="no"?>
If you wanted to find the median for different labels then you would change the group by parameter to be the label in the example above.
<process version="5.3.007">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="120">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.3.007" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="120">
<list key="function_descriptions">
<parameter key="constant" value="1"/>
</list>
</operator>
<operator activated="true" class="aggregate" compatibility="5.3.007" expanded="true" height="76" name="Aggregate" width="90" x="380" y="120">
<parameter key="default_aggregation_function" value="median"/>
<list key="aggregation_attributes">
<parameter key="a1" value="median"/>
<parameter key="a2" value="median"/>
<parameter key="a3" value="median"/>
<parameter key="a4" value="median"/>
</list>
<parameter key="group_by_attributes" value="|constant"/>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
regards
Andrew0 -
Actually it's even easier - I realized you don't need a constant attribute at all
This process is even simpler<?xml version="1.0" encoding="UTF-8" standalone="no"?>
regards
<process version="5.3.007">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="120">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="aggregate" compatibility="5.3.007" expanded="true" height="76" name="Aggregate" width="90" x="246" y="120">
<parameter key="use_default_aggregation" value="true"/>
<parameter key="default_aggregation_function" value="median"/>
<list key="aggregation_attributes"/>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Andrew0 -
Thanks Andrew.
After experimenting the different operators yes I found that median function in the aggregate operator.
Currently the median value is only returning when you use aggregate function obvious since we are using aggregate function so it is returning that value only.
Row No. median(a1) median(a2) median(a3) median(4)
1 5.8 3.0 4.4 1.3
Actually I would like to keep all the rows in the table and add one more column which is called "MedianValue".
For e.g. If you have 4 columns in a table and would like to add 5th column that will be "MedianValue" column.
Since I want to do some calculation using "MedianValue" column with other columns and then produce 6th column, 7th column, etc.,
The output I am expecting:-
label a1 a2 a3 a4 median(a1) a1_percentage
Iris-setosa 5.1 3.5 1.4 0.2 5.800 0.879
Iris-setosa 4.9 3.0 1.4 0.2 5.800 0.844
Iris-setosa 4.7 3.2 1.3 0.2 5.800 0.810
Iris-setosa 4.6 3.1 1.5 0.2 5.800 0.793
Iris-setosa 5.0 3.6 1.4 0.2 5.800 0.862
Iris-setosa 5.4 3.9 1.7 0.4 5.800 0.931
Iris-setosa 4.6 3.4 1.4 0.3 5.800 0.793
Is it possible to do this?
Thanks,
RapidEndUser.0 -
Hello
It can be done using the Extract Macro operator on the aggregate result example set then using the macros with Generate Attributes to create a new attribute with the original example set and the appropriate macro.
With a bit of effort, it could be done using Loop operators but it's a bit more involved.
regards
Andrew0 -
Hi Andrew,
Can you provide me sample XML how to use Extract Macro with Generate Attributes operator?
Thanks,
RapidEndUser.0 -
Hello
I had a bit of spare time - so it's a lucky day.<?xml version="1.0" encoding="UTF-8" standalone="no"?>
Quite complex because it does the work reasonably automatically with Loops and macros - enjoy!
<process version="5.3.007">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="75">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="aggregate" compatibility="5.3.007" expanded="true" height="76" name="Aggregate" width="90" x="112" y="165">
<parameter key="use_default_aggregation" value="true"/>
<parameter key="default_aggregation_function" value="median"/>
<list key="aggregation_attributes"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="5.3.007" expanded="true" height="76" name="Rename by Replacing" width="90" x="112" y="255">
<parameter key="replace_what" value="(.*)\(+(.*)\)+"/>
<parameter key="replace_by" value="$1_$2"/>
</operator>
<operator activated="true" class="loop_attributes" compatibility="5.3.007" expanded="true" height="76" name="Loop Attributes" width="90" x="112" y="345">
<process expanded="true">
<operator activated="true" class="extract_macro" compatibility="5.3.007" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
<parameter key="macro" value="%{loop_attribute}"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="%{loop_attribute}"/>
<parameter key="example_index" value="1"/>
<list key="additional_macros"/>
</operator>
<connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_port="example set"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_example set" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="subprocess" compatibility="5.3.007" expanded="true" height="94" name="enforce order" width="90" x="246" y="345">
<process expanded="true">
<connect from_port="in 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="source_in 3" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="loop_attributes" compatibility="5.3.007" expanded="true" height="76" name="Loop Attributes (2)" width="90" x="380" y="345">
<process expanded="true">
<operator activated="true" class="generate_macro" compatibility="5.3.007" expanded="true" height="76" name="Generate Macro" width="90" x="112" y="30">
<list key="function_descriptions">
<parameter key="medianMacro" value=""median_"+"%{loop_attribute}""/>
<parameter key="newAttributeName" value=""ratio_"+"%{loop_attribute}""/>
</list>
</operator>
<operator activated="true" class="generate_macro" compatibility="5.3.007" expanded="true" height="76" name="Generate Macro (2)" width="90" x="112" y="120">
<list key="function_descriptions">
<parameter key="medianMacroValue" value="macro("%{medianMacro}")"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.3.007" expanded="true" height="76" name="Generate Attributes" width="90" x="112" y="210">
<list key="function_descriptions">
<parameter key="%{newAttributeName}" value="%{loop_attribute}/%{medianMacroValue}"/>
</list>
</operator>
<connect from_port="example set" to_op="Generate Macro" to_port="through 1"/>
<connect from_op="Generate Macro" from_port="through 1" to_op="Generate Macro (2)" to_port="through 1"/>
<connect from_op="Generate Macro (2)" from_port="through 1" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="example set"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_example set" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
<connect from_op="Aggregate" from_port="original" to_op="enforce order" to_port="in 1"/>
<connect from_op="Rename by Replacing" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
<connect from_op="Loop Attributes" from_port="example set" to_op="enforce order" to_port="in 2"/>
<connect from_op="enforce order" from_port="out 1" to_op="Loop Attributes (2)" to_port="example set"/>
<connect from_op="Loop Attributes (2)" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
regards
Andrew0 -
Would not be easier (maybe even more efficient) with a Join operator? (Obviously only for the first loop attributes.)
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
Cheers, gabor
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="696" width="706">
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="75">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="aggregate" compatibility="5.2.008" expanded="true" height="76" name="Aggregate" width="90" x="179" y="75">
<parameter key="use_default_aggregation" value="true"/>
<parameter key="default_aggregation_function" value="median"/>
<list key="aggregation_attributes"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="120">
<list key="function_descriptions">
<parameter key="One" value="1"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="30">
<list key="function_descriptions">
<parameter key="One" value="1"/>
</list>
</operator>
<operator activated="true" class="join" compatibility="5.2.008" expanded="true" height="76" name="Join" width="90" x="447" y="75">
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="One" value="One"/>
</list>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
<connect from_op="Aggregate" from_port="original" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Join" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="180"/>
</process>
</operator>
</process>0 -
Thanks Gabor.
Perfect in my scenario. It is working great. I had a separate process to calculate median value and now I don't need that. Just I can do it in one process.
Thanks,
RapidEndUser.0