Extract Aggregates operator : Error in functions calculation ?
lionelderkrikor
New Altair Community Member
Hi RM Staff,
First, I hope everyone is doing well.
Second, I think there is a calculation error in the Extract Aggregates operator (Time-series module) for the following functions:
- median
- first quartile
- third quartile
It seems that these three functions return the same value as the "minimum" function.
Here are the results for the "Temperature" attribute of the "Golf" dataset:
These curious results led me to test the new "percentile" function of the Aggregate operator. From my point of view, that operator gives the correct results:
The process (use RM 9.1 beta to run it):
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000-BETA2"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.1.000-BETA2" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="retrieve" compatibility="9.1.000-BETA2" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="85"> <parameter key="repository_entry" value="//Samples/data/Golf"/> </operator> <operator activated="true" class="time_series:extract_std_descriptive_features" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Extract Aggregates" width="90" x="380" y="85"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="Temperature"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="numeric"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="real"/> <parameter key="block_type" value="value_series"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_series_end"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="add_time_series_name" value="false"/> <parameter key="sum" value="true"/> <parameter key="mean" value="true"/> <parameter key="geometric_mean" value="true"/> <parameter key="first_quartile" value="true"/> <parameter key="median" value="true"/> <parameter key="third_quartile" value="true"/> <parameter key="min" value="true"/> <parameter key="max" value="true"/> <parameter key="std_deviation" 
value="true"/> <parameter key="kurtosis" value="true"/> <parameter key="skewness" value="true"/> <parameter key="ignore_invalid_values" value="false"/> </operator> <operator activated="true" class="aggregate" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Aggregate" width="90" x="581" y="136"> <parameter key="use_default_aggregation" value="false"/> <parameter key="attribute_filter_type" value="all"/> <parameter key="attribute" value=""/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="default_aggregation_function" value="average"/> <list key="aggregation_attributes"> <parameter key="Temperature" value="median"/> <parameter key="Temperature" value="percentile (25)"/> <parameter key="Temperature" value="percentile (75)"/> <parameter key="Temperature" value="average"/> <parameter key="Temperature" value="minimum"/> </list> <parameter key="group_by_attributes" value=""/> <parameter key="count_all_combinations" value="false"/> <parameter key="only_distinct" value="false"/> <parameter key="ignore_missings" value="true"/> </operator> <connect from_op="Retrieve Golf" from_port="output" to_op="Extract Aggregates" to_port="example set"/> <connect from_op="Extract Aggregates" from_port="features" to_port="result 1"/> <connect from_op="Extract Aggregates" from_port="original" to_op="Aggregate" to_port="example set input"/> <connect from_op="Aggregate" from_port="example set output" to_port="result 2"/> <portSpacing port="source_input 1" spacing="0"/> 
<portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> </process> </operator> </process>
Lionel
Best Answer
Hi @lionelderkrikor,
Thanks for reporting this. I had already spotted it, and it will be fixed in the 9.1 release (the fix is not included in the beta). In fact, the first quartile, median and third quartile features calculated the 0.25/0.5/0.75 percent percentiles instead of the 25/50/75 percent ones ;-) So for a small data set such as Golf, the result is basically the minimum.
Best regards,
Fabian
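To make the effect described above concrete, here is a minimal Python sketch. It is not RapidMiner's implementation: the `percentile` helper, its linear-interpolation rule, and the sample values are assumptions chosen only for illustration. It compares asking for the 25/50/75 percent percentiles with asking for the 0.25/0.5/0.75 percent ones on a 14-row series.

```python
def percentile(values, p):
    """Return the p-th percentile (p in 0..100) using linear interpolation.

    Hypothetical helper for this sketch; RapidMiner may use another rule.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted data.
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Illustrative values for a small numeric attribute (assumed, 14 rows).
temperatures = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]

for label, q in [("first quartile", 25), ("median", 50), ("third quartile", 75)]:
    intended = percentile(temperatures, q)        # 25 / 50 / 75 percent
    mistaken = percentile(temperatures, q / 100)  # 0.25 / 0.5 / 0.75 percent
    print(f"{label}: intended={intended:.4f}, mistaken={mistaken:.4f}")

print("minimum:", min(temperatures))
```

With only 14 rows, a percentile below 1 percent falls between the two smallest values, so all three "mistaken" results land within a fraction of a unit of the minimum. A method that picks the nearest rank instead of interpolating would return the minimum exactly.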
Answers
Thanks for catching this! I have also been testing the new time series operators in the RapidMiner 9.1 beta, but I had not tried those specific aggregation functions yet, so I had not observed the problem.