Extract Aggregates operator: error in the function calculations?

lionelderkrikor (New Altair Community Member)
edited November 2024 in Community Q&A
Hi RM Staff,

First, I hope everyone is doing well.
Second, I think there is a calculation error in the Extract Aggregates operator (Time Series module) for the following functions:
 - median
 - first quartile
 - third quartile
It seems that these three functions return the same value as the "minimum" function...
Here are the results for the "Temperature" attribute of the "Golf" data set:


These curious results led me to test the new "percentile" function of the Aggregate operator. In my view, this operator gives the following correct results:


The process (use RM 9.1 beta to run it):

<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000-BETA2">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.1.000-BETA2" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.1.000-BETA2" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="85">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="time_series:extract_std_descriptive_features" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Extract Aggregates" width="90" x="380" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Temperature"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="add_time_series_name" value="false"/>
        <parameter key="sum" value="true"/>
        <parameter key="mean" value="true"/>
        <parameter key="geometric_mean" value="true"/>
        <parameter key="first_quartile" value="true"/>
        <parameter key="median" value="true"/>
        <parameter key="third_quartile" value="true"/>
        <parameter key="min" value="true"/>
        <parameter key="max" value="true"/>
        <parameter key="std_deviation" value="true"/>
        <parameter key="kurtosis" value="true"/>
        <parameter key="skewness" value="true"/>
        <parameter key="ignore_invalid_values" value="false"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Aggregate" width="90" x="581" y="136">
        <parameter key="use_default_aggregation" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="default_aggregation_function" value="average"/>
        <list key="aggregation_attributes">
          <parameter key="Temperature" value="median"/>
          <parameter key="Temperature" value="percentile (25)"/>
          <parameter key="Temperature" value="percentile (75)"/>
          <parameter key="Temperature" value="average"/>
          <parameter key="Temperature" value="minimum"/>
        </list>
        <parameter key="group_by_attributes" value=""/>
        <parameter key="count_all_combinations" value="false"/>
        <parameter key="only_distinct" value="false"/>
        <parameter key="ignore_missings" value="true"/>
      </operator>
      <connect from_op="Retrieve Golf" from_port="output" to_op="Extract Aggregates" to_port="example set"/>
      <connect from_op="Extract Aggregates" from_port="features" to_port="result 1"/>
      <connect from_op="Extract Aggregates" from_port="original" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Regards,

Lionel
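For a quick sanity check outside RapidMiner: the first quartile, median and third quartile are the 25th, 50th and 75th percentiles, so on data with several distinct values they should not all equal the minimum. This minimal sketch uses Python's standard `statistics.quantiles` on hypothetical values (not the Golf data). It is only an illustration: `statistics.quantiles` uses its own interpolation rule (`method="exclusive"` by default), so its exact numbers need not match those of RapidMiner's Aggregate operator.

```python
import statistics

# Hypothetical sample values (NOT the Golf "Temperature" column).
values = [12, 3, 7, 9, 15, 1, 8, 4, 11, 6]

# With n=4, statistics.quantiles returns the three quartile cut points:
# [first quartile, median, third quartile] (default method="exclusive").
q1, median, q3 = statistics.quantiles(values, n=4)

print("first quartile:", q1)
print("median:        ", median)
print("third quartile:", q3)
print("minimum:       ", min(values))

# Correct quartile features lie above the minimum on data like this.
assert min(values) < q1 <= median <= q3
```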

Best Answer

  • tftemme (New Altair Community Member)
    Hi @lionelderkrikor,

    Thanks for reporting this. I had already spotted it, and it will be fixed in the 9.1 release (the fix is not included in the beta). In fact, the first quartile, median and third quartile features calculated the 0.25/0.5/0.75 percent percentiles instead of the 25/50/75 percent ones ;-) So for smaller data sets (such as Golf) the result is basically the minimum.

    Best regards,
    Fabian
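The effect Fabian describes can be reproduced in a few lines. This is a sketch under an assumption: it uses the nearest-rank percentile definition, `percentile_nearest_rank` is a hypothetical helper, and RapidMiner's actual implementation may use a different percentile rule. Passing 0.25/0.5/0.75 where 25/50/75 was intended gives a rank of 1, i.e. the minimum, on a small data set.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method.

    Assumption: nearest rank is only one of several percentile definitions;
    RapidMiner's internal method may differ.
    """
    ordered = sorted(values)
    # Rank = ceil(p * n / 100), clamped to at least 1 (the smallest value).
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


values = [12, 3, 7, 9, 15, 1, 8, 4, 11, 6]  # hypothetical data, not Golf

for name, p in [("first quartile", 25), ("median", 50), ("third quartile", 75)]:
    intended = percentile_nearest_rank(values, p)      # 25/50/75 percent
    buggy = percentile_nearest_rank(values, p / 100)   # 0.25/0.5/0.75 percent
    print(f"{name}: intended={intended}, buggy={buggy}, min={min(values)}")
```

Under this assumed definition, the rank ceil(p * n / 100) stays at 1 as long as n <= 100 / p, so the buggy third quartile (p = 0.75) equals the minimum for any data set with up to 133 rows.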

Answers

  • Telcontar120 (New Altair Community Member)
    Thanks for catching this! I have also been testing the new time series operators in the RapidMiner 9.1 beta, but I had not tried those specific aggregation functions yet, so I had not noticed the problem.