Aggregate operator applied to each subset

User: "SylvainM"
New Altair Community Member
Updated by Jocelyn
Hello everyone,

I apologize if this question has already been asked elsewhere, or if it is an obvious one. I'm still learning how to use Rapidminer :smile:

This is my problem. Let's suppose that I have a dataset looking like that (but much more different values):

Year Region Item
01      QC      CCD
01      QC      CCD
01      QC      CS

01      ON      CCD
01      ON      CS

01      NB      CCD
01      NB      CS
--------------------------
02      QC      CCD
02      QC      CS
02      QC      CS

02      ON      CS
02      ON      CS

02      NB      CCD
02      NB      CCD

I would like to get the relative percentage of each Item related to the Region and to the Year

Year Region Item   Proportion
01      QC      CCD   66.6%
01      QC      CS      33.3%

01      ON      CCD   50%
01      ON      CS      50%

01      NB      CCD    50%
01      NB      CCS    50%
-------------------------------------
02      QC      CCD    33.3%
02      QC      CS       66.6%

02      ON      CS       100%

02      NB      CCD     100%

I tried many combinations with the operators Aggregate, Loop values, Branch, etc. but I seem to constantly fail... 

Do you have any suggestion?

Thanks a lot!
Sylvain

Find more posts tagged with

Sort by:
1 - 1 of 11
    User: "SGolbert"
    New Altair Community Member
    Accepted Answer

    I think I managed to do what you wanted with 2 aggregations and one Join, check it out:



    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="python_scripting:execute_python" compatibility="9.2.000" expanded="true" height="82" name="Execute Python" width="90" x="112" y="34">
            <parameter key="script" value="import pandas&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;&#9;import pandas as pd&#10;&#9;import random&#10;&#10;&#9;dt = []&#10;&#9;&#10;&#9;for i in range(100):&#10;&#9;    region = random.choice(['America', 'Europe', 'Asia'])&#10;&#9;    quartile = random.choice(range(1,5))&#10;&#9;    item = random.choice(['A', 'B', 'C'])&#10;&#9;    dt.append([region, quartile, item])&#10;&#9;    &#10;&#9;dt = pd.DataFrame(dt, columns=['region', 'quarter', 'item'])&#10;&#9;return dt"/>
            <parameter key="use_default_python" value="true"/>
            <parameter key="package_manager" value="conda (anaconda)"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" compatibility="9.2.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="246" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="quarter"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="order_attributes" compatibility="9.2.001" expanded="true" height="82" name="Reorder Attributes" width="90" x="380" y="34">
            <parameter key="sort_mode" value="user specified"/>
            <parameter key="attribute_ordering" value="region|quarter|item"/>
            <parameter key="use_regular_expressions" value="false"/>
            <parameter key="handle_unmatched" value="append"/>
            <parameter key="sort_direction" value="ascending"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.2.001" expanded="true" height="103" name="Multiply" width="90" x="514" y="34"/>
          <operator activated="true" class="aggregate" compatibility="9.2.001" expanded="true" height="82" name="Aggregate" width="90" x="648" y="34">
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default_aggregation_function" value="average"/>
            <list key="aggregation_attributes">
              <parameter key="item" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="quarter|region|item"/>
            <parameter key="count_all_combinations" value="false"/>
            <parameter key="only_distinct" value="false"/>
            <parameter key="ignore_missings" value="true"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="9.2.001" expanded="true" height="82" name="Aggregate (2)" width="90" x="648" y="187">
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default_aggregation_function" value="average"/>
            <list key="aggregation_attributes">
              <parameter key="item" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="quarter|region"/>
            <parameter key="count_all_combinations" value="false"/>
            <parameter key="only_distinct" value="false"/>
            <parameter key="ignore_missings" value="true"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.2.001" expanded="true" height="82" name="Rename" width="90" x="782" y="187">
            <parameter key="old_name" value="count(item)"/>
            <parameter key="new_name" value="region_quarter_total"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.001" expanded="true" height="82" name="Join" width="90" x="916" y="34">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="region" value="region"/>
              <parameter key="quarter" value="quarter"/>
            </list>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="1050" y="34">
            <list key="function_descriptions">
              <parameter key="proportion" value="[count(item)]/region_quarter_total"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="1184" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="region|quarter|item|proportion"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <connect from_op="Execute Python" from_port="output 1" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
          <connect from_op="Reorder Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Aggregate (2)" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Aggregate (2)" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


    I generated some random data with Python.


    Best regards,
    Sebastian