🎉Community Raffle - Win $25

An exclusive raffle opportunity for active members like you! Complete your profile, answer questions and get your first accepted badge to enter the raffle.
Join and Win

Aggregate operator applied to each subset

User: "SylvainM"
New Altair Community Member
Updated by Jocelyn
Hello everyone,

I apologize if this question has already been asked elsewhere, or if it is an obvious one. I'm still learning how to use Rapidminer :smile:

This is my problem. Let's suppose that I have a dataset looking like that (but much more different values):

Year Region Item
01      QC      CCD
01      QC      CCD
01      QC      CS

01      ON      CCD
01      ON      CS

01      NB      CCD
01      NB      CS
--------------------------
02      QC      CCD
02      QC      CS
02      QC      CS

02      ON      CS
02      ON      CS

02      NB      CCD
02      NB      CCD

I would like to get the relative percentage of each Item related to the Region and to the Year

Year Region Item   Proportion
01      QC      CCD   66.6%
01      QC      CS      33.3%

01      ON      CCD   50%
01      ON      CS      50%

01      NB      CCD    50%
01      NB      CCS    50%
-------------------------------------
02      QC      CCD    33.3%
02      QC      CS       66.6%

02      ON      CS       100%

02      NB      CCD     100%

I tried many combinations with the operators Aggregate, Loop values, Branch, etc. but I seem to constantly fail... 

Do you have any suggestion?

Thanks a lot!
Sylvain

Find more posts tagged with

Sort by:
1 - 1 of 11
    User: "SGolbert"
    New Altair Community Member
    Accepted Answer

    I think I managed to do what you wanted with 2 aggregations and one Join, check it out:



    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="python_scripting:execute_python" compatibility="9.2.000" expanded="true" height="82" name="Execute Python" width="90" x="112" y="34">
            <parameter key="script" value="import pandas&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;&#9;import pandas as pd&#10;&#9;import random&#10;&#10;&#9;dt = []&#10;&#9;&#10;&#9;for i in range(100):&#10;&#9;    region = random.choice(['America', 'Europe', 'Asia'])&#10;&#9;    quartile = random.choice(range(1,5))&#10;&#9;    item = random.choice(['A', 'B', 'C'])&#10;&#9;    dt.append([region, quartile, item])&#10;&#9;    &#10;&#9;dt = pd.DataFrame(dt, columns=['region', 'quarter', 'item'])&#10;&#9;return dt"/>
            <parameter key="use_default_python" value="true"/>
            <parameter key="package_manager" value="conda (anaconda)"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" compatibility="9.2.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="246" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="quarter"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="order_attributes" compatibility="9.2.001" expanded="true" height="82" name="Reorder Attributes" width="90" x="380" y="34">
            <parameter key="sort_mode" value="user specified"/>
            <parameter key="attribute_ordering" value="region|quarter|item"/>
            <parameter key="use_regular_expressions" value="false"/>
            <parameter key="handle_unmatched" value="append"/>
            <parameter key="sort_direction" value="ascending"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.2.001" expanded="true" height="103" name="Multiply" width="90" x="514" y="34"/>
          <operator activated="true" class="aggregate" compatibility="9.2.001" expanded="true" height="82" name="Aggregate" width="90" x="648" y="34">
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default_aggregation_function" value="average"/>
            <list key="aggregation_attributes">
              <parameter key="item" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="quarter|region|item"/>
            <parameter key="count_all_combinations" value="false"/>
            <parameter key="only_distinct" value="false"/>
            <parameter key="ignore_missings" value="true"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="9.2.001" expanded="true" height="82" name="Aggregate (2)" width="90" x="648" y="187">
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default_aggregation_function" value="average"/>
            <list key="aggregation_attributes">
              <parameter key="item" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="quarter|region"/>
            <parameter key="count_all_combinations" value="false"/>
            <parameter key="only_distinct" value="false"/>
            <parameter key="ignore_missings" value="true"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.2.001" expanded="true" height="82" name="Rename" width="90" x="782" y="187">
            <parameter key="old_name" value="count(item)"/>
            <parameter key="new_name" value="region_quarter_total"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.001" expanded="true" height="82" name="Join" width="90" x="916" y="34">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="region" value="region"/>
              <parameter key="quarter" value="quarter"/>
            </list>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="1050" y="34">
            <list key="function_descriptions">
              <parameter key="proportion" value="[count(item)]/region_quarter_total"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="1184" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="region|quarter|item|proportion"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <connect from_op="Execute Python" from_port="output 1" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
          <connect from_op="Reorder Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Aggregate (2)" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Aggregate (2)" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


    I generated some random data with Python.


    Best regards,
    Sebastian