Ranking Operators?

mario_sark New Altair Community Member
edited November 2024 in Community Q&A
Dear all,

I am new to RapidMiner and I am building an RFM analysis based on bank counter transactions.
What is the best approach to rank R, F and M? Is there an operator in RapidMiner I can use for that?

I used to use percentiles in Excel.

I hope you can help.
Thanks

Best Answers

  • YYH
    YYH
    Altair Employee
    Answer ✓
    These operators may ring a bell:
    Sort
    Discretize
    Extract Macro (for the number of examples)
    etc.
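
For RFM specifically, the idea behind Sort plus Discretize is to rank customers on each measure and cut the ranking into equal-sized groups. A minimal sketch of that logic in plain Python (not RapidMiner; the function name `quantile_scores`, the five-bin choice, and the sample values are assumptions for illustration):

```python
def quantile_scores(values, bins=5, higher_is_better=True):
    """Give each value a score from 1 to `bins` based on its rank
    (equal-frequency binning). The best values get the highest score."""
    n = len(values)
    # Order the positions from worst to best value.
    order = sorted(range(n), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        # Map rank 0..n-1 onto bins 1..bins with (nearly) equal sizes.
        scores[i] = rank * bins // n + 1
    return scores


# Hypothetical customers: days since last visit, visit count, total amount.
recency_days = [3, 40, 12, 90, 7]
frequency = [12, 2, 5, 1, 9]
monetary = [900.0, 120.0, 300.0, 50.0, 650.0]

r = quantile_scores(recency_days, higher_is_better=False)  # fewer days is better
f = quantile_scores(frequency)
m = quantile_scores(monetary)
rfm = [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]
print(rfm)
```

Note that this rank-based split can place equal values in different bins; whether that is acceptable depends on your data.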

    You can apply Aggregate for percentiles. If you need to calculate the n-th percentile, check out this process:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros>
          <macro>
            <key>user_input_N</key>
            <value>99</value>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="9.2.000" expanded="true" height="82" name="Generate data" width="90" x="45" y="136">
            <process expanded="true">
              <operator activated="true" class="generate_sales_data" compatibility="9.2.000" expanded="true" height="68" name="Generate Sales Data" width="90" x="45" y="34">
                <parameter key="number_examples" value="1000"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="9.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
                <list key="function_descriptions">
                  <parameter key="Total_Price" value="amount * single_price"/>
                </list>
                <parameter key="keep_all" value="true"/>
                <description align="center" color="transparent" colored="false" width="126">generate total price = single price * amount</description>
              </operator>
              <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
                <parameter key="use_default_aggregation" value="false"/>
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="default_aggregation_function" value="average"/>
                <list key="aggregation_attributes">
                  <parameter key="Total_Price" value="sum"/>
                </list>
                <parameter key="group_by_attributes" value="customer_id"/>
                <parameter key="count_all_combinations" value="false"/>
                <parameter key="only_distinct" value="false"/>
                <parameter key="ignore_missings" value="true"/>
                <description align="center" color="transparent" colored="false" width="126">aggregate to get total price by customer</description>
              </operator>
              <operator activated="true" class="rename" compatibility="9.2.000" expanded="true" height="82" name="Rename (2)" width="90" x="514" y="34">
                <parameter key="old_name" value="sum(Total_Price)"/>
                <parameter key="new_name" value="Total Expenses"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <connect from_op="Generate Sales Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
              <connect from_op="Aggregate" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
              <connect from_op="Rename (2)" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">simulate some sales data, say total expenses as customer life time value (LTV)</description>
          </operator>
          <operator activated="true" breakpoints="after" class="discretize_by_frequency" compatibility="9.2.000" expanded="true" height="103" name="Discretize" width="90" x="246" y="136">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Total Expenses"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="use_sqrt_of_examples" value="false"/>
            <parameter key="number_of_bins" value="100"/>
            <parameter key="range_name_type" value="long"/>
            <parameter key="automatic_number_of_digits" value="true"/>
            <parameter key="number_of_digits" value="-1"/>
            <description align="center" color="transparent" colored="false" width="126">cut the expenses into 100 bins</description>
          </operator>
          <operator activated="true" class="subprocess" compatibility="9.2.000" expanded="true" height="103" name="post processing" width="90" x="447" y="136">
            <process expanded="true">
              <operator activated="true" class="rename" compatibility="9.2.000" expanded="true" height="82" name="Rename" width="90" x="179" y="34">
                <parameter key="old_name" value="Total Expenses"/>
                <parameter key="new_name" value="Range of Expenses"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <operator activated="true" class="concurrency:join" compatibility="9.2.000" expanded="true" height="82" name="Join (2)" width="90" x="179" y="136">
                <parameter key="remove_double_attributes" value="true"/>
                <parameter key="join_type" value="inner"/>
                <parameter key="use_id_attribute_as_key" value="false"/>
                <list key="key_attributes">
                  <parameter key="customer_id" value="customer_id"/>
                </list>
                <parameter key="keep_both_join_attributes" value="false"/>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="9.2.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="380" y="34">
                <list key="function_descriptions">
                  <parameter key="Percentile_bucket" value="parse(cut([Range of Expenses],5,index([Range of Expenses],&quot;[&quot;)-5))"/>
                </list>
                <parameter key="keep_all" value="true"/>
                <description align="center" color="transparent" colored="false" width="126">get the range number, and the cut off by parsing the discretized results</description>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="9.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="581" y="34">
                <parameter key="parameter_expression" value=""/>
                <parameter key="condition_class" value="custom_filters"/>
                <parameter key="invert_filter" value="false"/>
                <list key="filters_list">
                  <parameter key="filters_entry_key" value="Percentile_bucket.eq.%{user_input_N}"/>
                </list>
                <parameter key="filters_logic_and" value="true"/>
                <parameter key="filters_check_metadata" value="false"/>
                <description align="center" color="transparent" colored="false" width="126">find the corresponding subset in nth percentile bucket</description>
              </operator>
              <operator activated="true" class="sort" compatibility="9.2.000" expanded="true" height="82" name="Sort (2)" width="90" x="849" y="34">
                <parameter key="attribute_name" value="Total Expenses"/>
                <parameter key="sorting_direction" value="increasing"/>
                <description align="center" color="transparent" colored="false" width="126">get the cutoff point between the nth and (n+1)th percentile</description>
              </operator>
              <operator activated="true" class="sort" compatibility="9.2.000" expanded="true" height="82" name="Sort" width="90" x="849" y="238">
                <parameter key="attribute_name" value="Total Expenses"/>
                <parameter key="sorting_direction" value="decreasing"/>
                <description align="center" color="transparent" colored="false" width="126"/>
              </operator>
              <connect from_port="in 1" to_op="Rename" to_port="example set input"/>
              <connect from_port="in 2" to_op="Join (2)" to_port="right"/>
              <connect from_op="Rename" from_port="example set output" to_op="Join (2)" to_port="left"/>
              <connect from_op="Join (2)" from_port="join" to_op="Generate Attributes (2)" to_port="example set input"/>
              <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Sort (2)" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="original" to_op="Sort" to_port="example set input"/>
              <connect from_op="Sort (2)" from_port="example set output" to_port="out 1"/>
              <connect from_op="Sort" from_port="example set output" to_port="out 2"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="source_in 3" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="9.2.000" expanded="true" height="68" name="Extract Macro" width="90" x="648" y="136">
            <parameter key="macro" value="%{user_input_N}th_percentile"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="statistics" value="average"/>
            <parameter key="attribute_name" value="Total Expenses"/>
            <parameter key="example_index" value="1"/>
            <list key="additional_macros"/>
            <description align="center" color="transparent" colored="false" width="126">extract the n-th percentile and store it into a macro variable</description>
          </operator>
          <connect from_op="Generate data" from_port="out 1" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_op="post processing" to_port="in 1"/>
          <connect from_op="Discretize" from_port="original" to_op="post processing" to_port="in 2"/>
          <connect from_op="post processing" from_port="out 1" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="post processing" from_port="out 2" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="126"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="98" resized="true" width="901" x="10" y="10">User input: n (defined in context view)&lt;br&gt;for calculating the n-th percentile in the given data&lt;br&gt;This can also be used for anomaly detection (at least for 1D Interquartile range method)</description>
        </process>
      </operator>
    </process>
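
As I read the process above, it splits Total Expenses into 100 equal-frequency bins, keeps the bucket numbered `user_input_N`, sorts it in increasing order, and stores the first value in a macro. Roughly the same computation in plain Python, as a sketch (the function name `bucket_min` and the sample data are hypothetical, and RapidMiner's exact bin boundaries may differ):

```python
def bucket_min(values, n, bins=100):
    """Return the smallest value in the n-th (1-based) of `bins`
    equal-frequency buckets, i.e. the lower cutoff of that bucket."""
    ordered = sorted(values)
    size = len(ordered)
    for rank, value in enumerate(ordered):
        # Same rank-to-bucket mapping as equal-frequency binning.
        if rank * bins // size + 1 == n:
            return value
    raise ValueError("bucket is empty or out of range")


# Hypothetical data: 1000 distinct totals, looking up bucket 99 of 100.
totals = list(range(1, 1001))
print(bucket_min(totals, 99))
```

With `bins=5` the same function gives the lower cutoff of each quintile, which is closer to what an RFM score needs.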
    

    YY
  • IngoRM
    IngoRM New Altair Community Member
    Answer ✓
    Please see the post here which explains how to import the XML of such a process into RapidMiner:
    Best,
    Ingo

Answers

  • mario_sark
    mario_sark New Altair Community Member
    yyhuang Thank you, you actually gave me an idea of what I should do. But I have a question: how can I check this long process? Where should I put the script to see how it works? I am sorry for this question, but I am new here.
    Thanks. 

