Pareto question

LilC
LilC New Altair Community Member
edited November 2024 in Community Q&A
Hi guys, thanks for all the help.
I am trying to use the Pareto chart. 
So far, I can see the individual country's sale in the chart with the percentage of total sales.

But if I want to see if the top 20% of countries (based on sales) have 80% of the total sales.
It is kinda like I need to sum up the sales of a country to show its percentage over other countries.
It is kinda like this:


In the case of the pic shows, how can I know the USA to Canada are the top 20% of the country on sales.


Thanks again.
Tagged:

Best Answer

  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Answer ✓
    Hi,

    for this you need to reproduce the row-wise cumulation of the sales numbers (like the red line does) and then filter on the cumulated percentage. Here's an example process:
    Spoiler
    <?xml version="1.0" encoding="UTF-8"?><process version="9.7.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.7.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.7.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="Country,Sales&#10;Mexico,45000&#10;Germany,225000&#10;Austria,120000&#10;USA,232000&#10;France,80000&#10;UK,70000&#10;Venezuela,69000&#10;Sweden,78000&#10;Brazil,110000&#10;Canada,77000&#10;Ireland,76000&#10;Belgium,50000&#10;Denmark,49000&#10;Switzerland,48000&#10;Finland,40000&#10;Spain,35000&#10;Italy,30000&#10;Portugal,25000&#10;Argentina,20000&#10;Norway,15000"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="sort" compatibility="9.7.001" expanded="true" height="82" name="Sort by sales decreasing" width="90" x="45" y="136">
            <parameter key="attribute_name" value="Sales"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="9.7.001" expanded="true" height="82" name="Sum of Sales" width="90" x="45" y="289">
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default_aggregation_function" value="average"/>
            <list key="aggregation_attributes">
              <parameter key="Sales" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value=""/>
            <parameter key="count_all_combinations" value="false"/>
            <parameter key="only_distinct" value="false"/>
            <parameter key="ignore_missings" value="true"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="9.7.001" expanded="true" height="68" name="Extract sumSales" width="90" x="179" y="289">
            <parameter key="macro" value="sumSales"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="statistics" value="average"/>
            <parameter key="attribute_name" value="sum(Sales)"/>
            <parameter key="example_index" value="1"/>
            <list key="additional_macros"/>
            <description align="center" color="transparent" colored="false" width="126">Careful with the execution order - this should be executed before &amp;quot;Generate salesPct&amp;quot;</description>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.7.001" expanded="true" height="82" name="Generate ID" width="90" x="179" y="136">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="9.7.001" expanded="true" height="68" name="Extract nrOfExamples" width="90" x="246" y="34">
            <parameter key="macro" value="nrOfExamples"/>
            <parameter key="macro_type" value="number_of_examples"/>
            <parameter key="statistics" value="average"/>
            <parameter key="attribute_name" value=""/>
            <list key="additional_macros"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.7.001" expanded="true" height="82" name="Generate salesPct" width="90" x="380" y="34">
            <list key="function_descriptions">
              <parameter key="salesPct" value="Sales / eval(%{sumSales})"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.7.001" expanded="true" height="103" name="Multiply" width="90" x="514" y="136"/>
          <operator activated="true" class="concurrency:loop" compatibility="9.7.001" expanded="true" height="82" name="Loop for cumulated sum" width="90" x="648" y="187">
            <parameter key="number_of_iterations" value="%{nrOfExamples}"/>
            <parameter key="iteration_macro" value="nr"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="filter_example_range" compatibility="9.7.001" expanded="true" height="82" name="Filter Example Range" width="90" x="112" y="34">
                <parameter key="first_example" value="1"/>
                <parameter key="last_example" value="%{nr}"/>
                <parameter key="invert_filter" value="false"/>
              </operator>
              <operator activated="true" class="aggregate" compatibility="9.7.001" expanded="true" height="82" name="Aggregate" width="90" x="246" y="34">
                <parameter key="use_default_aggregation" value="false"/>
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="default_aggregation_function" value="average"/>
                <list key="aggregation_attributes">
                  <parameter key="id" value="maximum"/>
                  <parameter key="salesPct" value="sum"/>
                </list>
                <parameter key="group_by_attributes" value=""/>
                <parameter key="count_all_combinations" value="false"/>
                <parameter key="only_distinct" value="false"/>
                <parameter key="ignore_missings" value="true"/>
              </operator>
              <connect from_port="input 1" to_op="Filter Example Range" to_port="example set input"/>
              <connect from_op="Filter Example Range" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
              <connect from_op="Aggregate" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="9.7.001" expanded="true" height="82" name="Append" width="90" x="782" y="187">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.7.001" expanded="true" height="82" name="Join" width="90" x="648" y="34">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="id" value="maximum(id)"/>
            </list>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.7.001" expanded="true" height="103" name="Filter Examples" width="90" x="849" y="34">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="sum(salesPct).le.0\.8"/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Sort by sales decreasing" to_port="example set input"/>
          <connect from_op="Sort by sales decreasing" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Sort by sales decreasing" from_port="original" to_op="Sum of Sales" to_port="example set input"/>
          <connect from_op="Sum of Sales" from_port="example set output" to_op="Extract sumSales" to_port="example set"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Extract nrOfExamples" to_port="example set"/>
          <connect from_op="Extract nrOfExamples" from_port="example set" to_op="Generate salesPct" to_port="example set input"/>
          <connect from_op="Generate salesPct" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Join" to_port="left"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Loop for cumulated sum" to_port="input 1"/>
          <connect from_op="Loop for cumulated sum" from_port="output 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <connect from_op="Filter Examples" from_port="unmatched example set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    


    We extract the sum of all sales by aggregating the attribute with the sum function. The result is extracted into a macro (sumSales). Then we calculate the current percentage: Sales / sumSales.
    Now we have to sum up the percentages up until the current row. This can be done in different ways, here I used Loop with the number of examples as the iteration count. Inside the loop, we select rows 1 to the current, aggregate the sales percentages, and we Append the loop result. Now we can join this back to the original table and filter the example having >= 0.8 in the cumulated sum attribute.

    Regards,
    Balázs

Answers

  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Answer ✓
    Hi,

    for this you need to reproduce the row-wise cumulation of the sales numbers (like the red line does) and then filter on the cumulated percentage. Here's an example process:
    Spoiler
    <?xml version="1.0" encoding="UTF-8"?><process version="9.7.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.7.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.7.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="Country,Sales&#10;Mexico,45000&#10;Germany,225000&#10;Austria,120000&#10;USA,232000&#10;France,80000&#10;UK,70000&#10;Venezuela,69000&#10;Sweden,78000&#10;Brazil,110000&#10;Canada,77000&#10;Ireland,76000&#10;Belgium,50000&#10;Denmark,49000&#10;Switzerland,48000&#10;Finland,40000&#10;Spain,35000&#10;Italy,30000&#10;Portugal,25000&#10;Argentina,20000&#10;Norway,15000"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="sort" compatibility="9.7.001" expanded="true" height="82" name="Sort by sales decreasing" width="90" x="45" y="136">
            <parameter key="attribute_name" value="Sales"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="9.7.001" expanded="true" height="82" name="Sum of Sales" width="90" x="45" y="289">
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default_aggregation_function" value="average"/>
            <list key="aggregation_attributes">
              <parameter key="Sales" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value=""/>
            <parameter key="count_all_combinations" value="false"/>
            <parameter key="only_distinct" value="false"/>
            <parameter key="ignore_missings" value="true"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="9.7.001" expanded="true" height="68" name="Extract sumSales" width="90" x="179" y="289">
            <parameter key="macro" value="sumSales"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="statistics" value="average"/>
            <parameter key="attribute_name" value="sum(Sales)"/>
            <parameter key="example_index" value="1"/>
            <list key="additional_macros"/>
            <description align="center" color="transparent" colored="false" width="126">Careful with the execution order - this should be executed before &amp;quot;Generate salesPct&amp;quot;</description>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.7.001" expanded="true" height="82" name="Generate ID" width="90" x="179" y="136">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="9.7.001" expanded="true" height="68" name="Extract nrOfExamples" width="90" x="246" y="34">
            <parameter key="macro" value="nrOfExamples"/>
            <parameter key="macro_type" value="number_of_examples"/>
            <parameter key="statistics" value="average"/>
            <parameter key="attribute_name" value=""/>
            <list key="additional_macros"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.7.001" expanded="true" height="82" name="Generate salesPct" width="90" x="380" y="34">
            <list key="function_descriptions">
              <parameter key="salesPct" value="Sales / eval(%{sumSales})"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.7.001" expanded="true" height="103" name="Multiply" width="90" x="514" y="136"/>
          <operator activated="true" class="concurrency:loop" compatibility="9.7.001" expanded="true" height="82" name="Loop for cumulated sum" width="90" x="648" y="187">
            <parameter key="number_of_iterations" value="%{nrOfExamples}"/>
            <parameter key="iteration_macro" value="nr"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="filter_example_range" compatibility="9.7.001" expanded="true" height="82" name="Filter Example Range" width="90" x="112" y="34">
                <parameter key="first_example" value="1"/>
                <parameter key="last_example" value="%{nr}"/>
                <parameter key="invert_filter" value="false"/>
              </operator>
              <operator activated="true" class="aggregate" compatibility="9.7.001" expanded="true" height="82" name="Aggregate" width="90" x="246" y="34">
                <parameter key="use_default_aggregation" value="false"/>
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="default_aggregation_function" value="average"/>
                <list key="aggregation_attributes">
                  <parameter key="id" value="maximum"/>
                  <parameter key="salesPct" value="sum"/>
                </list>
                <parameter key="group_by_attributes" value=""/>
                <parameter key="count_all_combinations" value="false"/>
                <parameter key="only_distinct" value="false"/>
                <parameter key="ignore_missings" value="true"/>
              </operator>
              <connect from_port="input 1" to_op="Filter Example Range" to_port="example set input"/>
              <connect from_op="Filter Example Range" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
              <connect from_op="Aggregate" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="9.7.001" expanded="true" height="82" name="Append" width="90" x="782" y="187">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.7.001" expanded="true" height="82" name="Join" width="90" x="648" y="34">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="id" value="maximum(id)"/>
            </list>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.7.001" expanded="true" height="103" name="Filter Examples" width="90" x="849" y="34">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="sum(salesPct).le.0\.8"/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Sort by sales decreasing" to_port="example set input"/>
          <connect from_op="Sort by sales decreasing" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Sort by sales decreasing" from_port="original" to_op="Sum of Sales" to_port="example set input"/>
          <connect from_op="Sum of Sales" from_port="example set output" to_op="Extract sumSales" to_port="example set"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Extract nrOfExamples" to_port="example set"/>
          <connect from_op="Extract nrOfExamples" from_port="example set" to_op="Generate salesPct" to_port="example set input"/>
          <connect from_op="Generate salesPct" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Join" to_port="left"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Loop for cumulated sum" to_port="input 1"/>
          <connect from_op="Loop for cumulated sum" from_port="output 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <connect from_op="Filter Examples" from_port="unmatched example set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    


    We extract the sum of all sales by aggregating the attribute with the sum function. The result is extracted into a macro (sumSales). Then we calculate the current percentage: Sales / sumSales.
    Now we have to sum up the percentages up until the current row. This can be done in different ways, here I used Loop with the number of examples as the iteration count. Inside the loop, we select rows 1 to the current, aggregate the sales percentages, and we Append the loop result. Now we can join this back to the original table and filter the example having >= 0.8 in the cumulated sum attribute.

    Regards,
    Balázs

Welcome!

It looks like you're new here. Sign in or register to get started.

Welcome!

It looks like you're new here. Sign in or register to get started.