Selecting samples for attributes whose values contribute the most

User: "Vanlal"
New Altair Community Member
Updated by Jocelyn
I have an attribute job, which is a label and has 15 different values.
Out of 1000 samples, 7 values account for 950 samples and the remaining 8 values account for 50 samples.
I want to use only the 950 samples (i.e. the 7 values only) and ignore the rest.
How do I select the label values that contribute the most samples?
The chosen/not-chosen split may change (8-7, 10-5, 12-3, etc.) depending on the data.

I tried the following approach:
1) Count the number of occurrences of each value in the whole table (stuck at this point)
2) Rank the values by count (have no idea how)
3) Filter out the values that are not chosen (have no idea how)
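
To make the intended logic concrete, the three steps above can be sketched in plain Python. This is only an illustration of the count, rank and filter idea, not a RapidMiner process; the function name `select_top_values` and the `coverage` threshold are assumptions made up for this sketch.

```python
from collections import Counter

def select_top_values(rows, coverage=0.95):
    """Keep rows whose job is among the most frequent jobs that
    together cover at least `coverage` of all rows (hypothetical helper)."""
    counts = Counter(job for _, job in rows)   # step 1: count each value
    ranked = counts.most_common()              # step 2: rank by count, descending
    total = len(rows)
    chosen, covered = set(), 0
    for job, n in ranked:                      # step 3: pick values until covered
        if covered / total >= coverage:
            break
        chosen.add(job)
        covered += n
    kept = [row for row in rows if row[1] in chosen]
    return kept, chosen

rows = [
    ("John", "Painting"), ("Kelly", "Washing"), ("Diamond", "Carpentry"),
    ("Clarice", "Carpentry"), ("Kennedy", "Washing"), ("Kevin", "Painting"),
    ("Hart", "Painting"), ("Budsey", "Painting"), ("David", "Washing"),
]

# A 0.75 threshold is used here only because the sample table is tiny.
kept, chosen = select_top_values(rows, coverage=0.75)
print(sorted(chosen))
print(len(kept))
```

Because the cut-off is expressed as a share of the samples rather than a fixed number of values, the chosen/not-chosen split adapts to the data.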

If a better approach can be suggested, I will be very grateful.

I have the following table
Name Job
John Painting
Kelly Washing
Diamond Carpentry
Clarice Carpentry
Kennedy Washing
Kevin Painting
Hart Painting
Budsey Painting
David Washing

I tried to count the number of occurrences of each value in the whole table, which should look like this:
Name Job TotalJob
John Painting 4
Kelly Washing 3
Diamond Carpentry 2
Clarice Carpentry 2
Kennedy Washing 3
Kevin Painting 4
Hart Painting 4
Budsey Painting 4
David Washing 3
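
As a cross-check of the numbers above, the per-row totals can be reproduced with a short Python sketch. It assumes the table is held as a list of (name, job) pairs; the variable names are made up for illustration.

```python
from collections import Counter

table = [
    ("John", "Painting"), ("Kelly", "Washing"), ("Diamond", "Carpentry"),
    ("Clarice", "Carpentry"), ("Kennedy", "Washing"), ("Kevin", "Painting"),
    ("Hart", "Painting"), ("Budsey", "Painting"), ("David", "Washing"),
]

# Count each job over the whole table (column-wise), not per row.
job_counts = Counter(job for _, job in table)

# Attach the table-wide count of its job to every row.
with_totals = [(name, job, job_counts[job]) for name, job in table]

for name, job, total_job in with_totals:
    print(name, job, total_job)
```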

I tried the Generate Aggregation operator, but it produces the wrong counts:
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve job" width="90" x="45" y="34">
        <parameter key="repository_entry" value="../data/job"/>
      </operator>
      <operator activated="true" class="generate_aggregation" compatibility="9.6.000" expanded="true" height="82" name="Generate Aggregation" width="90" x="246" y="34">
        <parameter key="attribute_name" value="TotalJob"/>
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Job"/>
        <parameter key="attributes" value="Job"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="aggregation_function" value="count"/>
        <parameter key="concatenation_separator" value="|"/>
        <parameter key="keep_all" value="true"/>
        <parameter key="ignore_missings" value="true"/>
        <parameter key="ignore_missing_attributes" value="false"/>
      </operator>
      <connect from_op="Retrieve job" from_port="output" to_op="Generate Aggregation" to_port="example set input"/>
      <connect from_op="Generate Aggregation" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
The output I am getting is
RowNo Name Job TotalJob
1 John Painting 1.0
2 Kelly Washing 1.0
3 Diamond Carpentry 1.0
4 Clarice Carpentry 1.0
5 Kennedy Washing 1.0
6 Kevin Painting 1.0
7 Hart Painting 1.0
8 Budsey Painting 1.0
9 David Washing 1.0

