Find the lowest value of an attribute in relation to the other

onep
onep New Altair Community Member
edited November 5 in Community Q&A

Hello.

I'm just getting started to learn Radpidminer and would like some help to my first project.

I've managed to import my data into RM and made a predictive process on some training data and can predict my unlabeled data based on the trained data.

Now, I've trained my model with the raw training data, but I had realized that I need to cleanse the data first. See I only want my model to be trained with the data with the lowest possible value of the column B.

 

So here is my setup;

Column A - Labeled data

Column B - Data where i only want the lowest value

Coumn C - Data

 

Example:

First row

Column A: 100

Column B: 20

Column C: 50

Second row

Column A: 100

Column B: 19

Column C: 50

 

Now, since A+C is matching I would only want my model to be trained with the second row, since the first row has a higher value of column B.

 

Is this achiveable in Rapidminer?

I do not hope that my explanation is too confusing :)

 

 

Tagged:

Best Answers

  • onep
    onep New Altair Community Member
    Answer ✓

    I only want to maintain the lowest value if the other columns are the same. If I invent an example with a car, it might make more sense :).

    I have the following data from the car in my database;

    Speed, gear selection, fuel consumption.

     

    I've got alot of rows with data from the car, for example;

    100 km / h, 5th gear, 0.3 l / km

    100 km / h, 4th gear, 0.35 km / l

     

    Gear selection is my label. In this case, I keep only the row with the 5th gear because it gives me the lowest fuel consumption.

     

    But if I have a row with the following data;

    95 km / h, 4th gear, 0.35 km / l

    I would still keep this row, becuase now speed has changed.

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Answer ✓

    Mathias,

     

    have a look at the attached process, i think this is what you want to do.

     

    Best,

    Martin

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.2.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="7.2.001" expanded="true" height="82" name="Subprocess" width="90" x="45" y="85">
    <process expanded="true">
    <operator activated="true" class="generate_data_user_specification" compatibility="7.2.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
    <list key="attribute_values">
    <parameter key="speed" value="100"/>
    <parameter key="gear" value="5"/>
    <parameter key="fuel consumption" value="0.3"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_data_user_specification" compatibility="7.2.001" expanded="true" height="68" name="Generate Data by User Specification (2)" width="90" x="45" y="136">
    <list key="attribute_values">
    <parameter key="speed" value="100"/>
    <parameter key="gear" value="4"/>
    <parameter key="fuel consumption" value="0.35"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_data_user_specification" compatibility="7.2.001" expanded="true" height="68" name="Generate Data by User Specification (3)" width="90" x="45" y="238">
    <list key="attribute_values">
    <parameter key="speed" value="95"/>
    <parameter key="gear" value="4"/>
    <parameter key="fuel consumption" value="0.35"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="append" compatibility="7.2.001" expanded="true" height="124" name="Append" width="90" x="179" y="85"/>
    <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
    <connect from_op="Append" from_port="merged set" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Get sample data</description>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.2.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="136"/>
    <operator activated="true" class="aggregate" compatibility="7.2.001" expanded="true" height="82" name="Aggregate" width="90" x="380" y="85">
    <list key="aggregation_attributes">
    <parameter key="fuel consumption" value="minimum"/>
    </list>
    <parameter key="group_by_attributes" value="speed"/>
    </operator>
    <operator activated="true" class="join" compatibility="7.2.001" expanded="true" height="82" name="Join" width="90" x="581" y="136">
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="minimum(fuel consumption)" value="fuel consumption"/>
    <parameter key="speed" value="speed"/>
    </list>
    </operator>
    <connect from_op="Subprocess" from_port="out 1" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

  • Telcontar120
    Telcontar120 New Altair Community Member
    Answer ✓

    I think there may be an even easier solution here, using only one aggregate operator.  Just group by the two attributes that define a unique case and set the third attribute to capture the minimum value. Take a look at the attached sample process and dataset and see if it works.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.2.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="7.2.002" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
    <parameter key="csv_file" value="C:\Users\brian\Downloads\dedupe test.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Att_A.true.integer.attribute"/>
    <parameter key="1" value="Att_B.true.integer.attribute"/>
    <parameter key="2" value="Att_C.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="aggregate" compatibility="7.2.002" expanded="true" height="82" name="Aggregate" width="90" x="179" y="85">
    <list key="aggregation_attributes">
    <parameter key="Att_C" value="minimum"/>
    <parameter key="Att_C" value="count"/>
    </list>
    <parameter key="group_by_attributes" value="Att_A|Att_B"/>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
    <connect from_op="Aggregate" from_port="original" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee

    Hi Mathias,

     

    welcome to the community!

     

    For sure it is doable, but i am not totally sure what you want do.

     

    For your given example, i guess you would first calculate the min per example (horizontal aggregation)

    so you would get

    id    min(A,B,C)

    1     20

    2     19

     

    And then take the min of the mins (vertical aggregation), which yields to take the 2?

     

    If yes, its a combination of: Generate Aggregation to get min(A,B,C), Aggregation to get min of min(A,B,C) and i think two joins.

     

    Best,

    Martin

  • onep
    onep New Altair Community Member
    Answer ✓

    I only want to maintain the lowest value if the other columns are the same. If I invent an example with a car, it might make more sense :).

    I have the following data from the car in my database;

    Speed, gear selection, fuel consumption.

     

    I've got alot of rows with data from the car, for example;

    100 km / h, 5th gear, 0.3 l / km

    100 km / h, 4th gear, 0.35 km / l

     

    Gear selection is my label. In this case, I keep only the row with the 5th gear because it gives me the lowest fuel consumption.

     

    But if I have a row with the following data;

    95 km / h, 4th gear, 0.35 km / l

    I would still keep this row, becuase now speed has changed.

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Answer ✓

    Mathias,

     

    have a look at the attached process, i think this is what you want to do.

     

    Best,

    Martin

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.2.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="7.2.001" expanded="true" height="82" name="Subprocess" width="90" x="45" y="85">
    <process expanded="true">
    <operator activated="true" class="generate_data_user_specification" compatibility="7.2.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
    <list key="attribute_values">
    <parameter key="speed" value="100"/>
    <parameter key="gear" value="5"/>
    <parameter key="fuel consumption" value="0.3"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_data_user_specification" compatibility="7.2.001" expanded="true" height="68" name="Generate Data by User Specification (2)" width="90" x="45" y="136">
    <list key="attribute_values">
    <parameter key="speed" value="100"/>
    <parameter key="gear" value="4"/>
    <parameter key="fuel consumption" value="0.35"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_data_user_specification" compatibility="7.2.001" expanded="true" height="68" name="Generate Data by User Specification (3)" width="90" x="45" y="238">
    <list key="attribute_values">
    <parameter key="speed" value="95"/>
    <parameter key="gear" value="4"/>
    <parameter key="fuel consumption" value="0.35"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="append" compatibility="7.2.001" expanded="true" height="124" name="Append" width="90" x="179" y="85"/>
    <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
    <connect from_op="Append" from_port="merged set" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Get sample data</description>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.2.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="136"/>
    <operator activated="true" class="aggregate" compatibility="7.2.001" expanded="true" height="82" name="Aggregate" width="90" x="380" y="85">
    <list key="aggregation_attributes">
    <parameter key="fuel consumption" value="minimum"/>
    </list>
    <parameter key="group_by_attributes" value="speed"/>
    </operator>
    <operator activated="true" class="join" compatibility="7.2.001" expanded="true" height="82" name="Join" width="90" x="581" y="136">
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="minimum(fuel consumption)" value="fuel consumption"/>
    <parameter key="speed" value="speed"/>
    </list>
    </operator>
    <connect from_op="Subprocess" from_port="out 1" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

  • Telcontar120
    Telcontar120 New Altair Community Member
    Answer ✓

    I think there may be an even easier solution here, using only one aggregate operator.  Just group by the two attributes that define a unique case and set the third attribute to capture the minimum value. Take a look at the attached sample process and dataset and see if it works.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.2.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="7.2.002" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
    <parameter key="csv_file" value="C:\Users\brian\Downloads\dedupe test.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Att_A.true.integer.attribute"/>
    <parameter key="1" value="Att_B.true.integer.attribute"/>
    <parameter key="2" value="Att_C.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="aggregate" compatibility="7.2.002" expanded="true" height="82" name="Aggregate" width="90" x="179" y="85">
    <list key="aggregation_attributes">
    <parameter key="Att_C" value="minimum"/>
    <parameter key="Att_C" value="count"/>
    </list>
    <parameter key="group_by_attributes" value="Att_A|Att_B"/>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
    <connect from_op="Aggregate" from_port="original" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

  • onep
    onep New Altair Community Member

    Thank you for your help - just what I was looking for :)