Remove correlated features from training set and apply the same features to test set

Andy3
Andy3 New Altair Community Member
edited November 2024 in Community Q&A
Hello all,

I just wondering how you achieve to remove pairwise correlated features from your training set (using the Remove Correlated Attributes operator) and apply the same features to your test set? If I should compare this operation to something I think about the "Apply feature set" (as exists for the features selection operator) or somewhat OHE and the Preprocessing model output. See screenshot below of the process. I have normally these two training and test preprocessing operations in two different processes.

Thanks for your help.

Welcome!

It looks like you're new here. Sign in or register to get started.

Best Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Answer ✓
    Hi @Andy3 ,
    if you need to do it, you can use Data to Weights for it. Attached is an example.

    BR,
    Martin

    Spoiler
    <?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.6.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="attribute_1"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="data_to_weights" compatibility="9.6.000" expanded="true" height="82" name="Data to Weights" width="90" x="313" y="34">
            <parameter key="normalize_weights" value="false"/>
            <parameter key="sort_weights" value="true"/>
            <parameter key="sort_direction" value="ascending"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve Sonar (2)" width="90" x="45" y="238">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="select_by_weights" compatibility="9.6.000" expanded="true" height="103" name="Select by Weights" width="90" x="514" y="136">
            <parameter key="weight_relation" value="greater equals"/>
            <parameter key="weight" value="1.0"/>
            <parameter key="k" value="10"/>
            <parameter key="p" value="0.5"/>
            <parameter key="deselect_unknown" value="true"/>
            <parameter key="use_absolute_weights" value="true"/>
          </operator>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Data to Weights" to_port="example set"/>
          <connect from_op="Data to Weights" from_port="weights" to_op="Select by Weights" to_port="weights"/>
          <connect from_op="Retrieve Sonar (2)" from_port="output" to_op="Select by Weights" to_port="example set input"/>
          <connect from_op="Select by Weights" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


Answers

  • Andy3
    Andy3 New Altair Community Member
    Yeah, fair enough though I was hoping the where an operator(s) like this so I could have consistency through the various data sets (and in my mind :smile: ). I leave it there.

    Thanks for the help.
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Answer ✓
    Hi @Andy3 ,
    if you need to do it, you can use Data to Weights for it. Attached is an example.

    BR,
    Martin

    Spoiler
    <?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.6.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="attribute_1"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="data_to_weights" compatibility="9.6.000" expanded="true" height="82" name="Data to Weights" width="90" x="313" y="34">
            <parameter key="normalize_weights" value="false"/>
            <parameter key="sort_weights" value="true"/>
            <parameter key="sort_direction" value="ascending"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve Sonar (2)" width="90" x="45" y="238">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="select_by_weights" compatibility="9.6.000" expanded="true" height="103" name="Select by Weights" width="90" x="514" y="136">
            <parameter key="weight_relation" value="greater equals"/>
            <parameter key="weight" value="1.0"/>
            <parameter key="k" value="10"/>
            <parameter key="p" value="0.5"/>
            <parameter key="deselect_unknown" value="true"/>
            <parameter key="use_absolute_weights" value="true"/>
          </operator>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Data to Weights" to_port="example set"/>
          <connect from_op="Data to Weights" from_port="weights" to_op="Select by Weights" to_port="weights"/>
          <connect from_op="Retrieve Sonar (2)" from_port="output" to_op="Select by Weights" to_port="example set input"/>
          <connect from_op="Select by Weights" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


Welcome!

It looks like you're new here. Sign in or register to get started.

Welcome!

It looks like you're new here. Sign in or register to get started.