Find common attributes between two example sets

Hamedf
Hamedf New Altair Community Member
edited November 2024 in Community Q&A
Good day!
I have two example sets with many attributes, and the two attribute sets are not completely identical.
I want to find the attributes they have in common and filter both example sets down to those common attributes only.
The examples (values) are not important; only the data set structure matters.

Regards

Answers

  • kayman
    kayman New Altair Community Member
    Maybe use the superset option? 

    This allows you to merge the two datasets, and then you filter out the attributes which are not common.

    One way to do this would be to generate an identifier for both sets (e.g. generate an attribute set1 for the first and set2 for the second), then create a superset, filter the cases that have both set1 and set2, and finally remove the empty attributes.

    Bit hard to explain without better understanding the actual data but it's a quick and dirty way to achieve this.
  • sgenzer
    sgenzer
    Altair Employee
    Yes, that works. Or just create an identifier (Generate ID) and do an inner join.
  • tamberge
    tamberge New Altair Community Member
    edited May 2019

    hi kayman, hi sgenzer:

    I have the same issue, but I am finding it hard to apply the hints you have given.

    In my case I have two example sets. Both are keyword-document matrices, i.e. text data converted to structured data in which each attribute is a keyword that appears in the set of documents and each example represents a document.

    Now I want to find out which keywords (attributes, not examples/documents) the two matrices have in common.

    I tried both of the approaches described, but neither was sufficient.

    Is there anything I have to keep in mind when doing that?

  • sgenzer
    sgenzer
    Altair Employee
    Hi @tamberge, I think you're going to have to give us some actual data and a process XML to play with on this. It's really hard (at least for me) to understand your situation without it.

    Scott

  • tamberge
    tamberge New Altair Community Member
    edited May 2019

    Hi @sgenzer, sorry. Sure, please find the XML code and the data enclosed.

    If there is anything wrong with the upload format, please let me know!

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.000-BETA">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.000-BETA" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.3.000-BETA" expanded="true" height="68" name="Retrieve PreppedDatabased_TF_00" width="90" x="45" y="238">
            <parameter key="repository_entry" value="//20190923_Outlier Detection/01_Data/012_Single/PreppedDatabased_TF_00"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.3.000-BETA" expanded="true" height="82" name="Generate ID (2)" width="90" x="246" y="238">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="47"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="9.3.000-BETA" expanded="true" height="68" name="Retrieve PreppedDatabase" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//20190503_PatentDataNLP/001_Data/PreppedDatabase"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.3.000-BETA" expanded="true" height="82" name="Generate ID" width="90" x="246" y="34">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="superset" compatibility="9.3.000-BETA" expanded="true" height="82" name="Superset" width="90" x="447" y="34">
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <connect from_op="Retrieve PreppedDatabased_TF_00" from_port="output" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Superset" to_port="example set 2"/>
          <connect from_op="Retrieve PreppedDatabase" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Superset" to_port="example set 1"/>
          <connect from_op="Superset" from_port="superset 1" to_port="result 1"/>
          <connect from_op="Superset" from_port="superset 2" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
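
A possible reason the process above does not reveal the common keywords: as far as I understand the Superset operator, it pads both example sets to the union of their attributes and fills the added ones with missing values, so both outputs end up with the same attribute list. The sketch below mimics that behaviour in plain Python. The tables, keyword names, and helper functions are made up for illustration and are not part of RapidMiner.

```python
# Toy stand-ins for the two keyword-document matrices (hypothetical data):
# each row is a document, each key is a keyword attribute.
set_1 = [{"engine": 1, "valve": 0, "piston": 2}]
set_2 = [{"valve": 3, "piston": 1, "sensor": 4}]


def attribute_names(rows):
    """Return the attribute names of a list-of-dicts table, in first-seen order."""
    names = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    return names


def superset(rows_a, rows_b):
    """Assumed Superset behaviour: pad both tables to the union of their
    attributes, filling the newly added attributes with None (missing)."""
    names_a = attribute_names(rows_a)
    union = names_a + [n for n in attribute_names(rows_b) if n not in names_a]

    def pad(rows):
        return [{name: row.get(name) for name in union} for row in rows]

    return pad(rows_a), pad(rows_b)


super_1, super_2 = superset(set_1, set_2)
print(attribute_names(super_1))
print(attribute_names(super_2))
```

Both printed lists are identical (the union), so the attribute lists alone no longer say which keywords were shared. That information survives only in which padded columns are entirely missing.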
    



  • sgenzer
    sgenzer
    Altair Employee
    OK, there's probably a cleaner way to do this, but this works.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.000-BETA">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.000-BETA" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.3.000-BETA" expanded="true" height="68" name="Retrieve PreppedDatabase (2)" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//LocalRepository/PreppedDatabase"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.3.000-BETA" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="value_type"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.3.000-BETA" expanded="true" height="82" name="Transpose" width="90" x="313" y="34"/>
          <operator activated="true" class="select_attributes" compatibility="9.3.000-BETA" expanded="true" height="82" name="Select Attributes (3)" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="9.3.000-BETA" expanded="true" height="68" name="Retrieve PreppedDatabased_TF_00" width="90" x="45" y="238">
            <parameter key="repository_entry" value="//LocalRepository/PreppedDatabased_TF_00"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.3.000-BETA" expanded="true" height="82" name="Select Attributes (2)" width="90" x="179" y="238">
            <parameter key="attribute_filter_type" value="value_type"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.3.000-BETA" expanded="true" height="82" name="Transpose (2)" width="90" x="313" y="238"/>
          <operator activated="true" class="select_attributes" compatibility="9.3.000-BETA" expanded="true" height="82" name="Select Attributes (4)" width="90" x="447" y="238">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.3.000-BETA" expanded="true" height="82" name="Join" width="90" x="648" y="136">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <connect from_op="Retrieve PreppedDatabase (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
          <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Retrieve PreppedDatabased_TF_00" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Transpose (2)" to_port="example set input"/>
          <connect from_op="Transpose (2)" from_port="example set output" to_op="Select Attributes (4)" to_port="example set input"/>
          <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
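
For readers who want the logic of the process above without opening RapidMiner, here is a minimal Python sketch of the same idea: turn each example set into a list of its attribute names (the role Transpose plus keeping only the id column plays), then keep the names that occur on both sides (the inner join on id). The toy tables and function names are assumptions for illustration. The final filtering step is not in the process above; it is added because the original question asked for it.

```python
# Toy stand-ins for the two example sets (hypothetical data):
# each row is a document, each key is a keyword attribute.
set_1 = [
    {"engine": 1, "valve": 0, "piston": 2},
    {"engine": 0, "valve": 1, "piston": 0},
]
set_2 = [{"valve": 3, "piston": 1, "sensor": 4}]


def attribute_names(rows):
    """Rough equivalent of Transpose + selecting the id column:
    one entry per attribute name, in first-seen order."""
    names = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    return names


def common_attributes(rows_a, rows_b):
    """Rough equivalent of the inner join on id:
    keep only names that appear in both tables."""
    right = set(attribute_names(rows_b))
    return [name for name in attribute_names(rows_a) if name in right]


def keep_attributes(rows, names):
    """Restrict every row to the given attributes (extra step, not in the process)."""
    return [{name: row[name] for name in names} for row in rows]


common = common_attributes(set_1, set_2)
filtered_1 = keep_attributes(set_1, common)
filtered_2 = keep_attributes(set_2, common)
print(common)  # ['valve', 'piston']
```

Note that the process above also restricts both inputs to numeric attributes before transposing; the sketch skips that because the toy data is numeric throughout.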