Find common attributes between two example sets

Hamedf
Good day!
I have two example sets with many attributes, and their attributes are not completely the same.
I want to find the attributes they have in common and filter both example sets down to those common attributes only.
The examples (values) are not important; only the data set structure matters.
Regards
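A minimal sketch of the goal in plain Python, outside RapidMiner. Each example set is modeled here as a dict mapping attribute names to columns of values; this is a hypothetical stand-in, not the RapidMiner ExampleSet API. The common attributes are the intersection of the two sets of names, and each example set is then reduced to just those columns, leaving the values untouched.

```python
from typing import Dict, List

# Hypothetical stand-in for an example set: attribute name -> column of values.
ExampleSet = Dict[str, List[object]]


def common_attributes(a: ExampleSet, b: ExampleSet) -> List[str]:
    """Attribute names present in both sets, in the order they appear in `a`."""
    names_b = set(b)
    return [name for name in a if name in names_b]


def filter_to(es: ExampleSet, names: List[str]) -> ExampleSet:
    """Keep only the listed attributes; the values themselves are not inspected."""
    return {name: es[name] for name in names}


if __name__ == "__main__":
    set1 = {"age": [31, 45], "income": [50, 62], "city": ["A", "B"]}
    set2 = {"income": [70], "age": [29], "zip": ["123"]}
    shared = common_attributes(set1, set2)
    print(shared)                   # ['age', 'income']
    print(filter_to(set1, shared))  # {'age': [31, 45], 'income': [50, 62]}
    print(filter_to(set2, shared))  # {'age': [29], 'income': [70]}
```

Ordering by the first set is just a choice to keep the output deterministic; a plain `set(a) & set(b)` gives the same names without a guaranteed order.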
Answers
Maybe use the Superset option?
This allows you to merge the two data sets, and then you filter out the attributes which are not common.
One way to do this would be to generate an identifier for both sets (e.g. generate an attribute set1 and set2 for each respectively), then create a superset, filter cases that have both set1 and set2, and next remove empty attributes.
It's a bit hard to explain without better understanding the actual data, but it's a quick and dirty way to achieve this.
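A rough plain-Python sketch of the superset idea, again using a hypothetical dict-of-columns stand-in rather than RapidMiner objects. Padding each set with the attributes it lacks, filled with missing values (`None` here), imitates what a superset step produces; an attribute is then common exactly when it is not entirely missing in either padded set, which is the "remove empty attributes" step. This is an interpretation of the approach described above, not the Superset operator itself.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for an example set: attribute name -> column of values.
ExampleSet = Dict[str, List[Optional[object]]]


def superset(es: ExampleSet, all_names: List[str]) -> ExampleSet:
    """Add every attribute in `all_names` that `es` lacks, filled with None."""
    n_rows = len(next(iter(es.values()), []))
    return {name: es.get(name, [None] * n_rows) for name in all_names}


def all_missing(column: List[Optional[object]]) -> bool:
    """True if the column has no non-missing value (also True for zero rows)."""
    return all(v is None for v in column)


def common_via_superset(a: ExampleSet, b: ExampleSet) -> List[str]:
    # Union of attribute names, in first-seen order.
    union = list(dict.fromkeys([*a, *b]))
    sup_a, sup_b = superset(a, union), superset(b, union)
    # Keep only attributes that carry data in both padded sets.
    return [n for n in union
            if not all_missing(sup_a[n]) and not all_missing(sup_b[n])]


if __name__ == "__main__":
    set1 = {"age": [31, 45], "income": [50, 62], "city": ["A", "B"]}
    set2 = {"income": [70], "age": [29], "zip": ["123"]}
    print(common_via_superset(set1, set2))  # ['age', 'income']
```

One caveat of this route: an attribute that is already entirely missing in one of the original sets (or a set with no examples at all) is dropped too, so comparing attribute names directly is more robust when only the structure matters.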
Yes, that works. Or just create an identifier (Generate ID) and do an inner join.
Hi kayman, hi sgenzer,
I have the same issue, but I find it hard to apply the hints you have given.
In my case I have two example sets. Both are keyword-document matrices, i.e. text data converted to structured data, where each attribute is a keyword that appears in the set of documents and each example represents a document.
Now I want to find out which keywords (not examples/documents) both matrices have in common.
I tried both of the described approaches, but neither worked.
Is there anything I have to keep in mind when doing this?
Hi @sgenzer, sorry. Sure, please find the XML code and the data enclosed.
If there is anything wrong with the upload format, please let me know!
<?xml version="1.0" encoding="UTF-8"?><process version="9.3.000-BETA">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.3.000-BETA" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.3.000-BETA" expanded="true" height="68" name="Retrieve PreppedDatabased_TF_00" width="90" x="45" y="238">
        <parameter key="repository_entry" value="//20190923_Outlier Detection/01_Data/012_Single/PreppedDatabased_TF_00"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="9.3.000-BETA" expanded="true" height="82" name="Generate ID (2)" width="90" x="246" y="238">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="47"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="9.3.000-BETA" expanded="true" height="68" name="Retrieve PreppedDatabase" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//20190503_PatentDataNLP/001_Data/PreppedDatabase"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="9.3.000-BETA" expanded="true" height="82" name="Generate ID" width="90" x="246" y="34">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <operator activated="true" class="superset" compatibility="9.3.000-BETA" expanded="true" height="82" name="Superset" width="90" x="447" y="34">
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <connect from_op="Retrieve PreppedDatabased_TF_00" from_port="output" to_op="Generate ID (2)" to_port="example set input"/>
      <connect from_op="Generate ID (2)" from_port="example set output" to_op="Superset" to_port="example set 2"/>
      <connect from_op="Retrieve PreppedDatabase" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Superset" to_port="example set 1"/>
      <connect from_op="Superset" from_port="superset 1" to_port="result 1"/>
      <connect from_op="Superset" from_port="superset 2" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
OK, there's probably a cleaner way to do this, but this works:
<?xml version="1.0" encoding="UTF-8"?><process version="9.3.000-BETA">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.3.000-BETA" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.3.000-BETA" expanded="true" height="68" name="Retrieve PreppedDatabase (2)" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//LocalRepository/PreppedDatabase"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.3.000-BETA" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="9.3.000-BETA" expanded="true" height="82" name="Transpose" width="90" x="313" y="34"/>
      <operator activated="true" class="select_attributes" compatibility="9.3.000-BETA" expanded="true" height="82" name="Select Attributes (3)" width="90" x="447" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="id"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="9.3.000-BETA" expanded="true" height="68" name="Retrieve PreppedDatabased_TF_00" width="90" x="45" y="238">
        <parameter key="repository_entry" value="//LocalRepository/PreppedDatabased_TF_00"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.3.000-BETA" expanded="true" height="82" name="Select Attributes (2)" width="90" x="179" y="238">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="9.3.000-BETA" expanded="true" height="82" name="Transpose (2)" width="90" x="313" y="238"/>
      <operator activated="true" class="select_attributes" compatibility="9.3.000-BETA" expanded="true" height="82" name="Select Attributes (4)" width="90" x="447" y="238">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="id"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="9.3.000-BETA" expanded="true" height="82" name="Join" width="90" x="648" y="136">
        <parameter key="remove_double_attributes" value="true"/>
        <parameter key="join_type" value="inner"/>
        <parameter key="use_id_attribute_as_key" value="true"/>
        <list key="key_attributes"/>
        <parameter key="keep_both_join_attributes" value="false"/>
      </operator>
      <connect from_op="Retrieve PreppedDatabase (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Transpose" to_port="example set input"/>
      <connect from_op="Transpose" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
      <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Retrieve PreppedDatabased_TF_00" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Transpose (2)" to_port="example set input"/>
      <connect from_op="Transpose (2)" from_port="example set output" to_op="Select Attributes (4)" to_port="example set input"/>
      <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Join" from_port="join" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
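The process above turns a question about columns into a question about rows. Each branch first keeps only the numeric attributes (Select Attributes with value_type = numeric). Transpose then turns every remaining attribute into a row whose `id` value is that attribute's name, Select Attributes keeps only the `id` column, and an inner Join on the id attribute keeps just the names present in both matrices. Below is a rough plain-Python analogue of that pipeline. It uses a hypothetical dict-of-columns stand-in for a keyword-document matrix; the function names are illustrative and are not RapidMiner operators.

```python
from numbers import Number
from typing import Dict, List, Tuple

# Hypothetical stand-in for a keyword-document matrix: attribute -> column.
ExampleSet = Dict[str, List[object]]
Row = Tuple[str, List[object]]  # (id, values) after transposing


def select_numeric(es: ExampleSet) -> ExampleSet:
    """Rough analogue of Select Attributes with value_type = numeric."""
    return {name: col for name, col in es.items()
            if all(isinstance(v, Number) for v in col)}


def transpose(es: ExampleSet) -> List[Row]:
    """Each attribute becomes a row whose id is the attribute name."""
    return list(es.items())


def inner_join_on_id(left: List[Row], right: List[Row]) -> List[str]:
    """Keep ids present on both sides; the values play no role."""
    right_ids = {row_id for row_id, _ in right}
    return [row_id for row_id, _ in left if row_id in right_ids]


def common_keywords(a: ExampleSet, b: ExampleSet) -> List[str]:
    return inner_join_on_id(transpose(select_numeric(a)),
                            transpose(select_numeric(b)))


if __name__ == "__main__":
    matrix_a = {"text": ["doc one", "doc two"],
                "engine": [0.4, 0.0], "valve": [0.1, 0.3]}
    matrix_b = {"text": ["doc three"], "valve": [0.7], "piston": [0.2]}
    print(common_keywords(matrix_a, matrix_b))  # ['valve']
```

The numeric filter matters for keyword-document matrices because a nominal column such as the original text would otherwise show up as a "common keyword" simply because both sets contain it.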