How to delete rows based a list of values

Ritika
New Altair Community Member
Hi! I have two datasets where the first one is a large set with a list of names and info associated with the names and the second is a smaller set containing only names. I want to delete the rows in the first set which have names not included in the second dataset. I know this is possible with the "filter examples" operator, but I do not want to manually input the filters (there are more than 100). Is there an operator that could read a file and delete the rows accordingly in another file?
Tagged:
0
Answers
-
Hi @Ritika,
You can find in attached file an example of process which performs your task using the Set Minus operator.
You can adapt it to your use case.
Hope this helps,
Regards,
Lionel
0 -
Hello Lionel,
I get the same malformed error. Sorry about this. Could you send the code?
0 -
@Ritika,
Yes, sure :<?xml version="1.0" encoding="UTF-8"?><process version="9.9.002"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Create ExampleSet" width="90" x="179" y="85"> <parameter key="generator_type" value="comma separated text"/> <parameter key="number_of_examples" value="100"/> <parameter key="use_stepsize" value="false"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="input_csv_text" value="Att1,Att2 michael,1 Lionel,2 Scott,3 Brian,4 Varun,5 Jacob,6 Martin,7 Ingo,8 Kayman,9"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="179" y="187"> <parameter key="generator_type" value="comma separated text"/> <parameter key="number_of_examples" value="100"/> <parameter key="use_stepsize" value="false"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="input_csv_text" value="Att1 Lionel Ingo Brian"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role" width="90" x="447" y="187"> <parameter key="attribute_name" value="Att1"/> <parameter key="target_role" value="id"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="multiply" compatibility="9.9.002" expanded="true" height="103" name="Multiply (2)" width="90" x="313" y="34"/> <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role (3)" width="90" x="648" y="34"> <parameter key="attribute_name" value="Att1"/> <parameter key="target_role" value="id"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role (2)" width="90" x="447" y="34"> <parameter key="attribute_name" value="Att1"/> <parameter key="target_role" value="id"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="set_minus" compatibility="9.9.002" expanded="true" height="82" name="Set Minus" width="90" x="581" y="187"/> <operator activated="true" class="set_minus" compatibility="9.9.002" expanded="true" height="82" name="Set Minus (2)" width="90" x="782" y="136"/> <connect from_op="Create ExampleSet" from_port="output" to_op="Multiply (2)" to_port="input"/> <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Set Role" to_port="example set input"/> <connect from_op="Set Role" from_port="example set output" to_op="Set Minus" to_port="subtrahend"/> <connect from_op="Multiply (2)" from_port="output 1" to_op="Set Role (2)" to_port="example set input"/> <connect from_op="Multiply (2)" from_port="output 2" to_op="Set Role (3)" to_port="example set input"/> <connect from_op="Set Role (3)" from_port="example set output" to_op="Set Minus (2)" to_port="example set input"/> <connect from_op="Set Role (2)" from_port="example set output" to_op="Set Minus" to_port="example set input"/> <connect from_op="Set Minus" from_port="example set output" to_op="Set Minus (2)" to_port="subtrahend"/> <connect from_op="Set Minus (2)" from_port="example set output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
Regards,
Lionel
0 -
Hi Lionel,
Sorry for the late response, but yes, this worked! Is there also a way to remove instances if the table contains those values? I believe this process works for only times when the table contains those exact values. In other words, say I wanted to keep the name Mike and there are instances of Mike Anderson and Mike Brown; I would want to keep both of them regardless of the last name -- I'm just looking for values that contain Mike.0