How to delete rows based on a list of values
Hi! I have two datasets: the first is a large set with a list of names and information associated with those names, and the second is a smaller set containing only names. I want to delete the rows in the first set whose names are not included in the second set. I know this is possible with the "Filter Examples" operator, but I do not want to enter the filters manually (there are more than 100). Is there an operator that can read the list of names from one file and use it to delete the corresponding rows in the other?
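Outside RapidMiner, the operation being asked for is a semi-join: keep each row of the large set whose name appears in the small set. A minimal Python sketch, assuming both files are CSV with a `name` column (the column names and the helper `keep_listed_rows` are hypothetical); `io.StringIO` stands in for the real files:

```python
import csv
import io


def keep_listed_rows(big_csv, list_csv, key="name"):
    """Return rows of big_csv whose `key` value appears in list_csv (hypothetical helper)."""
    # Load the allowed names into a set so each membership check is O(1).
    allowed = {row[key] for row in csv.DictReader(list_csv)}
    # Keep only rows whose name is in the allowed set; all other rows are dropped.
    return [row for row in csv.DictReader(big_csv) if row[key] in allowed]


big = io.StringIO("name,score\nmichael,1\nLionel,2\nScott,3\nBrian,4\nIngo,8\n")
small = io.StringIO("name\nLionel\nIngo\nBrian\n")

kept = keep_listed_rows(big, small)
print([row["name"] for row in kept])  # ['Lionel', 'Brian', 'Ingo']
```

Because the list of names is loaded once into a set, this handles hundreds of names without writing one filter condition per name.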
@Ritika,
Yes, sure:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.9.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Create ExampleSet" width="90" x="179" y="85">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="Att1,Att2&#10;michael,1&#10;Lionel,2&#10;Scott,3&#10;Brian,4&#10;Varun,5&#10;Jacob,6&#10;Martin,7&#10;Ingo,8&#10;Kayman,9"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="179" y="187">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="Att1&#10;Lionel&#10;Ingo&#10;Brian"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role" width="90" x="447" y="187">
        <parameter key="attribute_name" value="Att1"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.9.002" expanded="true" height="103" name="Multiply (2)" width="90" x="313" y="34"/>
      <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role (3)" width="90" x="648" y="34">
        <parameter key="attribute_name" value="Att1"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role (2)" width="90" x="447" y="34">
        <parameter key="attribute_name" value="Att1"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="set_minus" compatibility="9.9.002" expanded="true" height="82" name="Set Minus" width="90" x="581" y="187"/>
      <operator activated="true" class="set_minus" compatibility="9.9.002" expanded="true" height="82" name="Set Minus (2)" width="90" x="782" y="136"/>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Multiply (2)" to_port="input"/>
      <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Set Minus" to_port="subtrahend"/>
      <connect from_op="Multiply (2)" from_port="output 1" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Multiply (2)" from_port="output 2" to_op="Set Role (3)" to_port="example set input"/>
      <connect from_op="Set Role (3)" from_port="example set output" to_op="Set Minus (2)" to_port="example set input"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Set Minus" to_port="example set input"/>
      <connect from_op="Set Minus" from_port="example set output" to_op="Set Minus (2)" to_port="subtrahend"/>
      <connect from_op="Set Minus (2)" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```
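The process above never compares names one by one; it relies on two Set Minus steps keyed on the `id` role that Set Role assigns to `Att1`. Set Minus keeps the examples of its input whose id does not occur in the subtrahend, so the first step yields the unlisted names (full set minus list) and the second step removes those from the full set, leaving only the listed names. A minimal Python sketch of that identity, with lists of dicts standing in for example sets (this mimics the operator's behaviour and is not RapidMiner code; `set_minus` is a hypothetical helper):

```python
def set_minus(example_set, subtrahend, id_attr="Att1"):
    """Mimic Set Minus: keep rows of example_set whose id is absent from subtrahend."""
    ids_to_remove = {row[id_attr] for row in subtrahend}
    return [row for row in example_set if row[id_attr] not in ids_to_remove]


names = ["michael", "Lionel", "Scott", "Brian", "Varun", "Jacob", "Martin", "Ingo", "Kayman"]
full = [{"Att1": n, "Att2": i} for i, n in enumerate(names, start=1)]
name_list = [{"Att1": n} for n in ["Lionel", "Ingo", "Brian"]]

# "Set Minus": rows of the full set whose name is NOT in the list.
not_listed = set_minus(full, name_list)
# "Set Minus (2)": the full set minus the unlisted rows, i.e. only the listed names.
result = set_minus(full, not_listed)
print([(row["Att1"], row["Att2"]) for row in result])  # [('Lionel', 2), ('Brian', 4), ('Ingo', 8)]
```

The second subtraction is what turns the result around: a single Set Minus with the name list as subtrahend would return the unlisted names, which is the opposite of what was asked.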
Regards,
Lionel
Hi Lionel,
Sorry for the late response, but yes, this worked! Is there also a way to filter rows when a name only contains one of those values? I believe this process only works when the names match exactly. For example, say I want to keep the name Mike and the table has Mike Anderson and Mike Brown: I would want to keep both of them regardless of the last name, since I'm only looking for values that contain Mike.
You can find in the attached file an example process that performs this task using the Set Minus operator.
You can adapt it to your use case.
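The attached process is not reproduced here. For the "contains" question (keeping Mike Anderson and Mike Brown when the list has Mike), the exact id match has to be replaced by a substring test. A minimal Python sketch of that matching rule only, assuming case-sensitive matching; `keep_containing` and `keep_first_name` are hypothetical helpers:

```python
def keep_containing(names, keywords):
    """Keep names that contain at least one keyword as a substring (case-sensitive)."""
    return [name for name in names if any(kw in name for kw in keywords)]


def keep_first_name(names, keywords):
    """Stricter variant: keep names whose first word equals one of the keywords."""
    wanted = set(keywords)
    # split()[:1] is empty for an empty name, so any() is False instead of raising.
    return [name for name in names if any(word in wanted for word in name.split()[:1])]


names = ["Mike Anderson", "Mike Brown", "Anna Smith", "Brian Mikel"]
print(keep_containing(names, ["Mike"]))  # ['Mike Anderson', 'Mike Brown', 'Brian Mikel']
print(keep_first_name(names, ["Mike"]))  # ['Mike Anderson', 'Mike Brown']
```

A plain substring test also keeps "Brian Mikel", which may not be intended; comparing only the first word avoids that, at the cost of missing names written in another order.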
Hope this helps,
Regards,
Lionel