How can we check whether a word is present in the list or not?
Anusha
New Altair Community Member
Hi All!
I have a dataset with 3 columns. The first 2 columns each hold a comma-separated list of words, and the 3rd column holds a single word in each row. I need to check whether the word in the 3rd column is present in the 1st column's list or in the 2nd column's list.
Source Data:
list1                  list2                   ch
shape,size,type,endi   toldis,umbr,oilv,poll   type
shape,size,type,endi   toldis,umbr,oilv,poll   oilv
shape,size,type,endi   toldis,umbr,oilv,poll   umbr
Desired output:
list1                  list2                   ch     flag_1(list1)   flag_2(list2)
shape,size,type,endi   toldis,umbr,oilv,poll   type   1               0
shape,size,type,endi   toldis,umbr,oilv,poll   oilv   0               1
shape,size,type,endi   toldis,umbr,oilv,poll   umbr   0               1
As "type" is present in list1, flag_1 should be "1" and flag_2 should be "0".
"oilv" and "umbr" are present in the list2 column, so flag_2 should be "1" for them.
I have tried array_contains, IN, NOT IN and looping over the values, but I was unable to get the required result. Can anyone help me resolve this?
Thanks in Advance!
Best Answer
Hi,
it is easy with Generate Attributes. I tried two different approaches:
if(contains(list1, ch), 1, 0)
if(matches(list2, "(^|.*,)" + ch + "($|,.*)"), 1, 0)
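The solution with contains() is simpler but not exactly foolproof: it also matches substrings, so a search for "type" would flag a list that only contains "prototype".
The regular expression used with matches() checks for three parts in order: either the start of the string or any text ending in a comma, then the search string, then either the end of the string or a comma followed by any text. This way the search string only matches a complete list entry.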
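To make the difference between the two approaches concrete, here is a minimal sketch in Python rather than in RapidMiner's expression language. The function names flag_contains and flag_exact are hypothetical, and the use of re.escape is an addition that the original expression does not have; it only guards against search words that contain regex metacharacters.

```python
import re


def flag_contains(word_list: str, word: str) -> int:
    """Substring test, analogous to contains(list1, ch)."""
    return 1 if word in word_list else 0


def flag_exact(word_list: str, word: str) -> int:
    """Whole-entry test, analogous to the matches() expression.

    The pattern must cover the entire string: an optional prefix ending
    in a comma, the word itself, and an optional suffix starting with a comma.
    re.escape is an assumption added here, not part of the original answer.
    """
    pattern = "(^|.*,)" + re.escape(word) + "($|,.*)"
    return 1 if re.fullmatch(pattern, word_list) else 0


# The rows from the question, plus one made-up row showing the substring pitfall.
rows = [
    ("shape,size,type,endi", "toldis,umbr,oilv,poll", "type"),
    ("shape,size,type,endi", "toldis,umbr,oilv,poll", "oilv"),
    ("shape,size,type,endi", "toldis,umbr,oilv,poll", "umbr"),
    ("prototype,size", "toldis,umbr", "type"),  # hypothetical extra row
]

for list1, list2, ch in rows:
    print(ch, flag_contains(list1, ch), flag_exact(list1, ch), flag_exact(list2, ch))
```

For the made-up last row the substring test flags list1 even though "type" is not one of its entries, while the whole-entry test does not.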
Here's an example process:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.9.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="-1"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="list1;list2;ch&#10;shape,size,type,endi;toldis,umbr,oilv,poll;type&#10;shape,size,type,endi;toldis,umbr,oilv,poll;oilv&#10;shape,size,type,endi;toldis,umbr,oilv,poll;umbr"/>
        <parameter key="column_separator" value=";"/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.9.002" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
        <list key="function_descriptions">
          <parameter key="flag_1(list1)" value="if(contains(list1, ch), 1, 0)"/>
          <parameter key="flag_2(list2)" value="if(matches(list2, &quot;(^|.*,)&quot; + ch + &quot;($|,.*)&quot;), 1, 0)"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Regards,
Balázs