"Filter text on regex."
I want to find all text snippets containing one or several words via regex. If I select Filter Examples, set it to "Expression", and provide it with finds(Text, "(?i)\blootbox|micro\b"), it doesn't work, although it is syntactically correct.
If I remove |micro, it returns only the snippets that contain lootbox. Why does it not return examples that contain either one of them? If I use RapidMiner's regex checker on some dummy data, it matches both of them; it just doesn't work with "Filter Examples".
Kindly help!
Answers
-
Hi,
Try to use the following expression: finds(Text, ".*lootbox.*|.*micro.*")
This will match all texts that contain either one of those strings surrounded by arbitrary other stuff. The process below shows a simple example.
Hope this helps,
Ingo
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="Text&#10;This is a text about lootboxes&#10;This is a text about micro transactions&#10;This is a text about lootboxes and micro transactions&#10;And this is a text talking about other things"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
        <parameter key="parameter_expression" value="finds([Text],&quot;.*lootbox.*|.*micro.*&quot;)"/>
        <parameter key="condition_class" value="expression"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
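To see which sample rows the wrapped pattern picks up, here is a minimal sketch in Python rather than RapidMiner. It assumes Python's re module handles this simple pattern the same way as the Java regex engine behind RapidMiner. The list `texts` just mirrors the four sample rows from the process above, and `fullmatch` is used to show that the pattern succeeds even when the whole string has to match.

```python
import re

# Sample rows mirroring the Create ExampleSet operator above.
texts = [
    "This is a text about lootboxes",
    "This is a text about micro transactions",
    "This is a text about lootboxes and micro transactions",
    "And this is a text talking about other things",
]

# Either keyword, surrounded by arbitrary other characters.
pattern = re.compile(r".*lootbox.*|.*micro.*")

# Keep every row whose full text is covered by the pattern.
matched = [t for t in texts if pattern.fullmatch(t)]

for t in matched:
    print(t)
```

The first three rows are kept and the last one is dropped. Because the keywords are plain substrings here, "lootboxes" counts as a hit for "lootbox".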
-
Appreciate the input, but sadly this regex matches anything that contains those letters. Say I have the word microsoft: your regex would match it, but I'm only looking for an exact match :-)
-
"I'm only looking for an exact match :-)"
Well, this expression actually IS an exact match ;-)
So I assume you would like to only match if there is a non-word character before and after? Is that what you mean? In this case, the correct expression is finds([Text],".*\\W+lootbox\\W+.*|.*\\W+micro\\W+.*") - process below.
Please note, however, that in this case you would no longer find plurals easily: for example, "lootboxes" would not match any more.
Cheers,
Ingo
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="Text&#10;This is a text about lootboxes&#10;This is a text about micro transactions&#10;This is a text about lootboxes and micro transactions&#10;And this is a text talking about other things"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
        <parameter key="parameter_expression" value="finds([Text],&quot;.*\\W+lootbox\\W+.*|.*\\W+micro\\W+.*&quot;)"/>
        <parameter key="condition_class" value="expression"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
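The behaviour of the non-word-character version can be checked outside RapidMiner with a small Python sketch. Two things here are assumptions and not part of the thread: that Python's re module behaves like RapidMiner's Java regex engine for these patterns, and the second pattern, a grouped word-boundary variant `(?i)\b(?:lootbox|micro)\b`, which is shown only for comparison. In a RapidMiner expression the backslashes would presumably have to be doubled, like the \\W in the expression above; that has not been tested here. The helper names `strict_hit` and `boundary_hit` are made up for this example.

```python
import re

# The expression from the answer above: a keyword needs at least one
# non-word character on BOTH sides.
strict = re.compile(r".*\W+lootbox\W+.*|.*\W+micro\W+.*")

# Assumed alternative (not from the thread): word boundaries around a
# grouped alternation, case-insensitive.
boundary = re.compile(r"(?i)\b(?:lootbox|micro)\b")


def strict_hit(text: str) -> bool:
    """True if the whole text is covered by the \\W+ pattern."""
    return strict.fullmatch(text) is not None


def boundary_hit(text: str) -> bool:
    """True if the \\b pattern is found anywhere in the text."""
    return boundary.search(text) is not None


samples = [
    "We talked about micro transactions",  # keyword in the middle
    "I work at microsoft now",             # keyword inside a longer word
    "This is a text about lootboxes",      # plural form
    "micro transactions are everywhere",   # keyword at the very start
]

for s in samples:
    print(f"{s!r}: strict={strict_hit(s)} boundary={boundary_hit(s)}")
```

Both patterns reject "microsoft" and the plural "lootboxes". They differ on the last sample: the \W+ version needs a real character before the keyword, so a keyword at the very start (or end) of a text is missed, while \b also matches at the edges of the string. The grouping matters as well: without the parentheses, `\blootbox|micro\b` reads as "\blootbox" or "micro\b", so each boundary applies to only one of the two words.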