Hi,
Do you want to remove only the symbols "@" and "#", or also the text that follows them, so that e.g. "@ingomierswa" and "#datascience" are removed completely?
Both are easy to do with the "Replace" operator and a simple regular expression. Below is a small sample process that shows how.
Hope this helps,
Ingo
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="34">
        <list key="attribute_values">
          <parameter key="sample_tweet" value="&quot;This is just a sample tweet from @ingomierswa on #datascience - end of tweet.&quot;"/>
        </list>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.3.001" expanded="true" height="103" name="Multiply" width="90" x="246" y="34"/>
      <!-- Variant 1: removes only the @ and # characters -->
      <operator activated="true" class="replace" compatibility="7.3.001" expanded="true" height="82" name="Only remove symbols" width="90" x="380" y="34">
        <parameter key="replace_what" value="@|#"/>
      </operator>
      <!-- Variant 2: removes the symbol together with the letters that follow it -->
      <operator activated="true" class="replace" compatibility="7.3.001" expanded="true" height="82" name="Complete entities removed" width="90" x="380" y="136">
        <parameter key="replace_what" value="@[a-zA-Z]*|#[a-zA-Z]*"/>
      </operator>
      <connect from_op="Generate Data by User Specification" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Only remove symbols" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Complete entities removed" to_port="example set input"/>
      <connect from_op="Only remove symbols" from_port="example set output" to_port="result 1"/>
      <connect from_op="Complete entities removed" from_port="example set output" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="84"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
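For anyone who wants to check the two patterns outside of RapidMiner, here is a minimal Python sketch. It assumes that Python's `re` module treats these two simple patterns the same way as the "Replace" operator does; the variable names are made up for the example.

```python
import re

# Sample text taken from the process above.
tweet = "This is just a sample tweet from @ingomierswa on #datascience - end of tweet."

# Same pattern as the "Only remove symbols" operator: drops just the @ and # characters.
only_symbols = re.sub(r"@|#", "", tweet)

# Same pattern as the "Complete entities removed" operator:
# drops the symbol plus any ASCII letters that directly follow it.
whole_entities = re.sub(r"@[a-zA-Z]*|#[a-zA-Z]*", "", tweet)

print(only_symbols)
print(whole_entities)
```

Note that the second variant leaves a double space where each entity used to be, so a follow-up whitespace cleanup may be worth adding. Handles that contain digits or underscores would also be only partly removed by `[a-zA-Z]*`.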
Extend your regex a bit like this:
(?<!\w)[@#][^\s.,]+
It looks a bit ugly, but it basically means: find any "word" that starts with either @ or # and select everything up to the next whitespace, dot or comma. The (?<!\w) in front only makes sure the symbol is not in the middle of a word, so e-mail addresses are left alone. Replace the match with nothing and it's gone.
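The "everything up to the next whitespace, dot or comma" idea can be tried out quickly in Python. This is only a sketch: the pattern below, with a `(?<!\w)` lookbehind in front of the symbol, is one possible way to write it, and the function name `strip_entities` and the sample text are invented for the example.

```python
import re

# Assumed pattern for this sketch: an @ or # that is not preceded by a word
# character, followed by everything up to the next whitespace, dot or comma.
ENTITY = re.compile(r"(?<!\w)[@#][^\s.,]+")


def strip_entities(text: str) -> str:
    """Remove @mentions and #hashtags by replacing each match with nothing."""
    return ENTITY.sub("", text)


sample = "Talk by @ingo_mierswa2 on #data_science. Mail joe@example.com"
cleaned = strip_entities(sample)
print(cleaned)
```

Handles with digits and underscores are removed as a whole here, the trailing dot stays in place, and the @ inside the e-mail address is not touched because a word character comes right before it.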