How to clean tweets from hashtags and @

baran
New Altair Community Member
Hi everybody
I have tried for 3 days to remove hashtags and @ mentions from tweets, but I couldn't. Can anybody help?
Answers
Hi,
Do you mean just getting rid of the symbols "@" and "#", or do you also want to remove what follows them, so that e.g. "@ingomierswa" and "#datascience" are removed completely?
Both are easily possible with the operator "Replace" and a simple regular expression. Below is a small sample process showing you how this is done.
Hope this helps,
Ingo
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="34">
<list key="attribute_values">
<parameter key="sample_tweet" value="&quot;This is just a sample tweet from @ingomierswa on #datascience - end of tweet.&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.3.001" expanded="true" height="103" name="Multiply" width="90" x="246" y="34"/>
<operator activated="true" class="replace" compatibility="7.3.001" expanded="true" height="82" name="Only remove symbols" width="90" x="380" y="34">
<parameter key="replace_what" value="@|#"/>
</operator>
<operator activated="true" class="replace" compatibility="7.3.001" expanded="true" height="82" name="Complete entities removed" width="90" x="380" y="136">
<parameter key="replace_what" value="@[a-zA-Z]*|#[a-zA-Z]*"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Only remove symbols" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Complete entities removed" to_port="example set input"/>
<connect from_op="Only remove symbols" from_port="example set output" to_port="result 1"/>
<connect from_op="Complete entities removed" from_port="example set output" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="84"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
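To check what the two expressions do without opening RapidMiner, here is a minimal Python sketch. It applies the same two patterns from the "Replace" operators to the sample tweet. The function names and the broader `[@#]\w+` pattern are assumptions added for illustration and are not part of the process above. Python's `re` module and RapidMiner's Java-based regular expressions behave the same way for these simple patterns, but that equivalence does not hold for every regex feature.

```python
import re

# Patterns copied from the two "Replace" operators in the sample process.
SYMBOLS_ONLY = re.compile(r"@|#")
LETTER_ENTITIES = re.compile(r"@[a-zA-Z]*|#[a-zA-Z]*")

# Assumption (not in the original process): a broader pattern that also
# covers digits and underscores, which often appear in handles and hashtags.
WORD_ENTITIES = re.compile(r"[@#]\w+")


def remove_symbols(tweet: str) -> str:
    """Drop only the '@' and '#' characters, keeping the words after them."""
    return SYMBOLS_ONLY.sub("", tweet)


def remove_entities(tweet: str, pattern: re.Pattern = LETTER_ENTITIES) -> str:
    """Drop whole mentions and hashtags matched by the given pattern."""
    return pattern.sub("", tweet)


def tidy_spaces(text: str) -> str:
    """Collapse the whitespace runs left behind after a removal."""
    return re.sub(r"\s+", " ", text).strip()


sample = "This is just a sample tweet from @ingomierswa on #datascience - end of tweet."

print(remove_symbols(sample))
print(remove_entities(sample))
print(tidy_spaces(remove_entities(sample)))

# The letters-only pattern stops at the first digit or underscore.
tricky = "@user_42 likes #ml2024 a lot"
print(remove_entities(tricky))
print(tidy_spaces(remove_entities(tricky, WORD_ENTITIES)))
```

The letters-only pattern leaves fragments such as `_42` behind when a handle or hashtag contains digits or underscores. If your tweets contain such entities, a pattern like `[@#]\w+` in the "replace what" parameter may be a better fit.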
Yes, exactly. Thank you! I will try it tomorrow and then edit this post.