HELP please-Regular expressions (Replace tokens)

happy_neid
happy_neid New Altair Community Member
edited November 2024 in Community Q&A

I want to find all tokens that are #hashtags and replace them with the word "mention", but I want to leave a certain subset of those hashtags unchanged.

Example: if I have the words #apple #juice #tree #dog #table, I want to replace #apple and #juice with the word "mention" and leave the tokens #tree, #dog and #table as they are.
 
How can I do that with the Replace Tokens operator?

I would really appreciate any help...

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    To drop the "#", you could select with a pattern like #(.*) and then replace by $1.

     

    If you want to replace #apple with "mention", you could select #apple and then replace by mention. This could get very messy if you have a lot of words you want to replace.

     

    What I would suggest is to use the Replace Dictionary operator and pass it a list of the words you want to change to "mention". Everything needs to be in a nominal data format first, and then you have to convert it to text so that Process Documents from Data works. In essence, you do the token replacement before you process the text.
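
To make the dictionary idea above concrete, here is a minimal sketch in plain Python, run outside RapidMiner. It is not how the Replace Dictionary operator is implemented; the `REPLACEMENTS` mapping and the `replace_from_dictionary` helper are hypothetical names chosen for illustration. Only hashtags listed in the mapping are rewritten, everything else passes through untouched.

```python
import re

# Hypothetical dictionary: hashtags to rewrite, mapped to their replacement.
REPLACEMENTS = {"#apple": "mention", "#juice": "mention"}


def replace_from_dictionary(text, mapping):
    """Replace whole hashtags found in `mapping`; leave all others alone."""
    def swap(match):
        tag = match.group(0)
        # Fall back to the original tag when it is not in the dictionary.
        return mapping.get(tag, tag)

    # '#\w+' grabs one complete hashtag at a time, so '#applepie'
    # is looked up as a whole and never confused with '#apple'.
    return re.sub(r"#\w+", swap, text)


text = "#apple #juice #tree #dog #table"
result = replace_from_dictionary(text, REPLACEMENTS)
print(result)  # mention mention #tree #dog #table
```

Looking up complete hashtags keeps the word list easy to maintain: adding another tag is one more dictionary entry rather than a longer regular expression.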

  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member

    Hi,

     

    What you're trying to do calls for a so-called "negative lookahead", an advanced regular expression concept.

     

    Take a look at this process:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="34">
    <list key="attribute_values">
    <parameter key="example1" value="&quot;words #apple #juice #tree #dog #table i want to replace&quot;"/>
    <parameter key="example2" value="&quot;other words like #apple, #ibm, #microsoft, #rapidminer, #dog, whatever&quot;"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="replace" compatibility="7.3.001" expanded="true" height="82" name="Replace" width="90" x="246" y="34">
    <parameter key="replace_what" value="\#(?!(tree|dog|table))(\w+)"/>
    <parameter key="replace_by" value="mention"/>
    </operator>
    <connect from_op="Generate Data by User Specification" from_port="output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    It seems to do what you want.

    The hashtags you don't want to match are given in this expression: \#(?!(tree|dog|table))(\w+)

     

    Regards,

    Balázs
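
The negative-lookahead expression from the reply above can be tried outside RapidMiner. The sketch below uses Python's `re` module, which supports the same lookahead syntax; the `strict` variant with a trailing `\b` is an addition that is not part of the original process. It is there because the plain lookahead also protects longer tags that merely start with a protected word, such as #treehouse.

```python
import re

KEEP = ["tree", "dog", "table"]  # hashtags to leave untouched

# Expression from the reply above ('#' needs no escaping in Python).
pattern = re.compile(r"#(?!(tree|dog|table))(\w+)")

# Assumed variant: '\b' makes the lookahead protect only the exact tags.
strict = re.compile(r"#(?!(?:" + "|".join(KEEP) + r")\b)\w+")

text = "words #apple #juice #tree #dog #table i want to replace"
replaced = pattern.sub("mention", text)
print(replaced)  # words mention mention #tree #dog #table i want to replace

edge = "#treehouse #tree"
loose_edge = pattern.sub("mention", edge)
strict_edge = strict.sub("mention", edge)
print(loose_edge)   # #treehouse #tree
print(strict_edge)  # mention #tree
```

Whether #treehouse should be protected depends on the data, so the choice between the two patterns is a judgment call.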

  • Mustafa_AVDAN
    Mustafa_AVDAN New Altair Community Member

    Hey, I have the same problem and I did it like you said, but the result is not what I want. Please look at my screen and help me.

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Are you looking to do something like this?

    [Screenshot: 2017-12-01_9-34-33.png]

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
    <parameter key="connection" value="Twtter Test"/>
    <parameter key="query" value="Windows"/>
    <parameter key="locale" value="en"/>
    </operator>
    <operator activated="true" class="replace" compatibility="7.6.002" expanded="true" height="82" name="Replace" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    <parameter key="replace_what" value="#(\w+)"/>
    <parameter key="replace_by" value="$1"/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
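
The effect of the Replace parameters in the process above can be sketched in Python, assuming a made-up sample tweet (the real Search Twitter output is not shown in the thread). Note that Python writes the capture-group reference as `\1`, where RapidMiner's replace_by field uses `$1`.

```python
import re

# Hypothetical sample text standing in for the "Text" attribute.
tweet = "Loving the new #Windows update #tech"

# Same idea as replace_what = #(\w+), replace_by = $1:
# keep the captured word, drop the '#'.
without_hash = re.sub(r"#(\w+)", r"\1", tweet)
print(without_hash)  # Loving the new Windows update tech

# Using a fixed word instead of the group reference replaces the whole hashtag.
fixed_word = re.sub(r"#(\w+)", "myword", tweet)
print(fixed_word)  # Loving the new myword update myword

# On text that is not split into tokens, a greedy #(.*) matches from the
# first '#' to the end of the text, so only that first '#' is removed.
greedy = re.sub(r"#(.*)", r"\1", tweet)
print(greedy)  # Loving the new Windows update #tech
```

The last case shows why `#(\w+)` is the safer choice on whole sentences, while `#(.*)` is only equivalent when each value is a single token.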
  • Mustafa_AVDAN
    Mustafa_AVDAN New Altair Community Member

    Oh, thanks, sir!

    When I changed $1 to "myword", it worked successfully. Thanks to the RapidMiner family :D