Token Replace

ema New Altair Community Member
edited November 2024 in Community Q&A
Hi,

Can anybody give me an example of the parameters for Token Replace?

For example, I want to replace a word that ends with "s" with the same word without the "s":

dances -> dance

What should I put in the replace dictionary?

Thank you

Answers

  • ema
    ema New Altair Community Member
    Hi,

    I tried Token Replace. It performs the replacement but does not remove the original word.

    For example, if "dancing" is to be replaced by "danc", the output contains both "dancing" and "danc".

    Thank you
  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    Did you use the TokenReplace operator before a tokenizer?

    Here is an example of the operator added to one of the example processes delivered with the Text plugin:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="TextInput" class="TextInput" expanded="yes">
            <list key="texts">
              <parameter key="graphics" value="../data/newsgroup/graphics"/>
              <parameter key="hardware" value="../data/newsgroup/hardware"/>
            </list>
            <parameter key="default_content_encoding" value="ISO-8859-1"/>
            <parameter key="prune_below" value="2"/>
            <list key="namespaces">
            </list>
            <parameter key="create_text_visualizer" value="true"/>
            <parameter key="on_the_fly_pruning" value="3"/>
            <operator name="TokenReplace" class="TokenReplace">
                <list key="replace_dictionary">
                  <parameter key="cantaloupe" value="cantaHORST"/>
                </list>
            </operator>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
            <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
            </operator>
            <operator name="TokenLengthFilter" class="TokenLengthFilter">
                <parameter key="min_chars" value="3"/>
            </operator>
            <operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
            </operator>
            <operator name="TermNGramGenerator" class="TermNGramGenerator">
            </operator>
        </operator>
    </operator>
    Cheers,
    Ingo
  • mskinner
    mskinner New Altair Community Member

    This does not seem to work.

  • pschlunder
    pschlunder New Altair Community Member

    Here is an up-to-date version:

    <operator activated="true" class="process" compatibility="5.0.000" expanded="true" name="Root">
      <process expanded="true">
        <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
          <parameter key="text" value="Some text about different kind of dances people might enjoy."/>
        </operator>
        <operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents" width="90" x="313" y="34">
          <process expanded="true">
            <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="34"/>
            <operator activated="true" class="text:replace_tokens" compatibility="7.5.000" expanded="true" height="68" name="Replace Tokens" width="90" x="380" y="34">
              <list key="replace_dictionary">
                <parameter key="([a-zA-Z]+)s" value="$1"/>
              </list>
            </operator>
            <connect from_port="document" to_op="Tokenize" to_port="document"/>
            <connect from_op="Tokenize" from_port="document" to_op="Replace Tokens" to_port="document"/>
            <connect from_op="Replace Tokens" from_port="document" to_port="document 1"/>
            <portSpacing port="source_document" spacing="0"/>
            <portSpacing port="sink_document 1" spacing="0"/>
            <portSpacing port="sink_document 2" spacing="0"/>
          </process>
        </operator>
        <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
        <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
        <portSpacing port="source_input 1" spacing="0"/>
        <portSpacing port="sink_result 1" spacing="0"/>
        <portSpacing port="sink_result 2" spacing="0"/>
      </process>
    </operator>

    Remark: Make sure to download the Text Processing Extension from the Marketplace in order for this solution to work.

     

    Key element:

    To extract the part of a token that matches a certain criterion, use the capturing-group feature of regular expressions. Here we identify tokens ending with 's' using the expression ([a-zA-Z]+)s and refer to the targeted substring (everything before the final 's') with the group reference $1.
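
    To try the pattern outside the process, here is a minimal sketch in plain Java. It only illustrates how the capturing group and $1 behave in Java's regex engine. It assumes the operator applies the rule the way String.replaceAll does, which may not match its exact behavior, and the class and method names (TokenReplaceSketch, replaceToken, replaceTokens) are made up for this example and are not part of the Text Processing Extension.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenReplaceSketch {
    // Same key and value as the replace_dictionary entry in the process above.
    static final String PATTERN = "([a-zA-Z]+)s";
    static final String REPLACEMENT = "$1";

    // Hypothetical helper: applies the rule to a single token.
    // Assumption: replaceAll semantics (unanchored match inside the token).
    static String replaceToken(String token) {
        return token.replaceAll(PATTERN, REPLACEMENT);
    }

    // Hypothetical helper: each token is replaced in place,
    // so the original form does not stay in the output.
    static List<String> replaceTokens(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(replaceToken(token));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("different", "dances", "people", "enjoy");
        // prints [different, dance, people, enjoy]
        System.out.println(replaceTokens(tokens));
    }
}
```

    Note that the pattern strips a trailing 's' from any token, so in this sketch "glass" becomes "glas". If that is not wanted, the expression would need to be narrowed.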

     

    Hope it helps.