Token Replace

Hi
Can anybody give me an example of the Token Replace parameters?

For example, replace a word that ends with "s" with the same word without the "s":

dances - dance

What would I put in the replace dictionary?

Thank you

ema

Hi,

I tried Token Replace. It does the replacement, but it does not remove the original word.

For example, if dancing is to be replaced by danc,

the output contains both dancing and danc.

Thank you

IngoRM

Hi,

Did you use the TokenReplace operator before a tokenizer?

Here is an example of the operator added to one of the example processes delivered with the Text plugin:


<operator name="Root" class="Process" expanded="yes">
    <operator name="TextInput" class="TextInput" expanded="yes">
        <list key="texts">
          <parameter key="graphics"	value="../data/newsgroup/graphics"/>
          <parameter key="hardware"	value="../data/newsgroup/hardware"/>
        </list>
        <parameter key="default_content_encoding"	value="ISO-8859-1"/>
        <parameter key="prune_below"	value="2"/>
        <list key="namespaces">
        </list>
        <parameter key="create_text_visualizer"	value="true"/>
        <parameter key="on_the_fly_pruning"	value="3"/>
        <operator name="TokenReplace" class="TokenReplace">
            <list key="replace_dictionary">
              <parameter key="cantaloupe"	value="cantaHORST"/>
            </list>
        </operator>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars"	value="3"/>
        </operator>
        <operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
        </operator>
        <operator name="TermNGramGenerator" class="TermNGramGenerator">
        </operator>
    </operator>
</operator>

Cheers,
Ingo

mskinner

This does not seem to work.

pschlunder

Here is an up-to-date version:

<operator activated="true" class="process" compatibility="5.0.000" expanded="true" name="Root">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
        <parameter key="text" value="Some text about different kind of dances people might enjoy."/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents" width="90" x="313" y="34">
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="34"/>
          <operator activated="true" class="text:replace_tokens" compatibility="7.5.000" expanded="true" height="68" name="Replace Tokens" width="90" x="380" y="34">
            <list key="replace_dictionary">
              <parameter key="([a-zA-Z]+)s" value="$1"/>
            </list>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Replace Tokens" to_port="document"/>
          <connect from_op="Replace Tokens" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
      <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
</operator>

Remark: Make sure to install the Text Processing Extension from the Marketplace; this process depends on it.

Key element:

To extract the substring of a token that matches certain criteria, use the capturing-group feature of regular expressions. Here we identify tokens ending with 's' with the expression ([a-zA-Z]+)s and refer to the captured substring (everything before the final 's') with the group identifier $1.
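The group replacement can be tried outside RapidMiner. The sketch below assumes that each replace_dictionary entry is applied to a single token the way Java's `String.replaceAll` applies a regex and a `$1`-style replacement string. RapidMiner runs on Java, but whether Replace Tokens uses exactly this call is an assumption. The class name `TokenReplaceSketch` and the anchored variant `([a-zA-Z]+)s$` are hypothetical additions for illustration, not part of the process above.

```java
import java.util.List;

public class TokenReplaceSketch {
    // Key and value of the replace_dictionary entry from the process above.
    static final String KEY = "([a-zA-Z]+)s";
    static final String VALUE = "$1";

    // Hypothetical variant: "$" anchors the match to the end of the token,
    // so only a trailing 's' is removed.
    static final String ANCHORED_KEY = "([a-zA-Z]+)s$";

    // Assumption: one dictionary entry behaves like String.replaceAll on one token.
    // "$1" in the replacement refers to the first capturing group.
    static String replace(String token, String regex, String replacement) {
        return token.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        // "list" shows what happens when the 's' is inside the word.
        List<String> tokens = List.of("dances", "kinds", "list", "people");
        for (String token : tokens) {
            System.out.println(token
                    + " -> " + replace(token, KEY, VALUE)
                    + " (anchored: " + replace(token, ANCHORED_KEY, VALUE) + ")");
        }
    }
}
```

Running it prints `dances -> dance (anchored: dance)`, `kinds -> kind (anchored: kind)`, `list -> lit (anchored: list)` and `people -> people (anchored: people)`. Without the `$` anchor, the expression also matches an 's' in the middle of a token, so if only a trailing 's' should go, anchoring the key may be the safer choice, assuming Replace Tokens behaves like `replaceAll`.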

Hope it helps.