Token Replace

ema
New Altair Community Member
Hi,
can anybody give me an example of how to set the Token Replace parameters?
For example, to replace a word ending in "s" with the word without the "s":
dances -> dance
What would I put in the replace dictionary?
Thank you
Answers
Hi,
I tried Token Replace and it performs the replacement but does not remove the original word.
For example, if "dancing" is to be replaced by "danc", the output contains both "dancing" and "danc".
Thank you
Hi,
did you use the operator TokenReplace before a tokenizer?
Here is an example of the operator added to one of the example processes delivered with the Text plugin:
Cheers,
<operator name="Root" class="Process" expanded="yes">
  <operator name="TextInput" class="TextInput" expanded="yes">
    <list key="texts">
      <parameter key="graphics" value="../data/newsgroup/graphics"/>
      <parameter key="hardware" value="../data/newsgroup/hardware"/>
    </list>
    <parameter key="default_content_encoding" value="ISO-8859-1"/>
    <parameter key="prune_below" value="2"/>
    <list key="namespaces">
    </list>
    <parameter key="create_text_visualizer" value="true"/>
    <parameter key="on_the_fly_pruning" value="3"/>
    <operator name="TokenReplace" class="TokenReplace">
      <list key="replace_dictionary">
        <parameter key="cantaloupe" value="cantaHORST"/>
      </list>
    </operator>
    <operator name="StringTokenizer" class="StringTokenizer">
    </operator>
    <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
    </operator>
    <operator name="TokenLengthFilter" class="TokenLengthFilter">
      <parameter key="min_chars" value="3"/>
    </operator>
    <operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
    </operator>
    <operator name="TermNGramGenerator" class="TermNGramGenerator">
    </operator>
  </operator>
</operator>
Ingo
this does not seem to work
Here is an up-to-date version:
<operator activated="true" class="process" compatibility="5.0.000" expanded="true" name="Root">
  <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
      <parameter key="text" value="Some text about different kind of dances people might enjoy."/>
    </operator>
    <operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents" width="90" x="313" y="34">
      <process expanded="true">
        <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="34"/>
        <operator activated="true" class="text:replace_tokens" compatibility="7.5.000" expanded="true" height="68" name="Replace Tokens" width="90" x="380" y="34">
          <list key="replace_dictionary">
            <parameter key="([a-zA-Z]+)s" value="$1"/>
          </list>
        </operator>
        <connect from_port="document" to_op="Tokenize" to_port="document"/>
        <connect from_op="Tokenize" from_port="document" to_op="Replace Tokens" to_port="document"/>
        <connect from_op="Replace Tokens" from_port="document" to_port="document 1"/>
        <portSpacing port="source_document" spacing="0"/>
        <portSpacing port="sink_document 1" spacing="0"/>
        <portSpacing port="sink_document 2" spacing="0"/>
      </process>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
    <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
  </process>
</operator>
Remark: Make sure to download the Text Processing Extension from the Marketplace in order for this solution to work.
Key element:
To extract the substring of a token that matches a certain criterion, use the group feature of regular expressions. Here we identify tokens ending in 's' with the expression ([a-zA-Z]+)s and refer to the targeted substring by the group identifier $1.
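To see what the group reference does outside of RapidMiner, here is a minimal sketch in plain Java. It only assumes that the dictionary entry behaves like standard java.util.regex replacement; that Replace Tokens applies the pattern exactly this way is an assumption, not something confirmed in this thread. The class and method names are made up for illustration. The second method shows an anchored variant of the pattern, which may be worth trying if words with an 's' in the middle get changed as well.

```java
// Illustrative only: TokenSuffixDemo and its methods are hypothetical names,
// not part of RapidMiner. They mimic the dictionary entry
// "([a-zA-Z]+)s" -> "$1" using standard Java regex replacement.
class TokenSuffixDemo {

    // Replace every match of the pattern inside the token with capture group 1,
    // i.e. the letters that precede the matched 's'.
    static String stripTrailingS(String token) {
        return token.replaceAll("([a-zA-Z]+)s", "$1");
    }

    // Anchored variant: the whole token must consist of letters followed by a
    // final 's', so an 's' in the middle of a word is left alone.
    static String stripTrailingSAnchored(String token) {
        return token.replaceAll("^([a-zA-Z]+)s$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingS("dances"));          // dance
        System.out.println(stripTrailingS("enjoy"));           // enjoy (no 's', unchanged)
        System.out.println(stripTrailingS("sister"));          // siter (inner 's' is matched too)
        System.out.println(stripTrailingSAnchored("sister"));  // sister
    }
}
```

In plain Java the unanchored pattern also matches the "sis" in "sister", which is why the anchored form is shown next to it.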
Hope it helps.