[SOLVED] Dictionary Stemmer doesn't work
claudioluciodov
Hi folks,
I am trying to build a simple process to test the dictionary stemmer.
I have a text file, teste.txt, with the following content:
someday
other
day
I created a dictionary file for the stemmer, stem_teste.txt, with the following content:
weekday : .*day
When I run the process, it doesn't work. I expected "someday" to be replaced by "weekday".
XML File:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="all"/>
    <parameter key="logfile" value="C:\log.rm"/>
    <process expanded="true" height="280" width="413">
      <operator activated="true" class="text:read_document" compatibility="5.2.000" expanded="true" height="60" name="Read Document" width="90" x="45" y="30">
        <parameter key="file" value="D:\teste.txt"/>
      </operator>
      <operator activated="true" class="text:tokenize" compatibility="5.2.000" expanded="true" height="60" name="Tokenize" width="90" x="179" y="165">
        <parameter key="mode" value="linguistic tokens"/>
      </operator>
      <operator activated="true" class="text:stem_dictionary" compatibility="5.2.000" expanded="true" height="60" name="Stem (Dictionary)" width="90" x="313" y="165">
        <parameter key="file" value="D:\stem_teste.txt"/>
      </operator>
      <connect from_op="Read Document" from_port="output" to_op="Tokenize" to_port="document"/>
      <connect from_op="Tokenize" from_port="document" to_op="Stem (Dictionary)" to_port="document"/>
      <connect from_op="Stem (Dictionary)" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Answers
Hi,
the operator help is a bit misleading. Your dictionary file must not contain any spaces around the colon. The following dictionary file should work:
weekday:.*day
Best, Marius

Thanks a lot!
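
For anyone who wonders why the spaces matter, here is a minimal Python sketch of one plausible explanation. It is not RapidMiner's actual implementation. It assumes (hypothetically) that each dictionary line is split at the first colon without trimming whitespace, and that the pattern has to match the whole token. The helper names `parse_rule` and `stem_token` are made up for this illustration.

```python
import re

def parse_rule(line):
    """Split 'stem:pattern' at the first colon; whitespace is NOT trimmed (assumption)."""
    stem, pattern = line.split(":", 1)
    return stem, re.compile(pattern)

def stem_token(token, rules):
    """Return the stem of the first rule whose pattern matches the whole token."""
    for stem, regex in rules:
        if regex.fullmatch(token):
            return stem
    return token  # no rule matched: leave the token unchanged

tokens = ["someday", "other", "day"]

# Rule as written in the question: the pattern becomes " .*day" (leading space).
spaced = [parse_rule("weekday : .*day")]
# Rule as suggested in the answer: the pattern is ".*day".
fixed = [parse_rule("weekday:.*day")]

print([stem_token(t, spaced) for t in tokens])  # ['someday', 'other', 'day']
print([stem_token(t, fixed) for t in tokens])   # ['weekday', 'other', 'weekday']
```

Under these assumptions the spaced rule never matches, because no token starts with a space. Note that with the corrected rule "day" is also mapped to "weekday", since `.*` can match an empty string.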