"Text Mining beginner HELP"
ayaghci
New Altair Community Member
Hello,
I am quite new to text mining. I am trying to use a user-defined external dictionary but am having a problem.
My question is: when I create a user-defined dictionary (in Notepad), what should its structure be? For instance,
I created my file as follows (I used the open WordNet dictionary):
artier, arty
artiest, arty
but I am getting an error.
I would appreciate any comments or suggestions for references (books, websites).
Answers
Hi,
Which operator do you want to use your external dictionary with? The required structure of the dictionary depends on the operator.
Best,
Nils
Hi Nils,
I am trying to use the [Stem (Dictionary)] operator. My intention is to (1) generate a dictionary (as a txt file) and (2) stem the tokens.
Thanks in advance,
Art
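To make the two steps above concrete outside of RapidMiner, here is a minimal Python sketch of dictionary-based stemming. It is only an illustration of the idea, not how the operator is implemented: the `stem_map` dictionary, the helper names `tokenize` and `stem_tokens`, the sample sentence, and the choice to split on non-letter characters are all assumptions made for this example. The word pairs are the ones from the question.

```python
import re

# Hypothetical in-memory dictionary: inflected form -> stem.
# The pairs come from the question; the variable name is made up.
stem_map = {"artier": "arty", "artiest": "arty"}

def tokenize(text):
    """Split text into lowercase tokens made of the letters a-z (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())

def stem_tokens(tokens, mapping):
    """Replace each token that has a dictionary entry; keep the rest unchanged."""
    return [mapping.get(tok, tok) for tok in tokens]

tokens = tokenize("The artier cafe is the artiest place in town")
stemmed = stem_tokens(tokens, stem_map)
print(stemmed)
```

Tokens without a dictionary entry pass through unchanged, so the dictionary only needs to list the forms that should be mapped.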
Hi,
your dictionary has to look like this:
arty:artier
arty:artiest
and your process may look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.007">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.007" expanded="true" name="Process">
    <process expanded="true" height="235" width="547">
      <operator activated="true" class="text:read_document" compatibility="5.2.001" expanded="true" height="60" name="Read Document" width="90" x="132" y="152">
        <parameter key="file" value=""/>
      </operator>
      <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="313" y="165"/>
      <operator activated="true" class="text:stem_dictionary" compatibility="5.2.001" expanded="true" height="60" name="Stem (Dictionary)" width="90" x="447" y="165">
        <parameter key="file" value=""/>
      </operator>
      <connect from_op="Read Document" from_port="output" to_op="Tokenize" to_port="document"/>
      <connect from_op="Tokenize" from_port="document" to_op="Stem (Dictionary)" to_port="document"/>
      <connect from_op="Stem (Dictionary)" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Best,
Nils
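If the dictionary already exists in the comma-separated `word, stem` form from the question, it can probably be converted rather than retyped. The following is a minimal Python sketch, assuming the target layout is exactly the one-rule-per-line `stem:word` form shown in the reply above; the function names `convert_line` and `convert_lines` are made up for this example, and whether the operator accepts further syntax is not covered here.

```python
def convert_line(line):
    """Turn 'word, stem' into 'stem:word'; return None for blank lines.

    Raises ValueError if a non-blank line contains no comma.
    """
    line = line.strip()
    if not line:
        return None
    word, stem = (part.strip() for part in line.split(",", 1))
    return f"{stem}:{word}"

def convert_lines(lines):
    """Convert every non-blank line and drop the blank ones."""
    results = (convert_line(line) for line in lines)
    return [rule for rule in results if rule is not None]

# The two entries from the question, in their original form.
original = ["artier, arty", "artiest, arty"]
converted = convert_lines(original)
print("\n".join(converted))
```

The converted lines can then be saved as a plain text file, and the `file` parameter of the Stem (Dictionary) operator pointed at that file.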