Text Mining Beginner Help

I am quite new to the text mining process. I am trying to use a user-defined external dictionary but am having problems.
My question is: when I create a user-defined dictionary (in Notepad), what should its structure be? For instance,
I created my file as follows, and I used Open WordNet Dictionary:
artier, arty
artiest, arty
but I am getting an error.
I would appreciate any comments or suggestions for references (books, websites).
With which operator do you want to use your external dictionary? Depending on the operator, the structure of the dictionary may change.
Nils
Hi Nils,
I am trying to use the Stem (Dictionary) operator. My intention is to (1) generate a dictionary (as a .txt file) and (2) stem the tokens.
Thanks in advance.
Art
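For step (1), generating the dictionary as a text file, here is a minimal Python sketch. It assumes, and the thread does not confirm this, that Stem (Dictionary) expects one entry per line in the form `stem:regular_expression`. Under that assumption, the `word, stem` pairs from the question are regrouped so that each stem appears once, followed by its words joined with `|`. The function names and the output path are hypothetical.

```python
import os
import re
import tempfile
from collections import defaultdict


def build_stem_dictionary(pair_lines):
    """Group 'word, stem' lines into 'stem:regex' lines.

    ASSUMPTION: 'stem:regular_expression' is the format Stem (Dictionary) expects.
    """
    words_by_stem = defaultdict(list)
    for line in pair_lines:
        line = line.strip()
        if not line or "," not in line:
            continue  # skip blank or malformed lines
        word, stem = (part.strip() for part in line.split(",", 1))
        words_by_stem[stem].append(word)
    # One line per stem; re.escape makes each word match literally inside the regex.
    return [
        f"{stem}:{'|'.join(re.escape(w) for w in words)}"
        for stem, words in words_by_stem.items()
    ]


def save_dictionary(lines, path):
    """Write one dictionary entry per line as UTF-8 plain text."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


pairs = ["artier, arty", "artiest, arty"]  # the two entries from the question
entries = build_stem_dictionary(pairs)
print("\n".join(entries))  # arty:artier|artiest

# Hypothetical output location; point the operator's "file" parameter at the path you actually use.
dict_path = os.path.join(tempfile.gettempdir(), "stem_dictionary.txt")
save_dictionary(entries, dict_path)
```

Grouping by stem keeps the file short when many inflected forms share one stem, and escaping the words avoids surprises if a word ever contains a regex metacharacter.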
Your dictionary has to look like this:
And your process may look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.007">
  <operator activated="true" class="process" compatibility="5.2.007" expanded="true" name="Process">
    <process expanded="true" height="235" width="547">
      <operator activated="true" class="text:read_document" compatibility="5.2.001" expanded="true" height="60" name="Read Document" width="90" x="132" y="152">
        <!-- set to the path of the document you want to stem -->
        <parameter key="file" value=""/>
      </operator>
      <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="313" y="165"/>
      <operator activated="true" class="text:stem_dictionary" compatibility="5.2.001" expanded="true" height="60" name="Stem (Dictionary)" width="90" x="447" y="165">
        <!-- set to the path of your dictionary .txt file -->
        <parameter key="file" value=""/>
      </operator>
      <connect from_op="Read Document" from_port="output" to_op="Tokenize" to_port="document"/>
      <connect from_op="Tokenize" from_port="document" to_op="Stem (Dictionary)" to_port="document"/>
      <connect from_op="Stem (Dictionary)" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
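For step (2), the rough Python emulation below shows what dictionary-based stemming does to a token stream. It can be used to sanity-check a dictionary file before running the process. It is not RapidMiner code. The `stem:regex` line format, the full-match semantics, and the first-matching-rule-wins behavior are all assumptions, and the regex split is only a crude stand-in for the Tokenize operator.

```python
import re


def load_stem_dictionary(lines):
    """Parse dictionary lines into (compiled regex, stem) rules.

    ASSUMPTION: each line has the form 'stem:regular_expression'.
    """
    rules = []
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # ignore blank or malformed lines
        stem, pattern = line.split(":", 1)
        rules.append((re.compile(pattern.strip()), stem.strip()))
    return rules


def stem_tokens(tokens, rules):
    """Replace each token with the stem of the first rule whose regex matches the whole token.

    ASSUMPTION: full-match, first-rule-wins semantics; unmatched tokens are kept unchanged.
    """
    stemmed = []
    for token in tokens:
        for regex, stem in rules:
            if regex.fullmatch(token):
                stemmed.append(stem)
                break
        else:
            stemmed.append(token)  # no rule matched this token
    return stemmed


rules = load_stem_dictionary(["arty:artier|artiest"])
# Crude stand-in for the Tokenize operator: keep runs of word characters.
tokens = re.findall(r"\w+", "The artiest posters looked artier than ours")
result = stem_tokens(tokens, rules)
print(result)  # ['The', 'arty', 'posters', 'looked', 'arty', 'than', 'ours']
```

Requiring a full match keeps a short pattern from accidentally rewriting longer words that merely contain it.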