Text Mining beginner HELP

ayaghci New Altair Community Member
edited November 5 in Community Q&A
Hello,

I am quite new to text mining. I am trying to use a user-defined external dictionary but am having a problem.

My question is: when I create a user-defined dictionary (in Notepad), what structure should it have? For instance,
I created my file as follows, based on the open WordNet dictionary:
artier, arty
artiest, arty

but I am getting an error.

I would appreciate any comments or suggestions for references (books, websites).


Answers

  • Nils_Woehler New Altair Community Member
    Hi,

    Which operator do you want to use your external dictionary with? The required structure of the dictionary depends on the operator.

    Best,
    Nils
  • ayaghci New Altair Community Member
    Hi Nils,

    I am trying to use the Stem (Dictionary) operator. My intention is to (1) create a dictionary (as a .txt file) and (2) stem the tokens with it.

    Thanks in advance,

    Art
  • Nils_Woehler New Altair Community Member
    Hi,

    Your dictionary has to look like this, with the stem first, then a colon, then the word it replaces, one pair per line:

    arty:artier
    arty:artiest
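To illustrate what a `stem:word` file expresses, here is a minimal Python sketch of the lookup idea. It assumes each line holds exactly one `stem:word` pair and that a token is replaced only on an exact, case-sensitive match. RapidMiner's Stem (Dictionary) operator may interpret the part after the colon differently, so this is an illustration, not the operator's implementation; `load_stem_dictionary` and `stem_tokens` are hypothetical names.

```python
def load_stem_dictionary(lines):
    """Build a word -> stem lookup from lines of the form 'stem:word'.

    Assumption: one pair per line, exact matching (not RapidMiner's code).
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        stem, sep, word = line.partition(":")
        if not sep:
            # e.g. the comma-separated 'artier, arty' has no colon
            raise ValueError(f"expected 'stem:word', got {line!r}")
        mapping[word.strip()] = stem.strip()
    return mapping


def stem_tokens(tokens, mapping):
    """Replace tokens found in the dictionary; keep all other tokens unchanged."""
    return [mapping.get(token, token) for token in tokens]


if __name__ == "__main__":
    mapping = load_stem_dictionary(["arty:artier", "arty:artiest"])
    print(stem_tokens(["artier", "artiest", "painting"], mapping))
```

In this sketch, a line such as `artier, arty` is rejected because it contains no colon, which is why the file needs the `stem:word` layout rather than comma-separated pairs.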
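    Your process could then look like this (set the "file" parameter of Read Document to your text file and the "file" parameter of Stem (Dictionary) to your dictionary file):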

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.007" expanded="true" name="Process">
        <process expanded="true" height="235" width="547">
          <operator activated="true" class="text:read_document" compatibility="5.2.001" expanded="true" height="60" name="Read Document" width="90" x="132" y="152">
            <parameter key="file" value=""/>
          </operator>
          <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="313" y="165"/>
          <operator activated="true" class="text:stem_dictionary" compatibility="5.2.001" expanded="true" height="60" name="Stem (Dictionary)" width="90" x="447" y="165">
            <parameter key="file" value=""/>
          </operator>
          <connect from_op="Read Document" from_port="output" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Stem (Dictionary)" to_port="document"/>
          <connect from_op="Stem (Dictionary)" from_port="document" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
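If your word list is already in the `variant, stem` order from the original question, a small conversion step can produce the `stem:variant` layout shown above. This Python sketch assumes one comma-separated pair per line in UTF-8 text; `to_stem_dictionary`, `convert_file`, and any file names you pass to it are hypothetical. The output file is what the "file" parameter of Stem (Dictionary) would point to.

```python
from pathlib import Path


def to_stem_dictionary(lines):
    """Turn 'variant, stem' lines into 'stem:variant' lines (assumed input order)."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        variant, sep, stem = line.partition(",")
        if not sep:
            raise ValueError(f"expected 'variant, stem', got {line!r}")
        out.append(f"{stem.strip()}:{variant.strip()}")
    return out


def convert_file(src, dst):
    """Read a 'variant, stem' file and write the 'stem:variant' version to dst."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    converted = to_stem_dictionary(lines)
    Path(dst).write_text("\n".join(converted) + "\n", encoding="utf-8")


if __name__ == "__main__":
    for entry in to_stem_dictionary(["artier, arty", "artiest, arty"]):
        print(entry)
```

For a whole file you would call something like `convert_file("wordnet_pairs.txt", "stem_dictionary.txt")`, with your own paths in place of these placeholder names.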
    Best,
    Nils