"[SOLVED] Empty Word List"

User: "beedaan"
New Altair Community Member
Updated by Jocelyn
Hi All,

I am counting the occurrences of words in a text document.  The document contains the titles and abstracts of other documents.  The general format of the file is:

<document name>
<abstract>
<white space>
...

This continues for roughly 36,00 documents.  The file is 46 MB in total.  I expect a word list with the number of occurrences of each word, but what I actually get is an empty word list.  Here is my process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
   <process expanded="true" height="641" width="1024">
     <operator activated="true" class="text:read_document" compatibility="5.2.004" expanded="true" height="60" name="Read Document" width="90" x="179" y="75">
       <parameter key="file" value="C:\Users\Administrator\Desktop\DTIC_RDF\sample.xml"/>
     </operator>
     <operator activated="true" class="text:process_documents" compatibility="5.2.004" expanded="true" height="94" name="Process Documents" width="90" x="447" y="75">
       <parameter key="create_word_vector" value="false"/>
       <parameter key="add_meta_information" value="false"/>
       <parameter key="keep_text" value="true"/>
       <parameter key="prune_method" value="absolute"/>
       <parameter key="prune_below_absolute" value="2"/>
       <parameter key="prune_above_absolute" value="9999"/>
       <process expanded="true" height="645" width="1024">
         <operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize" width="90" x="125" y="28"/>
         <operator activated="true" class="text:transform_cases" compatibility="5.2.004" expanded="true" height="60" name="Transform Cases" width="90" x="313" y="75"/>
         <connect from_port="document" to_op="Tokenize" to_port="document"/>
         <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
         <connect from_op="Transform Cases" from_port="document" to_port="document 1"/>
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="0"/>
         <portSpacing port="sink_document 2" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Read Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
     <connect from_op="Process Documents" from_port="word list" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
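
For comparison, here is a minimal sketch outside RapidMiner of the word list the process above is meant to produce. It is not RapidMiner code. The names split_records, tokenize and word_list are hypothetical, and the sample text is made up. It assumes that records are separated by blank lines, that tokens are split on non-letter characters and lowercased, and that absolute pruning compares the number of documents containing a word (not its total occurrences) against the threshold. Under that assumption, a file read as one single document would lose every word when the lower threshold is 2.

```python
import re
from collections import Counter


def split_records(text):
    """Split the file into records separated by blank (or whitespace-only) lines."""
    return [r for r in re.split(r"\n\s*\n", text) if r.strip()]


def tokenize(doc):
    """Split on non-letter characters and lowercase each token."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", doc) if t]


def word_list(docs, prune_below=0):
    """Map each word to (total occurrences, number of documents containing it).

    ASSUMPTION: pruning is applied to the document count, so a word is kept
    only if it appears in at least `prune_below` documents.
    """
    total = Counter()
    doc_count = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        total.update(tokens)
        doc_count.update(set(tokens))  # count each word once per document
    return {w: (total[w], doc_count[w]) for w in total if doc_count[w] >= prune_below}


# Made-up sample in the <document name> / <abstract> / <white space> layout.
sample = "Title one\nradar signal radar\n\nTitle two\nsignal noise\n"

# Whole file treated as one document: every word is in exactly 1 document.
as_one = word_list([sample], prune_below=2)
print(as_one)  # {}

# One document per record: words shared by both records survive.
per_record = word_list(split_records(sample), prune_below=2)
print(per_record)  # {'title': (2, 2), 'signal': (2, 2)}
```

Calling word_list([sample]) without a threshold returns every word, which is a quick way to check whether pruning, rather than tokenizing, is what empties the list.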

I used this YouTube video as a guide: https://www.youtube.com/watch?feature=endscreen&NR=1&v=EjD2M4r4mBM

Please let me know what I am doing wrong.  Thanks.
