"[SOLVED] Empty Word List"
Hi All,
I am counting the occurrences of words in a text file. The file contains the titles and abstracts of other documents. Its general format is:
<document name>
<abstract>
<white space>
...
This continues for roughly 36,000 documents. The total size of the file is 46 MB. I expect a word list with the number of occurrences of each word, but what I actually get is an empty word list. I used this YouTube video as a guide: https://www.youtube.com/watch?feature=endscreen&NR=1&v=EjD2M4r4mBM
Here is my process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="641" width="1024">
      <operator activated="true" class="text:read_document" compatibility="5.2.004" expanded="true" height="60" name="Read Document" width="90" x="179" y="75">
        <parameter key="file" value="C:\Users\Administrator\Desktop\DTIC_RDF\sample.xml"/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="5.2.004" expanded="true" height="94" name="Process Documents" width="90" x="447" y="75">
        <parameter key="create_word_vector" value="false"/>
        <parameter key="add_meta_information" value="false"/>
        <parameter key="keep_text" value="true"/>
        <parameter key="prune_method" value="absolute"/>
        <parameter key="prune_below_absolute" value="2"/>
        <parameter key="prune_above_absolute" value="9999"/>
        <process expanded="true" height="645" width="1024">
          <operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize" width="90" x="125" y="28"/>
          <operator activated="true" class="text:transform_cases" compatibility="5.2.004" expanded="true" height="60" name="Transform Cases" width="90" x="313" y="75"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
      <connect from_op="Process Documents" from_port="word list" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
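To make the expected result concrete, here is a minimal Python sketch of the counting I have in mind, done outside RapidMiner. It only illustrates the intended output, not what the operators do internally. The function name `count_words`, the blank-line document separator, and the simple letters-only word pattern are assumptions made for this example.

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Count lowercase word occurrences across all documents in `text`.

    Assumes (hypothetically) that documents are separated by blank lines
    and that a "word" is a run of ASCII letters.
    """
    counts = Counter()
    # Split the file content into documents on one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Lowercase the block (as Transform Cases would) and extract
        # letter runs (roughly what tokenizing on non-letters gives).
        counts.update(re.findall(r"[a-z]+", block.lower()))
    return counts


# Two tiny made-up documents in the title / abstract / blank-line layout.
sample = (
    "Title One\nAn abstract about text mining.\n\n"
    "Title Two\nAnother abstract about mining.\n"
)
word_counts = count_words(sample)
```

For the two sample documents, `word_counts["mining"]` is 2 and `word_counts["text"]` is 1, which is the kind of word list I would expect the process to return.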
Please let me know what I am doing wrong. Thanks.
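One thing that may be worth checking is the pruning setup. If absolute pruning counts the number of documents a word appears in (a common reading of `prune_below_absolute`, but treat it as an assumption here), then a file read as a single document can never reach the threshold of 2, and every word would be dropped. The Python sketch below only models that assumed rule; `prune_by_document_frequency` is a hypothetical helper, not a RapidMiner function.

```python
from collections import Counter


def prune_by_document_frequency(docs, below=2, above=9999):
    """Keep words whose document frequency lies in [below, above].

    `docs` is a list of token lists, one per document. This models the
    assumed pruning rule only; it is not RapidMiner code.
    """
    doc_freq = Counter()
    for tokens in docs:
        # Count each word once per document, however often it repeats.
        doc_freq.update(set(tokens))
    return {w: n for w, n in doc_freq.items() if below <= n <= above}


# The whole file read as ONE document: no word can appear in 2 documents.
single = [["text", "mining", "text", "abstract"]]
# The same tokens split into two documents.
split = [["text", "mining"], ["text", "abstract"]]

pruned_single = prune_by_document_frequency(single)
pruned_split = prune_by_document_frequency(split)
```

Under this assumption the single-document input yields an empty result, while the split input keeps `text`. If that matches the operator's behaviour, either disabling pruning or splitting the file into one document per abstract before Process Documents would be the things to try.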