"[solved] struggling with word list feature"
sana
New Altair Community Member
Hi,
Can anyone help me, please? I am struggling with the text processing section. I am able to tokenize, but I never get any results when I try to create a word frequency list. It can't be that difficult, as there are lots of simple tools on the web for calculating word frequency lists.
The only thing that has worked for me so far is the Process Documents from Files operator, and even that shows results for only one of the two directories I chose.
Just now I ran Process Documents again with a mix of operators (Tokenize, Transform Cases and Filter Stopwords (English)), but got no output. Here is the process flow. I don't have any programming background, so I am just following the way other posts have been filed.
I hope someone can help me out here.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
<parameter key="logfile" value="C:\Users\user3\Desktop\dir text\5.txt"/>
<parameter key="resultfile" value="C:\Users\user3\Desktop\dir text\New Text Document.txt"/>
<process expanded="true" height="100" width="145">
<operator activated="true" class="text:process_documents" compatibility="5.1.004" expanded="true" height="76" name="Process Documents" width="90" x="45" y="30">
<process expanded="true" height="414" width="762">
<operator activated="true" class="text:tokenize" compatibility="5.1.004" expanded="true" height="60" name="Tokenize" width="90" x="31" y="27"/>
<operator activated="true" class="text:transform_cases" compatibility="5.1.004" expanded="true" height="60" name="Transform Cases" width="90" x="181" y="26"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.004" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="345" y="25"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
<connect from_op="Process Documents" from_port="word list" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Answers
Hi sana,
Is the XML code you posted the whole process? In that case, you aren't getting any results because you aren't providing any documents to the "Process Documents" operator.
You can use the "Read Document" operator to load documents. Connect its output to the input of the "Process Documents" operator and your results shouldn't be empty.
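A rough sketch of what that wiring might look like in the process XML, for illustration only. It assumes the Read Document operator of the Text Processing extension has the class `text:read_document`, a `file` parameter and an output port named `output`, and that the outer input port of Process Documents is named `documents 1`. These names may differ in your version, so please check them in the GUI. The file path is a hypothetical placeholder.

```xml
<!-- Sketch only: operator class, parameter key and port names are assumptions; verify them in your RapidMiner version. -->
<!-- The file path below is a hypothetical placeholder, not a real location. -->
<operator activated="true" class="text:read_document" compatibility="5.1.004" expanded="true" name="Read Document">
  <parameter key="file" value="C:\path\to\your\document.txt"/>
</operator>
<!-- Feed the loaded document into Process Documents so it has something to process. -->
<connect from_op="Read Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
```

Both elements would go inside the top-level process, next to the existing Process Documents operator. In the GUI this should be the same as dragging Read Document onto the canvas and connecting its output to the Process Documents input.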
Greetings
Nils
Hi Nils,
Thanks a lot.
I guess I will have to play around with it a lot now.
It's nice to have people to count on, and great work is happening here.
Please do keep it up,
Sana