Open WordNet Dictionary and Extract Sentiment
pix123
New Altair Community Member
Hi there,
I am using the Open WordNet Dictionary operator together with the Extract Sentiment (English) operator. I have set the dictionary path to point to the correct folder. I have a couple hundred text files that I want to analyze. If I analyze just 8 of those files, the process runs fine; however, if I try to run it against 9 or more of the text files, I get an I/O error saying that the resource can't be read and parsed.
Is it possible to have the WordNet operator run once and keep the dictionary loaded, instead of running again each time the Process Documents from Files operator picks up a new file?
If this is not possible, is there a way to overcome this issue?
Many Thanks.
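Since the process reportedly runs fine on 8 files, one possible stopgap is to split the collection into sub-folders of at most 8 .txt files and run the process once per sub-folder. This is an untested workaround, not a fix for the underlying I/O error. Below is a minimal Python sketch; the helper name `split_into_batches`, the `batch_NNN` folder naming and the batch size of 8 are hypothetical choices taken from the numbers in this post.

```python
import shutil
from pathlib import Path


def split_into_batches(src: Path, dst: Path, batch_size: int = 8) -> list[Path]:
    """Copy the *.txt files in src into dst/batch_001, dst/batch_002, ...
    with at most batch_size files per sub-folder (hypothetical helper).
    Returns the list of batch folders that were created."""
    # Sort so the batches are reproducible from run to run.
    files = sorted(p for p in src.glob("*.txt") if p.is_file())
    batches = []
    for start in range(0, len(files), batch_size):
        batch_dir = dst / f"batch_{start // batch_size + 1:03d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for f in files[start:start + batch_size]:
            # Copy rather than move, so the original folder stays untouched.
            shutil.copy2(f, batch_dir / f.name)
        batches.append(batch_dir)
    return batches
```

For example, a folder of 20 review files would be copied into three sub-folders holding 8, 8 and 4 files, each of which can then be set as the text directory of Process Documents from Files.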
Answers
-
Is anyone able to assist with this query?
-
Hello pix123,
Would it be possible to post the XML of your process?
Maerkli
-
@Maerkli please see the attached XML process. Any help would be much appreciated. Thanks.
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:process_document_from_file" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Files" width="90" x="246" y="34">
<list key="text_directories">
<parameter key="fx_bukley_reviews" value="C:Tripadvisor Text Files 1"/>
</list>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="85"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="246" y="85"/>
<operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="380" y="85">
<parameter key="min_chars" value="3"/>
</operator>
<operator activated="true" class="text:stem_porter" compatibility="8.1.000" expanded="true" height="68" name="Stem (Porter)" width="90" x="581" y="85"/>
<operator activated="true" class="wordnet:open_wordnet_dictionary" compatibility="5.3.000" expanded="true" height="68" name="Open WordNet Dictionary" width="90" x="581" y="187">
<parameter key="directory" value="C:\Desktop\WordNet-3.0\dict"/>
</operator>
<operator activated="true" class="wordnet:find_sentiment_wordnet" compatibility="5.3.000" expanded="true" height="82" name="Extract Sentiment (English)" width="90" x="782" y="85"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
<connect from_op="Stem (Porter)" from_port="document" to_op="Extract Sentiment (English)" to_port="document"/>
<connect from_op="Open WordNet Dictionary" from_port="dictionary" to_op="Extract Sentiment (English)" to_port="dictionary"/>
<connect from_op="Extract Sentiment (English)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_attributes" compatibility="9.0.003" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
<list key="function_descriptions">
<parameter key="Recommend" value="if(sentiment&lt;0,&quot;NO&quot;,&quot;YES&quot;)"/>
</list>
</operator>
<connect from_port="input 1" to_op="Process Documents from Files" to_port="word list"/>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
Thanks for sharing your XML file. I can't reproduce what you see by running the process, because I don't have your input files.
@Lionelderkrikor, may I ask you to have a look, please?
Maerkli
-
Hi @pix123,
Does the error always occur on the same .txt file? Can you run some tests to check?
So that we can reproduce what you observe, can you share:
- your .txt files (at least 9 of them)
- your dictionary
Regards,
Lionel
-
Hi again @pix123,
I ran your process on 12 of my own .txt files (with the WordNet 3.0 dictionary) and had no problem: your process works fine.
So my hypothesis is that one of your .txt files is causing the problem.
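One way to test this hypothesis is to scan the folder for files that are empty or that do not decode cleanly. Below is a minimal Python sketch. It assumes the files are meant to be UTF-8 (an assumption; the thread does not say which encoding Process Documents from Files is set to) and that these two conditions are plausible culprits; `find_suspect_files` is a hypothetical helper, not part of RapidMiner.

```python
from pathlib import Path


def find_suspect_files(folder: Path, encoding: str = "utf-8") -> dict[str, str]:
    """Return {file name: reason} for .txt files in folder that are empty
    or cannot be decoded with the given encoding (UTF-8 assumed)."""
    suspects = {}
    for path in sorted(folder.glob("*.txt")):
        if not path.is_file():
            continue  # skip any directory whose name happens to end in .txt
        data = path.read_bytes()
        if not data.strip():
            suspects[path.name] = "empty or whitespace only"
            continue
        try:
            data.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that failed to decode
            suspects[path.name] = f"not valid {encoding} at byte {exc.start}"
    return suspects
```

Any file it reports can be taken out of the folder (or re-saved in the expected encoding) and the process re-run, to see whether the error goes away or moves to another file.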
Regards,
Lionel
-
Hi again @pix123,
Here is a possible lead:
Try setting file pattern = *.txt (instead of file pattern = *) in the Process Documents from Files parameters, so that only the .txt files in the folder are read.
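To illustrate why the pattern can matter: with `*`, every file in the folder is picked up, including non-text files that Windows or other tools may leave there (for example desktop.ini or Thumbs.db). The small Python sketch below uses shell-style matching from the standard `fnmatch` module to list what `*` would add on top of `*.txt`. Treating RapidMiner's file pattern as the same kind of glob is an assumption, and `files_only_in_wildcard` is a hypothetical helper.

```python
from fnmatch import fnmatch


def files_only_in_wildcard(names: list[str], pattern: str = "*.txt") -> list[str]:
    """Names that the catch-all pattern '*' would pick up but `pattern` would not.
    Since '*' matches every name, these are simply the names not matching `pattern`."""
    return [n for n in names if not fnmatch(n, pattern)]


folder = ["review_001.txt", "review_002.txt", "desktop.ini", "Thumbs.db", "notes.docx"]
print(files_only_in_wildcard(folder))  # ['desktop.ini', 'Thumbs.db', 'notes.docx']
```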
Hope it helps,
Regards,
Lionel
-
@Maerkli @lionelderkrikor thank you for the suggestions so far. I have tried a random sample of 10 files but still get an error after the 8th file has been processed. I also tried your suggestion of changing the file pattern to *.txt, but I continue to get the I/O error. Attached are the dictionary and some sample .txt files. I'd appreciate any help in resolving this.
-
Hi @pix123,
I'm not able to reproduce the I/O error: on my computer, your process works fine with the 16 .txt files you shared.
Can you give more details about the I/O error you encountered? Can you share the RapidMiner log file?
Regards,
Lionel
-
@lionelderkrikor thank you for your assistance so far. Attached are both the log file and a screenshot of the error I encounter.
-
@lionelderkrikor I am running on Windows 10
-
Hi @pix123,
I'm running Windows 10 too and with your files and your Wordnet dictionary, the process works fine here.
I do not know what to think...
Try updating RapidMiner to the latest release (RM 9.1) and, if needed, the WordNet extension.
Does anyone have an idea?
Regards,
Lionel
-
@lionelderkrikor thank you. I tried another computer, also running Windows 10, and got the same error.
If anyone else has suggestions, they would be appreciated. Thanks
-
Hello pix123,
Lionel has done an amazing job, as usual, and I can't add anything. Did you try using breakpoints? They can help you check the execution flow.
Maerkli