Neil McGuigan wrote:
See my blog, vancouverdata.blogspot.com. I have a five-part video series on text mining, including how to do classification (sentiment analysis) in the fifth part.
Good luck,
Neil
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
    <process expanded="true" height="386" width="413">
      <operator activated="true" class="text:create_document" compatibility="5.1.001" expanded="true" height="60" name="Create Document" width="90" x="98" y="61">
        <parameter key="text" value="Apples are green and red. Lemons are yellow. One lemon and two oranges. Three apples."/>
        <parameter key="add label" value="true"/>
        <parameter key="label_type" value="text"/>
        <parameter key="label_value" value="textlabel"/>
      </operator>
      <operator activated="true" class="text:documents_to_data" compatibility="5.1.001" expanded="true" height="76" name="Documents to Data" width="90" x="112" y="210">
        <parameter key="text_attribute" value="textlabel"/>
        <parameter key="add_meta_information" value="false"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.1.001" expanded="true" height="76" name="Process Documents from Data" width="90" x="313" y="165">
        <parameter key="vector_creation" value="Term Frequency"/>
        <list key="specify_weights"/>
        <process expanded="true" height="505" width="774">
          <operator activated="true" class="text:tokenize" compatibility="5.1.001" expanded="true" height="60" name="Tokenize" width="90" x="112" y="75"/>
          <operator activated="true" class="text:filter_stopwords_dictionary" compatibility="5.1.001" expanded="true" height="60" name="Filter Stopwords (Dictionary)" width="90" x="514" y="75">
            <parameter key="file" value="M:\Data\rmstop.txt"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (Dictionary)" to_port="document"/>
          <connect from_op="Filter Stopwords (Dictionary)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
      <connect from_op="Documents to Data" from_port="example set" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
lina wrote:
Thank you so much, B. I do appreciate your help!
Filter Stopwords by Dictionary allows you to create your own stoplist - it reads from a file that you create.
I'm trying to create this file, but RapidMiner does not recognize it. What format should the file have? I've tried something like "word1|word2..." but it doesn't work. Any ideas?
Regarding the classifier and the example given, I'm going to check it out, and I hope I manage to classify my own documents!
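On the file format question: as far as I know, the Filter Stopwords (Dictionary) operator expects a plain text file with one stopword per line, rather than a delimited list such as "word1|word2". A minimal sketch, assuming that format, for converting a pipe-separated list into such a file. The helper name `write_stopword_file`, the sample words, and the output file name are made up for illustration; point the operator's `file` parameter at wherever you save the result.

```python
from pathlib import Path


def write_stopword_file(words, path):
    """Write one stopword per line, skipping blanks and duplicates."""
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in cleaned:
            cleaned.append(word)
    # Assumption: the operator reads one stopword per line from a plain text file.
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Hypothetical pipe-separated list, like the "word1|word2..." attempt above.
raw = "are|and|one|two"
stopwords = write_stopword_file(raw.split("|"), "rmstop.txt")
```

If the operator still seems to ignore the file, it may be worth checking that the words match the tokens exactly, since the sample text contains capitalized tokens such as "One" and "Three".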