"opinion mining/sentiment analysis-rapidminer5"

lina
lina New Altair Community Member
edited November 5 in Community Q&A
Hi!
I would appreciate any information you can give me; this is really important to me.
I have created an Excel file filled with comments about a specific topic.
Now I am trying to classify these comments (they are short sentences from various sources on the web)
into positive, neutral, and negative.
How can I proceed?
Please note that all comments are written in Greek. I hope that is not a problem!
Since I am new to this topic, I would be really grateful for any help.
Thanks in advance. I am looking forward to your reply!

Answers

  • el_chief
    el_chief New Altair Community Member
    See my blog: vancouverdata.blogspot.com

    I have a five-part video series on text mining, including how to do classification (sentiment analysis) in the fifth part.

    Good luck,

    Neil
  • lina
    lina New Altair Community Member
    Thank you very much, Neil! I'm going to visit your blog and watch the videos! :)
  • lina
    lina New Altair Community Member
    Hi!
    I'm still working on opinion mining, but I have a few problems.
    I have watched the videos from vancouverdata. By the way, I found them really helpful. Thanks, Neil :)
    First of all, the language I use is Greek, so I want to create a text file for the Filter Stopwords operator. Does anyone know what the file should look like? I've created one like this: "word1|word2....", but unfortunately it is not recognized. Any ideas?
    Also, there is no Stem operator for my language. How can I create one, since stemming seems to be very important?
    Apart from these problems, I have followed the method shown in the fifth part of the video series, but I get an error there too. The Naive Bayes operator reports: "cannot check whether input example set has special attribute "label"".
    What does this mean? Should I specify a label or an attribute in the file I use? Specifically, I use an Excel file instead of the database used in the video.
    Sorry for the long post.
    I'm looking forward to your answers and your help!
  • B_
    B_ New Altair Community Member
    Lina,

    Filter Stopwords (Dictionary) lets you create your own stop list; it reads from a file that you create.

    You can try using regular expressions to create a basic stemmer if the endings of Greek words are consistent across cases and genders.

    Here is a simple classifier you can adapt:
    http://rapid-i.com/rapidforum/index.php/topic,2993.0.html

    Remove the N-Gram operator and change the input to Excel. The column that contains the opinion (positive, neutral, or negative) should be set as Label in the Set Role operator.

    B.
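
The regular-expression stemmer B. suggests can be sketched as below. This is a minimal illustration in Python, outside RapidMiner, and rests on several assumptions: the suffix list is a small hypothetical sample rather than a linguistically complete set of Greek endings, the name `crude_stem` and the `min_stem` cutoff are invented for the example, and accents are removed before matching so that accented and unaccented endings are treated alike.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny sample of Greek endings (assumption:
# a usable stemmer would need a much more carefully chosen list).
SUFFIXES = ["ους", "ων", "ος", "ης", "ες", "α", "η", "ο"]

# Longer suffixes come first in the alternation so they are tried first.
_SUFFIX_RE = re.compile(
    "(?:" + "|".join(sorted(SUFFIXES, key=len, reverse=True)) + ")$"
)


def strip_accents(text):
    """Remove combining marks (e.g. the Greek tonos) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def crude_stem(token, min_stem=3):
    """Strip one known suffix; keep the word whole if the stem gets too short."""
    normalized = strip_accents(token.lower())
    stem = _SUFFIX_RE.sub("", normalized)
    return stem if len(stem) >= min_stem else normalized


if __name__ == "__main__":
    for word in ["καλός", "καλή", "το"]:
        print(word, "->", crude_stem(word))
```

The length cutoff keeps short function words from being reduced to a single letter. Whether one regular expression covers enough endings depends on how regular the inflection is in the comments being classified, which is the condition B. raises.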
  • lina
    lina New Altair Community Member
    Thank you so much, B. I do appreciate your help!

    B_ wrote: "Filter Stopwords by Dictionary allows you to create your own stoplist - it reads from a file that you create."

    I'm trying to create this file, but it is not recognized by RapidMiner. What should the format of this file be?
    I've tried something like "word1|word2...", but it doesn't work. Any ideas?
    Regarding the classifier and the example given, I'm going to check it out, and I hope I manage to classify my own documents!
  • B_
    B_ New Altair Community Member
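    On Windows it's a plain .txt file with one stop word per line.

    For example, rmstop.txt contains:
    one
    two
    three

    The following process reads that file through Filter Stopwords (Dictionary):
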
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
        <process expanded="true" height="386" width="413">
          <operator activated="true" class="text:create_document" compatibility="5.1.001" expanded="true" height="60" name="Create Document" width="90" x="98" y="61">
            <parameter key="text" value="Apples are green and red.&#10;Lemons are yellow.&#10;One lemon and two oranges.&#10;Three apples."/>
            <parameter key="add label" value="true"/>
            <parameter key="label_type" value="text"/>
            <parameter key="label_value" value="textlabel"/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="5.1.001" expanded="true" height="76" name="Documents to Data" width="90" x="112" y="210">
            <parameter key="text_attribute" value="textlabel"/>
            <parameter key="add_meta_information" value="false"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" compatibility="5.1.001" expanded="true" height="76" name="Process Documents from Data" width="90" x="313" y="165">
            <parameter key="vector_creation" value="Term Frequency"/>
            <list key="specify_weights"/>
            <process expanded="true" height="505" width="774">
              <operator activated="true" class="text:tokenize" compatibility="5.1.001" expanded="true" height="60" name="Tokenize" width="90" x="112" y="75"/>
              <operator activated="true" class="text:filter_stopwords_dictionary" compatibility="5.1.001" expanded="true" height="60" name="Filter Stopwords (Dictionary)" width="90" x="514" y="75">
                <parameter key="file" value="M:\Data\rmstop.txt"/>
              </operator>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (Dictionary)" to_port="document"/>
              <connect from_op="Filter Stopwords (Dictionary)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Documents to Data" from_port="example set" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
          <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
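
The stop list that B. describes (one word per line) can be generated from a pipe-separated string like the "word1|word2..." that lina tried. The sketch below is plain Python, outside RapidMiner. The function name `write_stoplist` and the choice of UTF-8 are assumptions made for the example; the encoding has to match whatever the operator is configured to read, which matters for Greek text.

```python
from pathlib import Path


def write_stoplist(pipe_separated, path):
    """Write one stop word per line and return the list of words written."""
    # Split on "|" and drop empty entries and surrounding whitespace.
    words = [w.strip() for w in pipe_separated.split("|") if w.strip()]
    # UTF-8 is an assumption; plain ASCII cannot hold Greek characters.
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return words


if __name__ == "__main__":
    written = write_stoplist("και|το|να", "rmstop.txt")
    print(len(written), "stop words written to rmstop.txt")
```

The resulting file has the same layout as the rmstop.txt example above, so it can be pointed to from the operator's file parameter.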
  • lina wrote:

    Thank you so much, B. I do appreciate your help!

    "Filter Stopwords by Dictionary allows you to create your own stoplist - it reads from a file that you create."

    I'm trying to create this file, but it is not recognized by RapidMiner. What should the format of this file be?
    I've tried something like "word1|word2...", but it doesn't work. Any ideas?
    Regarding the classifier and the example given, I'm going to check it out, and I hope I manage to classify my own documents!

    Create a plain-text file (ASCII for English; Greek text needs an encoding such as UTF-8) with a .txt or .csv extension.
    Sample of the file data structure:

    attrib1,attrib2,attrib3
    apple,monkey,brick
    orange,monkey,stick
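
A comma-separated file like the sample above starts with a header row that names the attributes. For the classification task in this thread, one of those columns would hold the sentiment, and that is the column to mark as the label. The sketch below is plain Python, outside RapidMiner, and checks that such a column exists and contains only the three expected classes before the data is loaded. The column name `sentiment` and the function `check_labels` are hypothetical names chosen for the example.

```python
import csv
import io

ALLOWED = {"positive", "neutral", "negative"}


def check_labels(csv_text, label_column="sentiment"):
    """Return (line_number, value) pairs for rows whose label is not allowed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if label_column not in (reader.fieldnames or []):
        raise ValueError("no column named %r in the header" % label_column)
    # The header is line 1, so data rows are numbered from 2
    # (assumes no field contains an embedded line break).
    return [
        (line_no, row[label_column])
        for line_no, row in enumerate(reader, start=2)
        if row[label_column] not in ALLOWED
    ]


if __name__ == "__main__":
    sample = (
        "comment,sentiment\n"
        "good service,positive\n"
        "it was ok,neutral\n"
        "never again,bad\n"
    )
    print(check_labels(sample))
```

A missing label column is the kind of problem behind the Naive Bayes "special attribute label" message reported earlier in the thread, so a check like this can catch it before the process runs.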