Is it possible to do Sentiment Analysis and rate each comment with scaled values?

lum-x New Altair Community Member
edited November 5 in Community Q&A
Hello to all RapidMiner Experts,

My name is Lum Zhaveli. I am currently in the third year of my studies at The University of Sheffield International Faculty, CITY College, in Thessaloniki. My dissertation is about Sentiment Analysis.
I have to integrate a Sentiment Analysis engine into a system called Effectinet, developed by a student at our college. Effectinet lets students anonymously answer two types of questions that the lecturer asks them online during the lecture (most of the questions relate to how students feel and how much trouble they have understanding the lecture).
The first form of answer is the fixed form, which uses sliders or a selection from answers that the lecturer has posted.
In the second form, the student uses natural language (NL) to express an opinion on the question.
We believe this will help the lecturer adjust the pace of the lecture and the way he explains things, and so improve the quality of the lecture.

My questions are the following (I'm sorry, I should have done a bit more research before asking, but I just saw your videos on YouTube and today is my birthday):
               1. Can RapidMiner rate opinions on a scale (from 1 to 5), not just by polarity?
               2. I will be using a DB to store the results. Can I store the result for each NL answer in the DB, so that each result corresponds only to that particular answer? (I don't need to group the results, because the system will calculate that afterwards.)
               2a. Let's say that I have:
                               This lecture is Great. (you give a positive rating or a certain scale value)
                               This lecture is useless for advanced students like me. (you give a negative rating or a certain scale value)
               3. Can I use RapidMiner to do the analysis in real time (processing and rating each new answer as soon as the server receives it)?
               4. I have used GATE a bit, but RapidMiner seems better. I am still confused by all the tools and other things around me, because I don't have the necessary support from my mentors at college.

Thanks in advance. =)
Lum Zhaveli
Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    Hi Lum,

    first of all: happy birthday! I hope you had a great day :)
    Let's have a look at your questions:
    1. Most learners output not only a binary decision such as positive/negative, but also confidence values. You could try mapping confidence intervals to the points of your scale.
    2. I don't understand this question completely, so let me just say that RapidMiner has both a Write Database operator, which appends data to new or existing database tables, and an Update Database operator, which performs writes similar to the UPDATE statement of SQL.
    3. Once you have created good processes and models, you can deploy them to a RapidAnalytics server, which lets you run classifications/ratings via a web service. Please keep in mind that there are certain legal restrictions due to the AGPL license: in general, the code on the server that queries the web service must also be released under the AGPL. You could also call RapidMiner from the command line, but that tends to be slow, and you also have to consider legal issues if you integrate our software into your own systems.
    4. What can I say? Of course our products are always a good choice :)
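
To make the idea in point 1 concrete, here is a minimal sketch in plain Python (not a RapidMiner process). It assumes the learner yields a confidence for the positive class between 0 and 1 and splits that range into five equal-width intervals. The function name `confidence_to_scale` and the equal-width thresholds are illustrative assumptions; in practice the cut points would probably need tuning against hand-rated examples.

```python
def confidence_to_scale(p_positive, levels=5):
    """Map a positive-class confidence in [0, 1] to an integer rating 1..levels.

    Hypothetical helper: equal-width bins are an assumption, not something
    RapidMiner prescribes.
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    # Split [0, 1] into `levels` equal-width bins; a confidence of exactly 1.0
    # is clamped into the top bin instead of overflowing to levels + 1.
    return min(int(p_positive * levels), levels - 1) + 1


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.95):
        print(p, "->", confidence_to_scale(p))
```

A confidence near 0.5 lands in the middle of the scale, which matches the intuition that the model is undecided about such a comment.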

    Best regards and happy analysing,

    Marius
  • lum-x
    lum-x New Altair Community Member
    Hi Marius,

    Thanks for the response, it clarified a lot of things that I had not understood well. I will describe question 2 in more detail (including some technical details). I have read some new material and looked into confidences, but I still need to read more to understand the theory better and how to implement it.

    The database will hold the feedback that students send in Natural Language (NL), and it will also hold the value (rating) once the NL feedback has been processed. The column that holds the NL feedback is limited to 300 characters, and the column that holds the result is an integer.
    After the NL feedback is processed, the result of the analysis for that particular feedback will be stored in the value column. The table below gives an example that I hope makes this clearer.

    NL feedback | Value
    This lecture is Great. | 5
    This lecture is useless for advanced students like me. | 1
    This is Great, We are going to have fun. :D | 5
    I am too advanced for this. | 2
    I love the lecturer but the unit is too basic for me. | 4
    Even for me as the only girl in the class, this unit is basic and boring. | 3
    This unit rocks. | 5

    Each NL feedback will have its own result, because we will derive data and create charts for the lecturer.
    If there is no way to get values on a scale from 1 to 5, what would you suggest in this situation?
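
The per-row storage described above could be sketched as follows, in plain Python with an in-memory SQLite database standing in for the real one. The table name `result` and the column `opinion` come from my query below; the `id` and `value` column names, the `rate` callable, and both function names are assumptions made only for this illustration.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the feedback table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE result ("
        "id INTEGER PRIMARY KEY, "
        "opinion VARCHAR(300) NOT NULL, "          # NL feedback, max 300 chars
        "value INTEGER CHECK (value BETWEEN 1 AND 5))"  # NULL until rated
    )
    conn.executemany(
        "INSERT INTO result (opinion) VALUES (?)",
        [("This lecture is Great.",), ("I am too advanced for this.",)],
    )
    return conn


def store_ratings(conn, rate):
    """Rate every unrated opinion and write the score back to its own row.

    `rate` is a hypothetical callable (text -> int in 1..5), for example a
    wrapper around whatever model ends up doing the sentiment scoring.
    Returns the number of rows that were rated.
    """
    rows = conn.execute(
        "SELECT id, opinion FROM result WHERE value IS NULL"
    ).fetchall()
    for row_id, opinion in rows:
        conn.execute(
            "UPDATE result SET value = ? WHERE id = ?", (rate(opinion), row_id)
        )
    conn.commit()
    return len(rows)
```

Because only rows whose value is still empty are selected, calling `store_ratings` again whenever a new answer arrives would rate just the new rows and leave earlier ratings untouched.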


    EDIT: I wanted to show what I have done so far, to give a picture of how much I know at this point.
    The only thing I have managed to get working so far is retrieving data from the DB and counting word occurrences.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
       <parameter key="parallelize_main_process" value="true"/>
       <process expanded="true">
         <operator activated="true" class="read_database" compatibility="5.3.005" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
           <parameter key="connection" value="DB"/>
           <parameter key="query" value="SELECT `opinion`&#10;FROM `result`"/>
           <enumeration key="parameters"/>
         </operator>
         <operator activated="true" class="nominal_to_text" compatibility="5.3.005" expanded="true" height="76" name="Nominal to Text" width="90" x="179" y="30"/>
         <operator activated="true" class="text:process_document_from_data" compatibility="5.3.000" expanded="true" height="76" name="Process Documents from Data" width="90" x="313" y="30">
           <parameter key="vector_creation" value="Binary Term Occurrences"/>
           <parameter key="add_meta_information" value="false"/>
           <parameter key="prune_method" value="percentual"/>
           <parameter key="prune_above_percent" value="95.0"/>
           <list key="specify_weights"/>
           <parameter key="parallelize_vector_creation" value="true"/>
           <process expanded="true">
             <operator activated="true" class="text:tokenize" compatibility="5.3.000" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
             <operator activated="true" class="text:transform_cases" compatibility="5.3.000" expanded="true" height="60" name="Transform Cases" width="90" x="180" y="30"/>
             <operator activated="true" class="text:filter_stopwords_english" compatibility="5.3.000" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="315" y="30"/>
             <operator activated="true" class="text:stem_snowball" compatibility="5.3.000" expanded="true" height="60" name="Stem (Snowball)" width="90" x="450" y="30"/>
             <connect from_port="document" to_op="Tokenize" to_port="document"/>
             <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
             <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
             <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
             <connect from_op="Stem (Snowball)" from_port="document" to_port="document 1"/>
             <portSpacing port="source_document" spacing="0"/>
             <portSpacing port="sink_document 1" spacing="0"/>
             <portSpacing port="sink_document 2" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Read Database" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
         <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
         <connect from_op="Process Documents from Data" from_port="word list" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    As for sentiment analysis, I have this non-working solution.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="read_database" compatibility="5.3.005" expanded="true" height="60" name="Read Database" width="90" x="45" y="75">
           <parameter key="connection" value="DB"/>
           <parameter key="query" value="SELECT `opinion`&#10;FROM `result`"/>
           <enumeration key="parameters"/>
         </operator>
         <operator activated="true" class="retrieve" compatibility="5.3.005" expanded="true" height="60" name="Retrieve" width="90" x="45" y="165">
           <parameter key="repository_entry" value="//Effectinet/effectinetResultsNL"/>
         </operator>
         <operator activated="true" class="x_validation" compatibility="5.3.005" expanded="true" height="112" name="Validation" width="90" x="313" y="75">
           <process expanded="true">
             <operator activated="true" class="k_nn" compatibility="5.3.005" expanded="true" height="76" name="k-NN" width="90" x="112" y="30"/>
             <connect from_port="training" to_op="k-NN" to_port="training set"/>
             <connect from_op="k-NN" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true">
             <operator activated="true" class="apply_model" compatibility="5.3.005" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
               <list key="application_parameters"/>
             </operator>
             <operator activated="true" class="performance" compatibility="5.3.005" expanded="true" height="76" name="Performance" width="90" x="180" y="30"/>
             <connect from_port="model" to_op="Apply Model" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
             <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
             <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Retrieve" from_port="output" to_op="Validation" to_port="training"/>
         <connect from_op="Validation" from_port="model" to_port="result 1"/>
         <connect from_op="Validation" from_port="training" to_port="result 2"/>
         <connect from_op="Validation" from_port="averagable 1" to_port="result 3"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
         <portSpacing port="sink_result 4" spacing="0"/>
       </process>
     </operator>
    </process>
    I am still looking at tutorials and reading other people's blogs, but it is not enough for now. I hope I will soon be able to work with and better understand the Rapid-I products.

    Thanks for the help  ;D