How to do sentiment analysis with Python or R
Hi @student_compute,
In addition to @kayman's solution, I propose a Python script using the "textblob" library.
From your text attribute, this script computes a polarity between -1 (negative) and +1 (positive).
To execute this script, you have to set the name of your text attribute (with quotes) in the Set Macros operator.
The process:
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA4">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.000-BETA4" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="social_media:search_twitter" compatibility="9.0.000-BETA" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="85">
<parameter key="connection" value="dkk"/>
<parameter key="query" value="iphone"/>
<parameter key="limit" value="20"/>
<parameter key="language" value="en"/>
</operator>
<operator activated="true" class="set_macros" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Set Macros" width="90" x="246" y="85">
<list key="macros">
<parameter key="textAttribute" value="'Text'"/>
</list>
</operator>
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="380" y="85">
<parameter key="script" value="import pandas&#10;from textblob import TextBlob&#10;&#10;textAtt = %{textAttribute}&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;&#10;def sent(text):&#10;    testimonial = TextBlob(str(text))&#10;    sentiment = testimonial.sentiment.polarity&#10;    return sentiment&#10;&#10;def rm_main(data):&#10;    data['polarity'] = data[textAtt].apply(sent)&#10;    return data&#10;"/>
</operator>
<connect from_op="Search Twitter" from_port="output" to_op="Set Macros" to_port="through 1"/>
<connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/>
<connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
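For readability, here is roughly what the embedded script does, as a standalone sketch you can try outside RapidMiner. It assumes textblob is installed (pip install textblob). The helper names textblob_polarity and score_texts and the optional scorer parameter are my own illustration, not part of the process above.

```python
def textblob_polarity(text):
    """Return TextBlob's polarity for one text, a float between -1.0 and +1.0."""
    # Imported lazily so the rest of the sketch loads even without textblob.
    from textblob import TextBlob
    return TextBlob(str(text)).sentiment.polarity


def score_texts(texts, scorer=textblob_polarity):
    """Pair each text with its score, like the 'polarity' column in the process."""
    return [(text, scorer(text)) for text in texts]
```

Calling score_texts(["I love this phone", "This battery is terrible"]) returns a list of (text, polarity) pairs. Passing another function as scorer lets you swap in a different sentiment library without touching the rest.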
Regards,
Lionel
Hi @student_compute,
I think you can use the Generate Attributes and Set Data operators and, if needed, the Reorder Attributes operator.
Regards,
Lionel
Hi @student_compute,
You can indeed create a new attribute "Pol" defined, for example, by:
- if -1 <= polarity < -0.1, then Pol = "negative"
- if -0.1 <= polarity <= 0.1, then Pol = "neutral"
- if 0.1 < polarity <= 1, then Pol = "positive"
Note: You can, of course, choose thresholds other than -0.1 / 0.1.
Here is the associated process:
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA4">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.000-BETA4" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="social_media:search_twitter" compatibility="9.0.000-BETA" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="85">
<parameter key="connection" value="dkk"/>
<parameter key="query" value="iphone"/>
<parameter key="limit" value="20"/>
<parameter key="language" value="en"/>
</operator>
<operator activated="true" class="set_macros" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Set Macros" width="90" x="246" y="85">
<list key="macros">
<parameter key="textAttribute" value="'Text'"/>
</list>
</operator>
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="380" y="85">
<parameter key="script" value="import pandas&#10;from textblob import TextBlob&#10;&#10;textAtt = %{textAttribute}&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;&#10;def sent(text):&#10;    testimonial = TextBlob(str(text))&#10;    sentiment = testimonial.sentiment.polarity&#10;    return sentiment&#10;&#10;def rm_main(data):&#10;    data['polarity'] = data[textAtt].apply(sent)&#10;    return data&#10;"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate Attributes" width="90" x="514" y="85">
<list key="function_descriptions">
<parameter key="Pol" value="if(polarity&lt;-0.1,&quot;negative&quot;,if(polarity&gt;0.1,&quot;positive&quot;,&quot;neutral&quot;))"/>
</list>
</operator>
<connect from_op="Search Twitter" from_port="output" to_op="Set Macros" to_port="through 1"/>
<connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/>
<connect from_op="Execute Python" from_port="output 1" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
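If you prefer to do the labelling inside the Python script instead of with Generate Attributes, the same rule can be sketched as follows. The function name label_polarity and the threshold parameter are hypothetical names of mine; the default of 0.1 is simply the example threshold used in this post.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity between -1 and +1 to 'negative', 'neutral' or 'positive'."""
    if polarity < -threshold:
        return "negative"
    if polarity > threshold:
        return "positive"
    # Everything from -threshold to +threshold (inclusive) counts as neutral.
    return "neutral"
```

Inside rm_main, something like data['Pol'] = data['polarity'].apply(label_polarity) should then give the same labels as the Generate Attributes expression.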
Regards,
Lionel
Hi @student_compute,
Here is the link to download RapidMiner 9.0 Beta:
http://static.rapidminer.com/rnd/html/rapidminer-9.0-preview.html
Regards,
Lionel
Hi @student_compute,
Yes, perplexity is one of the performance measures in the latest version of the LDA operator.
Regards,
Lionel
Hi @student_compute,
"How to install nltk package and use it?"
Launch the Windows Command Prompt (type "cmd" in the search bar of Windows 10) and type the following command: pip install nltk
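To check that the installation worked, a small sketch like the one below can help. The helper name is_installed is my own. The check is only meaningful if you run it with the same Python interpreter that RapidMiner is configured to use, which I am assuming is the one your pip command installs into.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the current Python interpreter can find the package."""
    return importlib.util.find_spec(package_name) is not None


print("nltk installed:", is_installed("nltk"))
```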
"But I do not know how to find Perplexity measure for assessing LDA"
Connect the per output port of the LDA operator to the res port.
Regards,
Lionel
Hi @student_compute,
Can you share your process so that we can reproduce your bug?
Try adding the following line to the Python script, after the other nltk.download('xxxxxx') calls:
nltk.download('vader_lexicon')
and execute the process once.
Regards,
Lionel
I like to use the VADER sentiment part of the NLTK toolkit. It works pretty well with social data (sentiment analysis will always remain a bit of a challenge) and gives a bit more than the usual positive / negative indications.
The attached sample uses this framework. The example chops the response by sentence and gives the 'vibe' per sentence. I typically use this method to ensure that mixed data also gets covered well. But of course you could also use it on the full data.
What I provided was like this:
What it returns is as follows:
The more negative or positive the compound value (range -1 to +1), the more likely it is that the sentiment of a given sentence is equally negative or positive.
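As a rough illustration of the per-sentence idea (not the attached sample itself), a minimal sketch could look like this. It assumes nltk is installed and nltk.download('vader_lexicon') has been run once. The regex sentence splitter is a naive stand-in of mine, and the function names split_sentences, vader_compound and score_by_sentence are hypothetical.

```python
import re
from functools import lru_cache


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


@lru_cache(maxsize=1)
def _analyzer():
    # Lazy import: nltk and its vader_lexicon resource are only needed here.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def vader_compound(sentence):
    """Return VADER's compound score for one sentence (range -1 to +1)."""
    return _analyzer().polarity_scores(sentence)["compound"]


def score_by_sentence(text, scorer=vader_compound):
    """Score each sentence separately so mixed texts are not averaged away."""
    return [(sentence, scorer(sentence)) for sentence in split_sentences(text)]
```

For a text like "Great camera! The battery is awful." this yields one (sentence, compound) pair per sentence, so a positive and a negative remark in the same text stay visible instead of cancelling each other out.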