Analyze 77000 tweets

https://rapidminer.com/contact-sales-request-demo/

Hi Vittorio,

Regarding the license, I would suggest requesting a demo. It may be a temporary solution, but it's the best way to get started and see whether the platform brings value to you. You may also be able to apply for an educational license.

https://rapidminer.com/educational-program/

Regarding the analysis, I also work with Twitter, and it's a very special case of text analysis. I think that clustering won't give you the results you want, because most of the words are just noise and vary a lot from tweet to tweet. My suggestion would be to train a sentiment model on another dataset and then apply it to the tweets. For that, you have to somehow get your hands on labeled sentiment data in Italian.
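To make the "train on labeled data, then apply to the tweets" idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. The four labeled sentences are made-up toy data, only there to show the workflow; a usable model would need a real labeled Italian corpus, and the function names are illustrative rather than part of any RapidMiner operator.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def train(labeled_texts):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_texts:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids log(0)
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, made-up training data (assumption: real labeled Italian data replaces this)
LABELED = [
    ("ottimo telefono, mi piace molto", "positive"),
    ("servizio fantastico e veloce", "positive"),
    ("telefono peggiore, pessimo acquisto", "negative"),
    ("servizio terribile e lento", "negative"),
]

model = train(LABELED)
print(predict(model, "pessimo telefono"))
```

The same train/apply split maps onto a training process and an Apply Model step in RapidMiner; the sketch only shows the logic.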

Regards,

Sebastian

New Altair Community Member

Hi @vittorio_confuo,

As mentioned, Aylien has limitations and does not support Italian.

So I propose using a Python script with the "textblob" library.

This script translates each tweet from Italian to English and then extracts the sentiment (negative, neutral, positive):

-1 <= sentiment < -0.1 ==> negative

-0.1 <= sentiment < 0.1 ==> neutral

0.1 <= sentiment <= 1 ==> positive
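As a small illustration of these thresholds, the mapping from a polarity score to a label can be sketched in Python as follows. The function name and the `threshold` parameter are illustrative, not part of the process; values exactly at -0.1 and 0.1 are handled the same way as in the Generate Attributes expression of the process below.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to negative / neutral / positive."""
    if polarity < -threshold:
        return "negative"
    if polarity < threshold:
        return "neutral"
    return "positive"

for score in (-0.5, 0.0, 0.7):
    print(score, label_polarity(score))
```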

The process:

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="34">
        <parameter key="generator_type" value="comma_separated_text"/>
        <list key="function_descriptions"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="input_csv_text" value="Id,Text&#10;1,iphone telefono peggiore apple ha fatto ciao messaggio&#10;"/>
      </operator>
      <operator activated="true" class="set_macros" compatibility="8.2.001" expanded="true" height="82" name="Set Macros" width="90" x="246" y="34">
        <list key="macros">
          <parameter key="textAttribute" value="'Text'"/>
        </list>
      </operator>
      <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="380" y="34">
        <parameter key="script" value="import pandas as pd&#10;from textblob import TextBlob&#10;&#10;# rm_main is a mandatory function;&#10;# its number of arguments must match the number of connected input ports (can be none)&#10;&#10;# Name of the text column, injected by the Set Macros operator&#10;textAttr = %{textAttribute}&#10;&#10;def translate(text):&#10;  # Translate the text to English and return it as a plain string&#10;  return str(TextBlob(str(text)).translate(to='en'))&#10;&#10;def sent(text):&#10;  # Polarity score in [-1, 1] of the (translated) text&#10;  return TextBlob(str(text)).sentiment.polarity&#10;&#10;def rm_main(data):&#10;  data['translate'] = data[textAttr].apply(translate)&#10;  data['sentiment'] = data['translate'].apply(sent)&#10;  # a single return value feeds the first output port&#10;  return data"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="8.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="514" y="34">
        <list key="function_descriptions">
          <parameter key="Sentiment" value="if(sentiment&lt;-0.1,&quot;negative&quot;,if(sentiment&lt;0.1,&quot;neutral&quot;,&quot;positive&quot;))"/>
        </list>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Set Macros" to_port="through 1"/>
      <connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/>
      <connect from_op="Execute Python" from_port="output 1" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

To execute this process, you have to:

- install Python

- install textblob (pip install textblob)

- set your text attribute in the Set Macros parameters.
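If the installation seems fine but the process still fails, a quick check like the sketch below can help. It prints which Python interpreter is running and whether that interpreter can find textblob. It assumes, as a possible cause only, that pip installed the package into a different Python than the one RapidMiner is configured to use; the helper name `is_installed` is illustrative.

```python
import importlib.util
import sys

def is_installed(module_name):
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None

# Compare this path with the Python path configured in RapidMiner
print("Interpreter:", sys.executable)
print("textblob importable:", is_installed("textblob"))
```

Running it once from the command line and once inside an Execute Python operator shows whether both use the same interpreter.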

I hope it helps

Regards,

Lionel

Spelling_Correction_5.png

vittorio_confuo

New Altair Community Member

Hi @SGolbert, thank you very much. It is a good idea, but I don't have much time to get my hands on such a dataset. If I manage it (in the next week), I will post the results; maybe they will be useful to someone.

For the association rules, I thought of discretizing sent_time in order to find the most important topic for each time step (for example, every hour).
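A minimal sketch of that hourly discretization in Python, under two assumptions: sent_time is a string in the format `YYYY-MM-DD HH:MM:SS`, and simple word counts stand in for the topic or association-rule step. The sample tweets are made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

def top_words_per_hour(tweets, top_n=3, time_format="%Y-%m-%d %H:%M:%S"):
    """Group (sent_time, text) pairs into hourly buckets and count words per bucket."""
    buckets = defaultdict(Counter)
    for sent_time, text in tweets:
        stamp = datetime.strptime(sent_time, time_format)
        # Truncate the timestamp to the start of its hour
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].update(text.lower().split())
    return {hour: counts.most_common(top_n) for hour, counts in sorted(buckets.items())}

# Made-up sample rows: (sent_time, text)
sample = [
    ("2018-07-13 10:05:00", "iphone iphone apple"),
    ("2018-07-13 10:55:00", "iphone"),
    ("2018-07-13 11:01:00", "apple"),
]
for hour, words in top_words_per_hour(sample).items():
    print(hour, words)
```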

Regarding the size of the dataset, do you know whether the "Sample (Stratified)" operator is a good choice?
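For reference, the idea behind stratified sampling can be sketched in Python as below: the same fraction is drawn from every group, so the group proportions are preserved. This is an illustration of the concept, not the implementation of the RapidMiner operator, and it assumes the rows carry some grouping attribute (for example a label); without one, a plain random sample is the closer equivalent.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=42):
    """Draw roughly the same fraction of rows from every group defined by key."""
    rng = random.Random(seed)  # fixed seed for reproducible samples
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sample = []
    for members in groups.values():
        # Keep at least one row per group so small groups do not vanish
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Made-up rows: (group, id), with an 80/20 split between groups
rows = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
picked = stratified_sample(rows, key=lambda row: row[0], fraction=0.1)
print(len(picked))
```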

Thank you for your help,

Vittorio Confuorto

vittorio_confuo

New Altair Community Member

Hi @lionelderkrikor, sorry, I have only just seen your answer.

I have some problems with the textblob installation. Can you tell me how to install it?

Thank you for your time,

Vittorio Confuorto

Thomas_Ott

New Altair Community Member

@lionelderkrikor thank you for that awesome Python script! You just gave me so many ideas for applications here! I have to work with this textblob library more!

New Altair Community Member

Hi @vittorio_confuo,

"I have some problem with textblob installation"

So that I can help you, can you be more precise?

Regards,

Lionel

vittorio_confuo

New Altair Community Member

Hi @lionelderkrikor ,

I solved this problem, but I currently have another one.

The process returns the following error:

Do you know how I can solve it?

You are very kind

Regards,

Vittorio Confuorto

Pic.jpeg

New Altair Community Member

Hi @vittorio_confuo,

Can you share your dataset and your process, so that I can reproduce the bug?

Regards,

Lionel

New Altair Community Member

Hi @Thomas_Ott,

You're welcome.

Happy sentiment analysis!

Regards,

Lionel

student_compute

New Altair Community Member

Jul 13, 2018

Hello,
Can the sentiment analysis be aspect-based?
Thank you

Thomas_Ott

New Altair Community Member

Jul 13, 2018

Hi @student_compute, I think the TextBlob library can't do aspect-based sentiment analysis; maybe @lionelderkrikor can confirm?

I'm pretty sure the NLTK Python library CAN do that; you'd just have to build it into RapidMiner.

https://stanfordnlp.github.io/CoreNLP/index.html

New Altair Community Member

Jul 13, 2018

Hi all,

Yes, I confirm that the Python library "textblob" can't do aspect-based sentiment analysis.
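To show what "aspect-based" means here, the following is a deliberately naive Python sketch: instead of one score per text, each aspect term gets its own score from the sentiment words found near it. The word lists and the window size are made-up assumptions, and this is neither the TextBlob nor the NLTK API; a real solution would need a proper lexicon or a trained model.

```python
# Hypothetical mini-lexicons, for illustration only
POSITIVE = {"great", "good", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}

def aspect_sentiment(text, aspects, window=2):
    """Score each aspect term by the sentiment words within `window` tokens of it."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    scores = {}
    for i, token in enumerate(tokens):
        if token in aspects:
            nearby = tokens[max(0, i - window): i + window + 1]
            score = sum(w in POSITIVE for w in nearby) - sum(w in NEGATIVE for w in nearby)
            scores[token] = scores.get(token, 0) + score
    return scores

print(aspect_sentiment(
    "The camera is great but the battery is terrible",
    aspects={"camera", "battery"},
))
```

A single polarity score for that sentence would be close to neutral, while the per-aspect view separates the positive camera from the negative battery.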

Regards,

Lionel

student_compute

New Altair Community Member

Jul 14, 2018

Hello,
Thank you, dear friends, for the help.
Is there an NLTK example in RapidMiner for Twitter data?
Sorry for asking.
Thanks a lot

SGolbert

New Altair Community Member

Jul 16, 2018

Hi,

Thanks for the hint about textblob!

I also wanted to mention the Stanford CoreNLP library:

It's available both as a web service and as a Java library. This is surely your best option for production use, as long as it has the required functionality. It is a great candidate for adding functionality to the text processing extension!

Edit: I have seen that it does not support Italian and that the .jar file is 500 MB. This limits its use as an add-on to the RapidMiner .jar, at least without modularization.

Best regards,

Sebastian

student_compute

New Altair Community Member

Jul 18, 2018

Hello, thank you very much.
I went to the site and downloaded the English file. I copied it into the RapidMiner plugins folder. What should I do now so that I can use it for aspect-based sentiment analysis?
Thank you
Have a nice day

jozeftomas_2020

Banned

Jul 18, 2018

Hello,
Mr. @lionelderkrikor
I ran your code, but it gives an error.
What's wrong?
Thanks a lot

py sa.JPG

New Altair Community Member

Jul 18, 2018

Hi @jozeftomas_2020,

Here the process works fine:

Some questions:

- Have you successfully installed TextBlob?

- Are you running Python 2.x or Python 3.x?

- Have you modified the dataset in the parameters of the Create ExampleSet operator?

Can you send me your process so that I can try to reproduce your bug?

Regards,

Lionel

Spelling_Correction_6.png

jozeftomas_2020

Banned

Jul 19, 2018

Hello,
My process is exactly your code.
Yes, I have Python and textblob installed.
But I do not know why it does not run.
Could you check this data? Sorry for the trouble.
https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/How-to-correct-the-wrong-words/td-p/51027/page/3
This is the same Twitter data.
Thank you very much
Have a great day