How best to analyse tweets? (Also help with rule association problem)

jdvstanton · June 2017

A colleague and I are currently running clustering (K-Means and DBSCAN) and association rule mining on about 30000 tweets for a project. Unfortunately, after many attempts the results are still incoherent, and despite our best efforts we have been able to draw few conclusions about the data.

Other than sentiment analysis, which I would like to carry out if I have time but which I have been told is rather difficult, what else could I do?

I am having particular difficulty with the association rules. I managed to generate rules from the text, but I would also like to include the time the tweet was sent. Unfortunately, when I run the process, the rules contain only the attribute name "Time_sent" and not the actual time. How can I fix this?
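One possible cause, offered as an assumption rather than a diagnosis: if the rule learner treats every attribute as a single yes/no item, a time column can only ever show up in a rule under its own name. A common workaround is to bucket the time into a few coarse ranges and turn each bucket into its own item, so the rule carries the value itself. The Python sketch below illustrates the idea outside RapidMiner. The bucket boundaries, the function names (`time_bucket`, `to_transaction`, `pair_rules`) and the example tweets are all made up for illustration, and the rule search is deliberately limited to pairs of items.

```python
from collections import Counter
from datetime import datetime
from itertools import combinations


def time_bucket(ts):
    """Map a timestamp to a coarse part-of-day label (hypothetical boundaries)."""
    hour = ts.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def to_transaction(words, ts):
    """Combine a tweet's words with one 'time_sent=<bucket>' item."""
    return frozenset(words) | {f"time_sent={time_bucket(ts)}"}


def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Made-up example tweets: (words, time sent)
tweets = [
    (["coffee", "train"], datetime(2017, 6, 1, 8, 15)),
    (["coffee", "late"], datetime(2017, 6, 1, 9, 40)),
    (["dinner", "train"], datetime(2017, 6, 1, 19, 5)),
    (["coffee", "work"], datetime(2017, 6, 2, 7, 50)),
]
transactions = [to_transaction(words, ts) for words, ts in tweets]
rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Because the item is named "time_sent=morning" rather than "Time_sent", any rule that involves the time states which part of the day it refers to. In RapidMiner terms this roughly corresponds to discretizing the time and then creating one binominal attribute per value before the rule learner runs.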

Thomas_Ott · June 2017

I do a lot of Twitter analysis with the Text Mining extension, and I use clustering and association rules quite a bit. A large row count shouldn't scare you away; it's all the tokens you generate that slow the process down. Do you do a lot of pruning when you process? I spend a lot of time on data prep, and I selectively tokenize hashtags, links, and Twitter handles.
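To make "selectively tokenize" concrete, here is a minimal Python sketch of one way to do it outside RapidMiner. It is not the Text Mining extension's tokenizer: the regular expression, the `tokenize` function, its `keep_*` flags and the `<link>` placeholder are all hypothetical choices for illustration.

```python
import re

# Order matters: links, handles and hashtags are tried before plain words.
TOKEN_RE = re.compile(
    r"https?://\S+"   # links
    r"|@\w+"          # Twitter handles
    r"|#\w+"          # hashtags
    r"|[a-z0-9']+",   # ordinary words
    re.IGNORECASE,
)


def tokenize(text, keep_links=False, keep_handles=False, keep_hashtags=True):
    """Split a tweet into lowercase tokens, keeping or dropping each token type."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        lowered = tok.lower()
        if lowered.startswith(("http://", "https://")):
            if keep_links:
                tokens.append("<link>")  # collapse every URL into one token
        elif tok.startswith("@"):
            if keep_handles:
                tokens.append(lowered)
        elif tok.startswith("#"):
            if keep_hashtags:
                tokens.append(lowered)
        else:
            tokens.append(lowered)
    return tokens


example = "Loving #RapidMiner thanks @Tom https://t.co/abc123 !"
print(tokenize(example))
print(tokenize(example, keep_links=True, keep_handles=True))
```

Collapsing every URL into a single placeholder keeps thousands of unique links from each becoming its own attribute, which is one of the ways the token count gets out of hand.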

jdvstanton · June 2017

Hi Tom, we did spend a lot of time preparing the data. I am not sure how well we did, but we managed to reduce the number of word-attribute columns from about 7000 to roughly 900 to 1000 for every document we processed.

I managed to make some sense of the association rules I generated, but unfortunately there does not seem to be much to say about the data.

The hashtag is not a problem: the data contained only one distinct hashtag, so we just removed the attribute. Luckily, the tweets were all related already. I use percentual pruning in the document processing step (below percent = 0.09/0.1, above percent = 100).
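For anyone unsure what these pruning settings do, the Python sketch below shows percentual pruning as I understand it: a word is kept only if the percentage of documents containing it lies between the lower and upper bound. This is an assumption about the semantics, not a copy of RapidMiner's implementation, and the function name `prune_by_document_frequency` and the example documents are made up. With about 30000 tweets, a lower bound of 0.1 percent means a word has to appear in at least 30 tweets to survive.

```python
from collections import Counter


def prune_by_document_frequency(docs, below_percent=0.1, above_percent=100.0):
    """Drop tokens whose document frequency (in percent) is outside the bounds."""
    n = len(docs)
    # Count each token once per document, however often it repeats inside it.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    keep = {
        tok
        for tok, count in doc_freq.items()
        if below_percent <= 100.0 * count / n <= above_percent
    }
    pruned = [[tok for tok in doc if tok in keep] for doc in docs]
    return pruned, keep


# Made-up tokenized documents; a high lower bound is used so the effect is visible.
docs = [
    ["coffee", "train"],
    ["coffee", "late"],
    ["coffee", "train", "work"],
    ["dinner"],
]
pruned, vocabulary = prune_by_document_frequency(docs, below_percent=50.0)
print(sorted(vocabulary))
print(pruned)
```

Raising the lower bound shrinks the vocabulary quickly, while lowering the upper bound below 100 removes words that occur in nearly every tweet and so cannot help separate clusters.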

I feel, though, that I have made more progress than my colleague, who cannot make sense of the cluster data. I have tried helping him, but the data is quite strange, and I am not sure how to help him.

Should I conduct sentiment analysis? Or is it not necessary?

Thomas_Ott · June 2017

I guess the question is, what's the ultimate goal of this analysis? That will help determine which direction to take.
