"Help! Weighting of Key Words per Label/Class"

Hanepah (New Altair Community Member)
edited November 5 in Community Q&A
Hello,

Thanks in advance for your help.
I have a problem with my RapidMiner results.

I have an example dataset of tweets that I classified manually into three classes: Buy, Sell and Neutral.
I use Naive Bayes and k-NN with cross-validation, but the accuracy is only about 40% for Buy and Sell and 70% for Neutral, so overall I get an accuracy of nearly 60%.
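If the per-class figures are read as class recall (the share of each class's tweets that were labelled correctly; this reading is an assumption), then overall accuracy is each class's recall weighted by that class's share of the data. A minimal sketch, using hypothetical class counts:

```python
def overall_accuracy(recall_by_class, count_by_class):
    """Overall accuracy = sum(recall_c * n_c) / sum(n_c) over all classes c."""
    total = sum(count_by_class.values())
    correct = sum(recall_by_class[c] * count_by_class[c] for c in count_by_class)
    return correct / total


# Hypothetical counts in which Neutral makes up two thirds of the tweets.
recall = {"Buy": 0.40, "Sell": 0.40, "Neutral": 0.70}
counts = {"Buy": 50, "Sell": 50, "Neutral": 200}
print(round(overall_accuracy(recall, counts), 2))  # → 0.6
```

Solving 0.4(1 − f) + 0.7f = 0.6 for the Neutral share f gives f = 2/3, which is where these hypothetical counts come from. Under that assumption, always predicting Neutral would already score about 67%, so the Buy and Sell recall values are the more informative numbers to improve.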

My process is very similar to the one Neil McGuigan used in his Vancouver blog, so I use tokenizing, stopword filtering, stemming...
My data is an Excel file with two columns: first the class (nominal, used as the label), second the tweet text.
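For comparison outside RapidMiner, the same chain (tokenize, filter stopwords, stem, then Naive Bayes) can be sketched in plain Python. This is a minimal sketch, not the RapidMiner operators: the stopword set, the `crude_stem` suffix rules and the three training tweets are hypothetical stand-ins, and a real setup would use a proper stemmer (e.g. Porter) and the full labelled data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stopword list; a real list would be much longer.
STOPWORDS = {"the", "a", "an", "to", "is", "i", "and", "of", "now", "this"}


def tokenize(text):
    """Lowercase the tweet and split it into word-like tokens."""
    return re.findall(r"[a-z0-9$#@']+", text.lower())


def crude_stem(token):
    """Rough suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = preprocess(text)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / n_docs)  # log prior of the class
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage with three hypothetical training tweets.
train_texts = ["Buying more shares today", "Time to sell everything", "Markets are open"]
train_labels = ["Buy", "Sell", "Neutral"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("I am buying the dip"))  # → Buy
```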

I have two questions:
Is it possible to assign certain important words to one of the three classes, e.g. so that every tweet containing "buying" is assigned to the Buy class? Or can I weight some words more heavily than others within a document?
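Both ideas can be expressed around the learner: a hard rule that overrides the model whenever a keyword appears, and a term vector in which chosen words count more. This is a minimal plain-Python sketch, not a RapidMiner feature; `KEYWORD_RULES`, `TERM_WEIGHTS` and the example tweets are hypothetical, and `fallback` stands for whatever trained classifier is already in use.

```python
import re
from collections import Counter

# Hypothetical keyword -> class rules; the keywords are illustrative only.
KEYWORD_RULES = {"buying": "Buy", "sell": "Sell", "selling": "Sell"}

# Hypothetical per-word weights; words not listed keep weight 1.0.
TERM_WEIGHTS = {"buying": 3.0, "selling": 3.0}


def tokens_of(text):
    return re.findall(r"[a-z']+", text.lower())


def classify_with_rules(text, fallback):
    """Return the class of the first rule keyword in reading order; else ask the model."""
    for token in tokens_of(text):
        if token in KEYWORD_RULES:
            return KEYWORD_RULES[token]
    return fallback(text)


def weighted_term_vector(text):
    """Term counts multiplied by TERM_WEIGHTS, so key words weigh more in the vector."""
    counts = Counter(tokens_of(text))
    return {t: n * TERM_WEIGHTS.get(t, 1.0) for t, n in counts.items()}


def always_neutral(text):
    """Dummy fallback model used only for this demonstration."""
    return "Neutral"


print(classify_with_rules("I am buying more today", always_neutral))  # → Buy
print(classify_with_rules("Quiet day on the market", always_neutral))  # → Neutral
print(weighted_term_vector("buying buying now"))  # → {'buying': 6.0, 'now': 1.0}
```

Hard rules bypass the model entirely, so a tweet such as "not buying this" would still be sent to Buy; the weighted vector only has an effect if the downstream learner is trained on those weighted values.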

Is there a maximum number of stopwords in a stopword list? Whenever I update my own stopword list (making it longer), the process doesn't use the new version.
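Outside RapidMiner, a stopword list held as an in-memory set has no practical length limit. The sketch below only illustrates checks worth making on the file itself: it re-reads a one-word-per-line file on every call, trims and lowercases entries, and drops a UTF-8 byte-order mark that some Windows editors add. It does not reproduce RapidMiner's stopword operator, and the file name `my_stopwords.txt` is hypothetical.

```python
import tempfile
from pathlib import Path


def load_stopwords(path):
    """Read a one-word-per-line stopword file from disk on every call.

    "utf-8-sig" removes a leading byte-order mark if present; entries are
    stripped and lowercased so they match lowercased tokens.
    """
    text = Path(path).read_text(encoding="utf-8-sig")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    return [t for t in tokens if t.lower() not in stopwords]


# Usage: write a hypothetical list, extend it, and reload it.
tokens = ["the", "stock", "is", "rising", "today"]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_stopwords.txt"
    path.write_text("the\nis\n", encoding="utf-8")
    print(remove_stopwords(tokens, load_stopwords(path)))  # → ['stock', 'rising', 'today']
    path.write_text("the\nis\nToday \n", encoding="utf-8")
    print(remove_stopwords(tokens, load_stopwords(path)))  # → ['stock', 'rising']
```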

Do you have any ideas on how I can improve my results?
Is there another algorithm that works better for tweets?

Thanks for your help!

Kind regards!
thestony

Answers

  • Hanepah (New Altair Community Member)
    Does anyone have any suggestions?

    Kind regards!