"Help! Weighting of Key Words per Label/Class"

User: "Hanepah"
New Altair Community Member
Updated by Jocelyn
Hello,

Thanks in advance for your help.
I have a problem with my RapidMiner results.

I have an example dataset of tweets that I classified manually into three classes: Buy, Sell, and Neutral.
I use Naive Bayes and the k-NN algorithm with cross-validation. However, the accuracy is only about 40% for Buy and Sell and 70% for Neutral, so overall I get an accuracy of nearly 60%.

My process looks very similar to the one Neill McGuigan used in his Vancouver blog, so I use tokenizing, stopword filtering, stemming...
My data is an Excel file with two columns: first the class (nominal, label), second the tweet.

I have two questions:
Is it possible to assign some important words to the three classes, e.g. so that every tweet containing "buying" is allocated to the Buy class? Or can I weight some words more heavily than others within a document?
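
To make this question concrete outside RapidMiner, here is a minimal Python sketch of the two ideas: a keyword rule that forces a class, and per-word weights applied to term counts. The keyword lists, the weight values, and all function names are hypothetical and only for illustration; this is not a RapidMiner operator or API.

```python
import re
from collections import Counter

# Hypothetical keyword lists per class (illustrative only).
KEYWORD_RULES = {
    "Buy": {"buy", "buying"},
    "Sell": {"sell", "selling"},
}

# Hypothetical weight multipliers; unlisted words keep a weight of 1.0.
WORD_WEIGHTS = {"buying": 3.0, "selling": 3.0}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rule_label(tweet):
    """Return a class if exactly one class's keywords match, else None."""
    tokens = set(tokenize(tweet))
    hits = [label for label, words in KEYWORD_RULES.items() if tokens & words]
    return hits[0] if len(hits) == 1 else None


def weighted_counts(tweet):
    """Term counts with important words scaled by their weight."""
    counts = Counter(tokenize(tweet))
    return {word: n * WORD_WEIGHTS.get(word, 1.0) for word, n in counts.items()}


def classify(tweet, fallback):
    """Apply the keyword rule first; otherwise defer to a trained model."""
    label = rule_label(tweet)
    return label if label is not None else fallback(tweet)
```

In this sketch, `fallback` stands for whatever trained classifier is already in use (for example Naive Bayes), so the rule only overrides the model when a tweet matches the keywords of exactly one class.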

Is there a maximum number of stopwords in a stopword list? Whenever I update my own stopword list (it becomes longer), the process doesn't use the new one.

Do you have any ideas how I can improve my results?
Is there another algorithm that works better on tweets?

Thanks for your help!

Kind regards!
thestony

    User: "Hanepah"
    New Altair Community Member
    OP
    Does no one have any suggestions?

    kind regards!