Sentiment Analysis with SVM

mtd
New Altair Community Member
Hello All, I am newbie in Rapid Miner. I am trying to classify twitter data set with Linear SVM. But I got the following errors. Anyone can help me,please.
"The input Exampleset does not match the training ExampleSet. Missing Attribute: "aaronecarroll".
The operator expects the input Exampleset to have a set of Attributes which is equal or a superset of the Exampleset used for training of the input model. Please make sure that the attributes of the two examples satisfy this condition."
"The input Exampleset does not match the training ExampleSet. Missing Attribute: "aaronecarroll".
The operator expects the input Exampleset to have a set of Attributes which is equal or a superset of the Exampleset used for training of the input model. Please make sure that the attributes of the two examples satisfy this condition."
Tagged:
0
Answers
-
Hi mtd,
are you sure you tokenized training and testing data set the same way? Have you used the word vector to assure the same words in the testing/apply phase
Best,
Martin0 -
Hello mtb,
I'm actually doing the same, also with twitter data
I've got the same problem when I wanted to use RapidMiner for a sentiment analysis. I guess that you use TF-IDF for preprocessing the data, right?
You need to connect the words of the testset (there's an output-point at the process-document-operator with label "words") to the operator, which preprocess the trainingset. That makes sure that the attributes that are used to train the classifier are the same attributes that are used to apply to testset.
I furthermore have a question myself about this topic. I also posted this question in another theme but maybe we can also discuss my problem here:
This solution that I mentioned works but my problem now is that I don't understand WHY I need to do that.
A classifier is in the end a mathematical function, containing of numbers and operators. After be trained, it doesn't need any attributes of the trainingset anymore, right? After training, the parameters, like C, are set, so it only needs to read the unknown X of the testset and compute the result, which is the label.
So why does it need the words of the testset?
Can someone may help me to understand that?0 -
Hi,
You are right for the model. But you need to tokenize your test set first. In order to do this you need to know which words need to be present in the test set so the model can be applied. If e.g. the word RapidMiner does not exsist in the test set, you still need to create the col. with 0.
Does this help?
~Martin0 -
Yes, that helps. Thank you very much0