Sentiment Analysis with SVM
mtd
Hello all, I am a newbie in RapidMiner. I am trying to classify a Twitter data set with a linear SVM, but I get the following error. Can anyone help me, please?
"The input Exampleset does not match the training ExampleSet. Missing Attribute: "aaronecarroll".
The operator expects the input Exampleset to have a set of Attributes which is equal or a superset of the Exampleset used for training of the input model. Please make sure that the attributes of the two examples satisfy this condition."
MartinLiebig
Hi mtd,
Are you sure you tokenized the training and testing data sets the same way? Have you used the word vector from training to ensure the same words are used in the testing/apply phase?
Best,
Martin
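To illustrate the point about tokenizing both sets the same way and reusing the training words, here is a minimal plain-Python sketch. It is not RapidMiner code: the names tokenize, build_word_list and vectorize, as well as the example sentences, are hypothetical and only stand in for what the document-processing step does.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-letters. The same function is used for both sets."""
    return re.findall(r"[a-z]+", text.lower())

def build_word_list(docs):
    """Collect the sorted vocabulary (the word list) of a set of documents."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def vectorize(doc, word_list):
    """Term counts in the order of word_list; words outside the list are ignored."""
    counts = Counter(tokenize(doc))
    return [counts[w] for w in word_list]

# Hypothetical example documents
train_docs = ["I love RapidMiner", "I hate waiting"]
test_docs = ["love this tool"]

# Word list learned from the training data only
word_list = build_word_list(train_docs)

# If the test set built its own word list, these training attributes would be missing
test_own_word_list = build_word_list(test_docs)
missing = [w for w in word_list if w not in test_own_word_list]

# Reusing the training word list gives the test data the same attributes
test_vectors = [vectorize(doc, word_list) for doc in test_docs]

print("training word list:", word_list)
print("missing if vectorized separately:", missing)
print("test vectors with training word list:", test_vectors)
```

The "missing" list plays the role of the "Missing Attribute" in the error message: those columns only exist for the test data when the training word list is reused.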
Elisa0815
Hello mtd,
I'm actually doing the same thing, also with Twitter data.
I had the same problem when I wanted to use RapidMiner for sentiment analysis. I guess that you use TF-IDF for preprocessing the data, right?
You need to connect the word list of the operator that preprocesses the training set (there's an output port on the Process Documents operator labeled "words") to the word list input of the operator that preprocesses the test set. That makes sure that the attributes used to train the classifier are the same attributes that are generated when it is applied to the test set.
I also have a question of my own about this topic. I posted it in another thread as well, but maybe we can discuss it here too:
The solution I mentioned works, but my problem now is that I don't understand WHY I need to do that.
A classifier is, in the end, a mathematical function consisting of numbers and operators. After being trained, it doesn't need any attributes of the training set anymore, right? After training, the parameters, like C, are set, so it only needs to read the unknown X of the test set and compute the result, which is the label.
So why is the word list needed when the model is applied to the test set?
Can someone help me understand that?
MartinLiebig
Hi,
You are right about the model. But you need to tokenize your test set first, and to do that you need to know which words must be present as attributes in the test set so the model can be applied. If, for example, the word "RapidMiner" does not exist in the test set, you still need to create that column and fill it with 0.
Does this help?
~Martin
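As a rough sketch of why the zero column matters: a trained linear model computes something like score = w · x + b, with one weight per training attribute in a fixed order, so x must have exactly those attributes even when a word never occurs in the test document. The weights, bias, attribute names and helper functions below are made up for illustration and do not come from an actual RapidMiner model.

```python
def to_vector(tokens, training_attributes):
    """Binary occurrence vector over the training attributes; absent words get 0."""
    present = set(tokens)
    return [1.0 if attr in present else 0.0 for attr in training_attributes]

def linear_score(weights, bias, x):
    """Decision value w . x + b; requires one input value per learned weight."""
    if len(weights) != len(x):
        raise ValueError("input does not match the attributes used for training")
    return sum(w * xi for w, xi in zip(weights, x)) + bias

# Hypothetical trained model: one weight per training attribute, in this order
training_attributes = ["bad", "good", "rapidminer"]
weights = [-1.5, 1.0, 0.5]
bias = -0.25

# The test document never mentions "rapidminer" or "bad": both columns are created with 0
x = to_vector(["good", "tool"], training_attributes)
score = linear_score(weights, bias, x)
label = "positive" if score >= 0 else "negative"

print(x, score, label)
```

Without the training attribute list, the test vector would have a different length or order, and the weights could not be matched to the right words.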
Elisa0815
Yes, that helps. Thank you very much.