[SOLVED] SVM and polynomial attributes from word vector

Question

Hi guys, sorry for coming up with a question as my first post, but I am currently working on my bachelor's thesis and I am having major trouble applying the SVM learner to my dataset. I googled a lot and found two related posts already submitted to this forum:

http://rapid-i.com/rapidforum/index.php?topic=3845.0
http://rapid-i.com/rapidforum/index.php?topic=524.0

So I did my homework, but I still can't solve this one myself. I'll make it quick. I basically pull two values out of my DB:

* 'text' - which is just a field containing some text
* 'pol' - which is a polarity label with the possible values 'pro' and 'con'

The text comes out of the DB and goes into a text preprocessor, which tokenizes and filters it the usual way. From there, it should be fed into the x-Validator, and the SVM should try to learn polarity classification. The label is binominal, so that shouldn't cause any trouble.

What I understand is that the text gets converted into a word vector (in my case TF-IDF) and the word vector becomes the item's attribute. Of course a word vector is polynomial, so SVM can't handle it. Correct me if I am wrong on any of this.

But here's the point: how can I make SVM handle the word vector correctly? If I convert it with nominal2numerical (which is what most people advise, but I actually doubt that step is useful in my case), SVM gives terrible results (below the default accuracy, so basically impossible). Has anyone ever had that before?

Any help would be gratefully appreciated! Thanks in advance

Vaas
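For reference, here is a minimal Python sketch of what a TF-IDF word vector looks like. It is not RapidMiner's implementation: the function name `tf_idf`, the whitespace tokenizer, and the plain `tf * log(N / df)` weighting are assumptions made for illustration, and RapidMiner may normalize differently. The point it illustrates is that every token becomes its own attribute with a numeric weight.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {token: weight} dict per document (hypothetical helper).

    Weight = (term count / document length) * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token occurs.
    df = Counter(token for tokens in tokenized for token in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            token: (count / len(tokens)) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return vectors

# Two made-up example texts; each token becomes a numeric attribute.
docs = ["good idea good plan", "bad idea"]
vectors = tf_idf(docs)
for vector in vectors:
    print(vector)
```

A token that occurs in every document (here "idea") gets weight 0, while the others get positive real-valued weights, so none of these attributes is nominal.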

MariusHelf · Answer

Thanks for the kudos! I am glad that I could help you :)

Vaas · Answer

I found the error. For the record:

The "Extract content" operator, which is designed to strip HTML tags from the input data, also adds several attributes that it tries to extract from so-called meta tags (e.g. <meta name="author" content="John" />, found in almost all documents online), and these are, of course, nominal.

And that's the point where my SVM got confused. Setting the breakpoint and looking at what attributes are added was the right hint in this case. Marius, I have to take a bow: Instructions on how to help oneself are almost always the best way to go! Thanks again for everything and have a nice day
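The same check can be sketched outside RapidMiner. The Python snippet below is only an illustration under stated assumptions: the rows are made-up data, and `non_numeric_attributes` and `drop_attributes` are hypothetical helper names, not RapidMiner operators. It scans an example set for attributes that hold non-numeric values (such as an "author" attribute taken from a meta tag) and removes them before the data would reach a learner that only accepts numeric input.

```python
from numbers import Number

def non_numeric_attributes(rows, label="pol"):
    """Return the names of attributes (excluding the label) with any non-numeric value."""
    offending = set()
    for row in rows:
        for name, value in row.items():
            if name == label:
                continue
            # bool is a Number subclass in Python; treat it as non-numeric here.
            if isinstance(value, bool) or not isinstance(value, Number):
                offending.add(name)
    return sorted(offending)

def drop_attributes(rows, names):
    """Return copies of the rows without the given attributes."""
    names = set(names)
    return [{k: v for k, v in row.items() if k not in names} for row in rows]

# Made-up example set: two TF-IDF attributes, one stray meta-tag attribute, one label.
rows = [
    {"good": 0.35, "idea": 0.0, "author": "John", "pol": "pro"},
    {"good": 0.0, "idea": 0.0, "author": "Jane", "pol": "con"},
]
bad = non_numeric_attributes(rows)
clean = drop_attributes(rows, bad)
print(bad)
print(clean)
```

Dropping the stray attributes is one option; whether they should instead be converted or kept depends on whether they carry any signal for the classification task.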

vaas

Vaas · Answer

Hi Marius, thanks for helping me out! I agree with you on the numerical TF-IDF values. That's exactly why I was confused to see the process stop before training the SVM with the message: "The operator SVM does not have sufficient capabilities for the given data set: polynomial attributes not supported [...]"

So why does it refuse the dataset? Could that be related to the column types I defined in my SQL database? I doubt that, because the word vector attribute is created by RapidMiner and not fetched from the DB... I tried every available SVM with every possible parameter setup. It still refuses to learn (k-NN and naive Bayes work flawlessly, by the way; neural networks give the same error as the SVM).

Thanks again for the help, I appreciate your efforts!!