Text classification problem with non-mutually-exclusive classes

User: "mete"
New Altair Community Member
Updated by Jocelyn
Hi everyone!

I have a bunch of documents, each with a probability of belonging to each of several classes.
E.g.:

text          C1     C2     C3     C4     C5     C6     C7
bla bla...    10%    20%    60%    80%    0%     5%     30%

I want to train a model which can predict these probabilities from a given text.

As you can see, the classes are not mutually exclusive: each document only has a probability of belonging to each class. Note also that these probabilities do not add up to 100%!
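
To make the "does not add up to 100%" point concrete, here is a minimal Python sketch (standard library only) contrasting the two ways of turning raw per-class scores into probabilities. The scores are made up for illustration and do not come from any real model; the function names are hypothetical.

```python
import math

def softmax(scores):
    """Mutually exclusive classes: the outputs compete and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def independent_sigmoids(scores):
    """Non-exclusive classes: each output is its own value in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical raw scores for C1..C7 (invented for this example).
scores = [-2.2, -1.4, 0.4, 1.4, -6.0, -2.9, -0.8]

exclusive = softmax(scores)
non_exclusive = independent_sigmoids(scores)

print("softmax sum: ", round(sum(exclusive), 6))
print("sigmoid sum: ", round(sum(non_exclusive), 6))
```

With softmax the seven values are forced to sum to 1, which only suits mutually exclusive labels. With one independent output per class there is no such constraint, which matches the kind of data described above.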


To get familiar with RapidMiner, I preprocessed the documents (tokenize, filter, ...) and gave them (mutually exclusive) labels.
E.g.:

text       label
bla..      C1
lorem..    C2
ipsum      C7

Then I weighted the features using SVM weighting and kept only those above a specific threshold (other feature selection methods, like forward or backward selection, did not finish even after several hours).
Afterwards I trained an SVM model and ran a 10-fold cross-validation,
which performed pretty well, with an accuracy of 93%...


But in the end, I still have no solution to my initial problem and no clue how to proceed:
  • Should I try to derive these probabilities from the SVM's confidence values somehow? Is this possible? And how?
  • Or train 7 linear regression models to predict these probabilities? But how do I find a proper feature selection with over 2000 terms?
  • Or try a Bayesian model, which should give the probability of each class?
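
For reference, the second option (one regression output per class) could look roughly like the sketch below. It is written in Python with scikit-learn rather than RapidMiner, so it only illustrates the idea. The texts and most target rows are invented toy data, and ridge regularisation is swapped in here as an assumption in place of explicit feature selection.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy documents (invented for illustration).
texts = [
    "bla bla bla",
    "lorem ipsum dolor",
    "ipsum dolor sit amet",
    "bla lorem bla ipsum",
]

# One row per document, one column per class C1..C7, as fractions in [0, 1].
# The first row mirrors the example from the post; the rest are made up.
Y = np.array([
    [0.10, 0.20, 0.60, 0.80, 0.00, 0.05, 0.30],
    [0.70, 0.10, 0.05, 0.20, 0.40, 0.00, 0.10],
    [0.20, 0.05, 0.10, 0.10, 0.60, 0.30, 0.90],
    [0.40, 0.15, 0.30, 0.50, 0.20, 0.10, 0.40],
])

# Ridge accepts a 2-D target, so this fits all 7 outputs in one call.
# Its L2 penalty keeps thousands of term weights small instead of
# requiring a forward/backward selection over the vocabulary.
model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(texts, Y)

# Linear regression can leave [0, 1], so clip the predictions.
pred = np.clip(model.predict(["bla ipsum"]), 0.0, 1.0)
print(pred.round(2))
```

Each column of the prediction is an independent estimate, so the seven values are not forced to sum to 1. Whether this beats the SVM-confidence or Bayesian options would have to be checked on the real data.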

Thank you in advance for your hints and suggestions!
