text classification problem with non mutually-exclusive classes

New Altair Community Member

Nov 9, 2014

Updated Nov 5, 2024 by Jocelyn

Hi everyone!

I have a set of documents and, for each document, the probability of it belonging to each of several classes, e.g.:

text        C1   C2   C3   C4   C5   C6   C7
bla bla...  10%  20%  60%  80%  0%   5%   30%

I want to train a model which can predict these probabilities from a given text.

As you can see, the classes are not mutually exclusive: each document only has a probability of belonging to each class. Note also that these probabilities do not add up to 100%.
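To make the point about the sums concrete, here is a minimal sketch that parses the example row above and checks whether it forms a probability distribution. The dictionary layout and variable names are only assumptions for illustration, not how the data is actually stored.

```python
# Example row from above: one independent membership score per class.
row = {"C1": "10%", "C2": "20%", "C3": "60%", "C4": "80%",
       "C5": "0%", "C6": "5%", "C7": "30%"}

# Convert the percentage strings to fractions in [0, 1].
targets = {name: float(value.rstrip("%")) / 100.0 for name, value in row.items()}

# A distribution over mutually exclusive classes would sum to 1.0.
total = sum(targets.values())
is_distribution = abs(total - 1.0) < 1e-9

print(f"sum of class scores: {total:.2f}")
print(f"forms a single distribution: {is_distribution}")
```

Since the scores do not sum to 1, each class is probably better treated as its own target in [0, 1] than as one outcome of a single multiclass label.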

To get familiar with RapidMiner, I preprocessed the documents (tokenize, filter, ...) and gave each one a single (mutually exclusive) label, e.g.:

text     label
bla..    C1
lorem..  C2
ipsum    C7

Then I weighted the attributes with the SVM weighter and kept only those above a specific threshold (other feature selection methods, such as forward or backward selection, did not finish even after several hours).
Afterwards I trained an SVM model and ran 10-fold cross-validation, which performed pretty well, with an accuracy of 93%.
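The threshold step described above can be sketched roughly as follows. This is not RapidMiner's implementation: the function name `select_by_weight`, the example terms, and the weight values are all made up for illustration, and filtering on the absolute weight is an assumption.

```python
def select_by_weight(weights, threshold):
    """Keep the terms whose absolute weight is at least `threshold`."""
    return sorted(term for term, w in weights.items() if abs(w) >= threshold)

# Hypothetical term weights, e.g. normalized coefficients of a linear SVM.
weights = {"lorem": 0.92, "ipsum": 0.41, "dolor": 0.07, "sit": -0.65, "amet": 0.02}

selected = select_by_weight(weights, threshold=0.4)
print(selected)
```

Unlike forward or backward selection, this needs only one model fit to obtain the weights, which may explain why it finishes quickly on a few thousand terms.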

But in the end, I still have no solution to my initial problem and no clue how to proceed:

Should I try to derive these probabilities from the SVM's confidence values somehow? Is this possible, and how?
Or train 7 linear regression models to predict these probabilities? But how do I find a proper feature selection with over 2000 terms?
Or try a Bayesian model, which should give the probability of a class?
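As one possible variation on the second option, here is a minimal sketch that fits one model per class directly on the fractional targets. It swaps plain linear regression for logistic regression trained with cross-entropy on soft labels, so that predictions stay in [0, 1]. The toy data, the function names, and the hyperparameters (`epochs`, `lr`, `l2`) are all assumptions for illustration; the optional `l2` penalty is one way to shrink weights instead of running a separate feature selection.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_soft_logistic(X, y, epochs=5000, lr=1.0, l2=0.0):
    """Fit one logistic model to fractional targets y in [0, 1]
    by batch gradient descent on the cross-entropy loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of cross-entropy w.r.t. the logit is (prediction - target).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: 4 documents, 3 term features, 2 classes.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
Y = [[0.10, 0.80], [0.60, 0.20], [0.05, 0.30], [0.40, 0.50]]

# One independent model per class, so the outputs need not sum to 1.
models = [train_soft_logistic(X, [row[c] for row in Y]) for c in range(len(Y[0]))]
predictions = [[predict(m, x) for m in models] for x in X]

for target_row, pred_row in zip(Y, predictions):
    print(target_row, [round(p, 3) for p in pred_row])
```

Because each class gets its own model, nothing forces the predicted values for one document to add up to 100%, which matches the structure of the data described above. On real data the fit should of course be evaluated on held-out documents, for example with cross-validation and a regression-style error measure rather than accuracy.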

Thank you in advance for your hints and suggestions!

