How to set up a model to categorize texts
gstar
Hi folks, being a relative newbie to RapidMiner, I would like to achieve the following task:
To set up a process that
1) does text mining* to find out the most common words within a category of text (e.g. recipes for beef, vegetables, etc.)
2) feeds the results for each category into a model so that the model learns the text categories
3) takes an unknown text (e.g. a recipe for beef stock) and applies the model to it to find the corresponding category.
*the documents are relatively short and contain between 50 and 200 words
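Step 1 can be sketched in plain Python: tokenize each document, drop stop words and count words per category. The toy recipes, the short stop-word list and the function names are hypothetical; in RapidMiner this step is handled by the Text Processing extension's operators (such as Process Documents), and the code only illustrates the idea.

```python
import re
from collections import Counter

# Hypothetical toy data: (category, text) pairs standing in for the recipe documents.
DOCS = [
    ("beef", "Brown the beef, add onions and simmer the beef in stock."),
    ("beef", "Season the beef steak and sear it in a hot pan."),
    ("vegetables", "Chop the carrots and onions, then roast the vegetables."),
    ("vegetables", "Steam the broccoli and toss the vegetables with butter."),
]

# A tiny, hypothetical stop-word list; a real process would use a fuller one.
STOP_WORDS = {"the", "and", "in", "it", "a", "then", "with"}


def tokenize(text):
    """Lower-case the text, keep runs of letters and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_words_per_category(docs, n=3):
    """Count word occurrences separately per category and return the n most common."""
    counts = {}
    for category, text in docs:
        counts.setdefault(category, Counter()).update(tokenize(text))
    return {cat: [word for word, _ in c.most_common(n)] for cat, c in counts.items()}


if __name__ == "__main__":
    for category, words in top_words_per_category(DOCS).items():
        print(category, words)
```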
So far, the text mining part works quite well.
Choosing the right model seems challenging.
A decision tree produces a plausible model. However, the branches do not show simple yes/no tests (word exists / does not exist). Instead, I am only presented with statistics for decision making that I cannot use for step 3. :-[
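One way to get plain yes/no branches is to represent each document by binary term occurrences (1 if a word appears, 0 if not) before training. A numeric split such as beef > 0.5 on such an attribute is then exactly a test for the word being present. Assuming that representation, the sketch below learns a single split of this kind (a decision stump chosen by information gain) on hypothetical toy documents stored as word sets; it illustrates the idea and is not RapidMiner's Decision Tree operator.

```python
import math
from collections import Counter

# Hypothetical toy data: each document reduced to the set of words it contains.
DOCS = [{"beef", "stock", "onions"}, {"beef", "steak", "pan"},
        {"carrots", "onions", "roast"}, {"broccoli", "butter", "steam"}]
LABELS = ["beef", "beef", "vegetables", "vegetables"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def fit_word_stump(docs, labels):
    """Learn one yes/no split: the word whose presence/absence yields the largest
    information gain. Assumes at least one word actually splits the documents."""
    base = entropy(labels)
    best_word, best_gain = None, -1.0
    for word in sorted(set().union(*docs)):
        present = [lab for doc, lab in zip(docs, labels) if word in doc]
        absent = [lab for doc, lab in zip(docs, labels) if word not in doc]
        if not present or not absent:
            continue  # the word does not separate the documents at all
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(labels)
        if base - remainder > best_gain:
            best_word, best_gain = word, base - remainder
    # Majority label on each branch: "word present" and "word absent".
    yes = Counter(l for d, l in zip(docs, labels) if best_word in d).most_common(1)[0][0]
    no = Counter(l for d, l in zip(docs, labels) if best_word not in d).most_common(1)[0][0]
    return best_word, yes, no


def apply_stump(stump, doc_words):
    """Classify a new document (a set of words) with the yes/no rule."""
    word, yes, no = stump
    return yes if word in doc_words else no
```

Applied to the toy data, the learned rule reads "if the text contains beef, predict beef, otherwise vegetables", which can be used directly on an unknown text.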
Thanks for any input!
Gstar
MariusHelf
Hi Gstar,
For text mining, Naive Bayes or a linear SVM usually does a good job.
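For reference, a minimal multinomial Naive Bayes text classifier with add-one (Laplace) smoothing could look like the sketch below. It is a from-scratch illustration on hypothetical toy data; RapidMiner's Naive Bayes operator may model numeric word-vector attributes differently.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: tokenized documents and their categories.
DOCS = [["beef", "stock", "onions"], ["beef", "steak", "pan"],
        ["carrots", "onions", "roast"], ["broccoli", "butter", "steam"]]
LABELS = ["beef", "beef", "vegetables", "vegetables"]


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing; returns per-class
    log priors and per-word log likelihoods, plus the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label in class_counts:
        denom = sum(word_counts[label].values()) + alpha * len(vocabulary)
        log_prior = math.log(class_counts[label] / len(labels))
        log_likelihood = {w: math.log((word_counts[label][w] + alpha) / denom)
                          for w in vocabulary}
        model[label] = (log_prior, log_likelihood)
    return model, vocabulary


def predict_nb(model, vocabulary, tokens):
    """Return the class with the highest log posterior; unseen words are ignored."""
    scores = {label: log_prior + sum(ll[t] for t in tokens if t in vocabulary)
              for label, (log_prior, ll) in model.items()}
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied.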
Don't forget to optimize the C parameter of the SVM using Optimize Parameters (Grid). Usually a range between 1e-4 and 1 on a logarithmic scale is a good starting point. Expand the range if the detected optimum is near the limits of the range.
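The tuning advice can be sketched as a search over C in powers of ten from 1e-4 to 1 that widens the range whenever the best value lands on an edge. Here evaluate is a hypothetical callable standing in for a cross-validated SVM score, and the two-decade widening step and the expansion limit are arbitrary choices; in RapidMiner the same search is configured in Optimize Parameters (Grid) with a logarithmic scale.

```python
import math


def decade_grid(lo_exp, hi_exp):
    """C values 10**lo_exp, ..., 10**hi_exp (one per decade, logarithmic scale)."""
    return [10.0 ** e for e in range(lo_exp, hi_exp + 1)]


def search_c(evaluate, lo_exp=-4, hi_exp=0, max_expansions=3):
    """Grid-search C; if the best C sits on the lower or upper edge of the range,
    widen the range by two decades on that side and search again."""
    for _ in range(max_expansions + 1):
        grid = decade_grid(lo_exp, hi_exp)
        scores = [evaluate(c) for c in grid]
        best = max(range(len(grid)), key=scores.__getitem__)
        if best == 0:
            lo_exp -= 2
        elif best == len(grid) - 1:
            hi_exp += 2
        else:
            break  # optimum is inside the range
    return grid[best], scores[best]


if __name__ == "__main__":
    # Hypothetical score curve that peaks at C = 10, outside the initial range.
    print(search_c(lambda c: -abs(math.log10(c) - 1)))
```

Widening by whole decades keeps the previously tried values on the grid, so results stay comparable between rounds.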
Best regards,
Marius
gstar
Great, thanks! I'll try it and report back later!
gstar
Working with 5 categories, so far I got the best results with a k-NN model using overlap similarity and k=5.
Naive Bayes performs worse.
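A k-NN classifier with k = 5 can be sketched as follows, assuming overlap similarity means the overlap coefficient |A & B| / min(|A|, |B|) between the word sets of two documents. RapidMiner's Overlap Similarity measure may instead be defined on the numeric word vectors, and the toy data is hypothetical.

```python
from collections import Counter

# Hypothetical toy training data: (word set, category) pairs.
TRAIN = [
    ({"beef", "stock", "onions"}, "beef"), ({"beef", "steak", "pan"}, "beef"),
    ({"beef", "stew", "carrots"}, "beef"), ({"beef", "roast", "garlic"}, "beef"),
    ({"carrots", "onions", "roast"}, "vegetables"),
    ({"broccoli", "butter", "steam"}, "vegetables"),
    ({"spinach", "garlic", "oil"}, "vegetables"),
    ({"spinach", "onions", "steam"}, "vegetables"),
]


def overlap_similarity(a, b):
    """Overlap coefficient |A & B| / min(|A|, |B|); 0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def knn_predict(train, query, k=5):
    """Majority vote among the k training documents most similar to the query."""
    ranked = sorted(train, key=lambda item: overlap_similarity(item[0], query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Ties in the vote go to the label that appears first among the nearest neighbours; an odd k relative to the number of competing classes makes ties less likely.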
I cannot get SVM (linear) to work, since it does not support polynominal labels (i.e. 5 different labels in my case).
Is there a workaround?
MariusHelf
The Polynominal by Binominal Classification operator is your friend in this case.
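The Polynominal by Binominal Classification operator wraps a two-class learner such as the linear SVM. With a one-against-all strategy (one of the strategies it offers, as far as I know), it trains one binary model per label and predicts the label whose model gives the highest score. The plain-Python sketch below shows that idea, with a small hinge-loss linear SVM trained by batch subgradient descent standing in for RapidMiner's SVM; the names, the toy data and the hyperparameters (c, lr, epochs) are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(xs, ys, c=1.0, lr=0.01, epochs=500):
    """Two-class linear SVM (ys in {+1, -1}) trained by batch subgradient descent on
    0.5 * ||w||^2 + c * sum(max(0, 1 - y * (w.x + b)))."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0          # gradient of the 0.5 * ||w||^2 term
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1:        # hinge loss is active for this example
                grad_w = [g - c * y * xi for g, xi in zip(grad_w, x)]
                grad_b -= c * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_against_all(xs, labels, **svm_args):
    """One binary SVM per class: that class is +1, every other class is -1."""
    return {cls: train_linear_svm(xs, [1 if lab == cls else -1 for lab in labels],
                                  **svm_args)
            for cls in sorted(set(labels))}


def predict_one_against_all(models, x):
    """Pick the class whose binary model gives the largest decision value."""
    return max(models, key=lambda cls: dot(models[cls][0], x) + models[cls][1])


# Hypothetical toy data: binary word occurrences over a tiny vocabulary.
VOCAB = ["beef", "steak", "carrots", "broccoli", "salmon", "lemon"]
X = [[1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0], [0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 0]]
Y = ["beef", "beef", "vegetables", "vegetables", "fish", "fish"]

if __name__ == "__main__":
    models = train_one_against_all(X, Y)
    print(predict_one_against_all(models, [1, 0, 0, 0, 0, 0]))
```

With 5 categories this trains 5 binary SVMs; the C parameter of each one can still be tuned as suggested earlier in the thread.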
Best regards,
Marius