TF-IDF calculation
tmproxy
Hi.
It seems that RapidMiner's TextInput operator calculates TF-IDF over the whole document corpus it reads. In text classification, however, corpus-based keyword selection (based on TF-IDF) favors the prevailing classes and penalizes classes with a small number of training documents. Class-based keyword selection, on the other hand, gives equal weight to each class. So my question is: how can one calculate TF-IDF for each label separately, i.e. treating each label in the body of the TextInput operator as a separate corpus?
Thanks in advance for your help.
land
Hi,
it seems to me that it will be difficult to apply a model in the testing phase, when you can't know the actual label. Which class's frequencies are you going to use if you don't have information about the class?
If you are just doing exploratory analysis, you might use the Loop Values operator to get each label value as a macro and filter the example set accordingly.
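For exploration outside RapidMiner, the same idea (loop over each label value, filter the documents to that label, then compute TF-IDF on the subset) can be sketched in plain Python. This is only an illustration under stated assumptions, not RapidMiner's implementation: the function names `tfidf` and `tfidf_per_label` and the toy corpus are made up, and the weighting used here (term count divided by document length, times the unsmoothed log of N over document frequency) is just one of several common TF-IDF variants.

```python
import math
from collections import Counter, defaultdict

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document in docs."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Assumed variant: relative term frequency * unsmoothed IDF.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

def tfidf_per_label(labeled_docs):
    """Group documents by label and compute TF-IDF within each group."""
    by_label = defaultdict(list)
    for label, tokens in labeled_docs:
        by_label[label].append(tokens)
    # Each label's documents are treated as a separate corpus.
    return {label: tfidf(docs) for label, docs in by_label.items()}

# Toy data, invented for illustration only.
corpus = [
    ("sports", "the team won the match".split()),
    ("sports", "the team lost".split()),
    ("politics", "the vote was close".split()),
]
per_label = tfidf_per_label(corpus)
```

Note that with this unsmoothed IDF, a term that occurs in every document of a class gets weight zero within that class, and a class with a single document gets zero for all of its terms. The point about the testing phase also still holds: these weights depend on knowing the label, so they are suited to exploratory analysis rather than to scoring unlabeled documents.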
Greetings,
Sebastian