Wrong TFIDF Values
smarto
New Altair Community Member
Hey Rapid community. I discovered something with TFIDF that I don't understand. Whether I use "Generate TFIDF" or "Process Documents" with this option, it seems like the most frequent words are delivered without any value at all.
I analyzed 10 documents with a couple of different sets and a couple of different setups, but I run into the same problem over and over.
These are screenshots from RM and MySQL. What am I doing wrong?
Answers
"it seems like the most frequent words are delivered without any value at all."
Yes, that's the definition of TF-IDF: the IDF factor penalizes words that appear in many documents. Imagine a word that appears in all documents: it carries no information for telling the documents apart, so its IDF, and with it its TF-IDF value, is zero.
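To illustrate why a word that occurs in every document ends up at zero, here is a minimal sketch of the textbook variant, tf = count / document length and idf(t) = ln(N / df(t)), where N is the number of documents and df(t) the number of documents containing t. This is an assumption for illustration: RapidMiner's operators may normalize differently, and the function name `tfidf` and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Textbook TF-IDF (assumed variant, not necessarily RapidMiner's exact formula).

    tf  = term count / number of tokens in the document
    idf = ln(N / df), where df = number of documents containing the term
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: count each term at most once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical example documents: "the" occurs in all of them.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
for weights in tfidf(docs):
    # ln(3 / 3) = 0, so "the" gets 0.0 in every document.
    print(weights["the"])
```

Even though "the" is the most frequent word in the first two documents, its weight is 0.0 everywhere, while a word found in only one document ("bird") gets the highest weight.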
For the exact definition of TF-IDF, you could start with the Wikipedia article: http://en.wikipedia.org/wiki/Tf%E2%80%93idf
Best, Marius