Wrong TFIDF Values

smarto
smarto New Altair Community Member
edited November 5 in Community Q&A
Hey Rapid community. I discovered something about TFIDF that I don't understand. Whether I use "Generate TFIDF" or "Process Documents" with this option, it seems like the most frequent words are delivered without any value at all.

I analyzed 10 documents with a couple of different sets and a couple of different setups, but I run into the same problem over and over.

image
image
image

These are screenshots from RM and MySQL. What am I doing wrong?

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    "it seems like the most frequent words are delivered without any value at all."
    Yes, that follows from the definition of TF-IDF: the IDF factor penalizes words that appear in many or almost all documents. Imagine a word which appears in all documents: with the standard IDF of log(N/df) its weight is log(1) = 0, because it contains no information for telling the documents apart.

    For the exact definition of TF-IDF you could start with the Wikipedia article: http://en.wikipedia.org/wiki/Tf%E2%80%93idf

    Best, Marius
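
To see the effect described in the answer, here is a minimal sketch in plain Python. It assumes the textbook weighting tf * log(N / df) and simple whitespace tokenization; RapidMiner's operators may normalize the values differently. The function name tf_idf and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample data: "the" occurs in every document.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]

weights = tf_idf(docs)
for doc_weights in weights:
    print({term: round(value, 3) for term, value in doc_weights.items()})
```

In this sketch "the" is the most frequent word, but because it occurs in all three documents its IDF is log(3/3) = 0, so its weight is zero everywhere. Words that occur in fewer documents, such as "bird", get a positive weight.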