Wrong TFIDF Values

smarto
smarto New Altair Community Member
edited November 2024 in Community Q&A
Hey Rapid community. I noticed something about TF-IDF that I don't understand. Whether I use "Generate TFIDF" or "Process Documents" with this option, it seems like the most frequent words are delivered without any value at all.

I analyzed 10 documents, with a couple of different sets and a couple of different setups, but I run into the same problem over and over.

[3 images]

These are screenshots from RM and MySQL. What am I doing wrong?

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    "it seems like the most frequent words are delivered without any value at all."
    Yes, that follows from the definition of TF-IDF: the IDF factor penalizes words that appear in many or almost all documents. Imagine a word which appears in all documents: it does nothing to tell the documents apart, so its IDF is log(N/N) = 0 and its TF-IDF weight is 0.

    For the exact definition of TF-IDF you could start with the Wikipedia article: http://en.wikipedia.org/wiki/Tf%E2%80%93idf

    Best, Marius
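
To make the zero values concrete, here is a minimal sketch in plain Python using the textbook weighting tf * log(N / df). The function name `tf_idf` and the three toy documents are made up for illustration, and RapidMiner's operators may normalize the result differently, so treat the numbers as indicative only. The point is that a word occurring in every document gets log(N/N) = 0.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus: "the" occurs in every document.
docs = [
    "the cat sat",
    "the dog barked",
    "the bird sang",
]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, "the" gets a weight of exactly 0 because it occurs in all three documents, while "cat" and "sat" get positive weights. That matches what the screenshots in the question seem to show: the most frequent words end up with no value.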
