Wrong TFIDF Values

User: "smarto"
New Altair Community Member
Updated by Jocelyn
Hey Rapid community. I discovered something about TF-IDF that I don't understand. Whether I use "Generate TFIDF" or "Process Documents" with this option, it seems like the most frequent words are delivered without any value at all.

I analyzed 10 documents, with a couple of different sets and a couple of different setups, but I run into the same problem over and over.

[3 images]

These are screenshots from RM and MySQL. What am I doing wrong?

    User: "MariusHelf"
    New Altair Community Member
    it seems like the most frequent words are delivered without any value at all.
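    Yes, that follows from the definition of TF-IDF: it applies a penalty to words that appear in many or almost all documents. Imagine a word which appears in all documents: its inverse document frequency is log(N/N) = 0, so its TF-IDF value is zero. It contains no information at all for telling the documents apart.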
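
    To make that concrete, here is a minimal sketch in plain Python, assuming the textbook weighting tf * log(N / df). The function name `tfidf` and the three toy documents are made up for illustration, and RapidMiner's exact weighting and normalization may differ, but the effect on a word that occurs in every document is the same.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus: "the" occurs in every document.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
result = tfidf(docs)
for doc, w in zip(docs, result):
    print(doc, "->", w)
```

    Here "the" is the most frequent word, yet its weight is 0 in every document because log(3/3) = 0, while a word that occurs in only one document, such as "bird", gets the largest IDF factor.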

    For the exact definition of TF-IDF you could start with the Wikipedia article: http://en.wikipedia.org/wiki/Tf%E2%80%93idf

    Best, Marius