Calculate TFIDF
barthos
New Altair Community Member
Hello,
I would like to calculate the TF-IDF of words in different documents, in order to determine the words that are the most significant for each document.
I use a "Create Document" operator for each new document and add an attribute (a name) to each one. I then use a "Documents to Data" operator to generate an example set, followed by a "Process Documents from Data" operator to compute the TF-IDF (which I selected in the parameters panel).
The problem is that I don't get TF-IDF values, only the number of occurrences of each word and the number of documents in which it appears. Moreover, I no longer see the label of each document, so I cannot tell the documents apart.
Can somebody help me?
Thanks a lot,
Barthélémy
Answers
Hi Barthélémy,
It sounds like you are only looking at the word list output (which shows the word occurrences). Take a look at the example set output of the "Process Documents" operator as well: there you will see the TF-IDF values and also each document's label.
Also, rather than chaining "Documents to Data" and "Process Documents from Data", you can use the single "Process Documents" operator.
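To illustrate the difference between the two outputs, here is a minimal sketch in plain Python (not RapidMiner code). It assumes the textbook weighting tf × log(N / df) with whitespace tokenization; the exact formula and normalization that RapidMiner applies may differ. The function name `tfidf`, the sample texts and the labels are made up for the example.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs maps a document label to its text.

    Returns (word_list, vectors):
      word_list: word -> (total occurrences, number of documents containing it)
      vectors:   label -> {word: TF-IDF weight}, one entry per document
    """
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    tokens = {label: text.lower().split() for label, text in docs.items()}

    total = Counter()     # occurrences of each word across all documents
    doc_freq = Counter()  # number of documents in which each word appears
    for words in tokens.values():
        total.update(words)
        doc_freq.update(set(words))
    word_list = {w: (total[w], doc_freq[w]) for w in total}

    n_docs = len(docs)
    vectors = {}
    for label, words in tokens.items():
        counts = Counter(words)
        # Textbook weighting (an assumption): relative term frequency
        # multiplied by the log of the inverse document frequency.
        vectors[label] = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
    return word_list, vectors


# Hypothetical sample documents, keyed by their label.
docs = {
    "doc_a": "the cat sat on the mat",
    "doc_b": "the dog chased the cat",
    "doc_c": "the bird sang",
}
word_list, vectors = tfidf(docs)

print(word_list["the"])  # counts only, like the word list output
for label, weights in vectors.items():
    top = max(weights, key=weights.get)
    print(label, top, round(weights[top], 3))
```

Here `word_list` corresponds to what the word list output shows (occurrences and document counts), while `vectors` is the analogue of the example set: one row per document, keyed by its label, with a TF-IDF weight per word. A word such as "the" that appears in every document gets a weight of zero under this formula.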
Best regards
Matthias
Thanks, Matthias!
Barth