"Text Mining term occurrences per label value"
Hi everyone!
I have started using RapidMiner for text mining, as it seems to be a pretty powerful tool for the job.
When I use the "Process Documents from Data" operator, I get an output called WordList, which gives an overview of the different
words in the documents and how often each one occurs. I also set a label on the dataset, and the table shows the values
of this label as separate category columns, which should give the term occurrence frequencies per category. However, while
"Document Occurrences" and "Total Occurrences" seem to be calculated correctly for every word, all the category columns just show 0 for every word.
I would expect a word like "sponsor", which occurs in 10 documents, to have its occurrences distributed over the different categories, since every document was assigned
to a category.
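To make the expected behaviour concrete, here is a minimal Python sketch of the counts I would expect the WordList to contain. It is not RapidMiner's implementation; it assumes whitespace tokenization and that each per-category column holds the total occurrences of a word within documents of that category, and the documents and labels below are made-up toy data.

```python
from collections import Counter, defaultdict


def word_list(docs):
    """Compute WordList-style statistics from (text, label) pairs.

    Assumption: per-label counts are total occurrences of a term
    inside documents carrying that label.
    """
    total = Counter()                 # "Total Occurrences" per term
    doc_occ = Counter()               # "Document Occurrences" per term
    per_label = defaultdict(Counter)  # one occurrence column per label value
    for text, label in docs:
        counts = Counter(text.lower().split())  # naive whitespace tokenization
        total.update(counts)
        doc_occ.update(counts.keys())           # each term counted once per document
        per_label[label].update(counts)
    return total, doc_occ, per_label


# Hypothetical toy documents, each assigned to one category.
docs = [
    ("sponsor event sponsor", "sports"),
    ("new sponsor announced", "business"),
    ("match report", "sports"),
]
total, doc_occ, per_label = word_list(docs)
print(total["sponsor"], doc_occ["sponsor"])                             # 3 2
print(per_label["sports"]["sponsor"], per_label["business"]["sponsor"])  # 2 1
```

Under this assumption, the per-category columns for a word should sum to its total occurrences, which is why an all-zero row looks wrong to me.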
Did I do something wrong in the data import process? Are there prerequisites I am not aware of that are needed for the word occurrences to be split correctly across all values of the label
variable?
Thanks in advance,
Arno