We are analysing 13 business annual reports in PDF format. We are new to
RapidMiner, but thanks to the training resources and the answers in the
community we managed to run a cluster analysis of some parts of the annual reports
we are interested in. For that analysis we used the Process Documents from Files
operator to extract the words, which are then used by the clustering operator.
Now we are interested in a different analysis. Instead of having RapidMiner extract the list of words from the reports, we already have a given word list, and we want to check whether each of the
indicators (words) on that list is mentioned in the business reports. However,
we have not found any example showing how to build a process that produces this
result. We would be very grateful for an example or some indication of which
operators to use.
Thanks