"Text mining"

Andreas_M_
Andreas_M_ New Altair Community Member
edited November 5 in Community Q&A
Hi,

I'm new to RapidMiner and I'm not quite comfortable with it yet.
What I want to do: I have about 300 PDF documents and one wordlist with about 100 different words. I want to find out the total number of occurrences of these words in each PDF document. I would also like to know the total number of words each PDF document contains.

Can somebody help me with modelling the process?

Thanks in advance.

Answers

  • Freddie2310
    Freddie2310 New Altair Community Member
    Hello,

    Did you ever find a working process? Would you please share it?
    I have the same problem with many PDF documents.

    Thank you
  • MartinLiebig
    MartinLiebig Altair Employee
    Hi,
    The operator for reading PDF files is Read Document. You can combine it with Loop Files to read several files.

    Best,
    Martin
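
The counting step asked about in the question (total words per document, plus occurrences of each wordlist term) can be sketched outside RapidMiner in plain Python. This is only an illustration, not a RapidMiner process: it assumes the text has already been extracted from each PDF, and the names `tokenize`, `count_words`, `documents`, and `wordlist` are hypothetical, as is the sample text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters (assumed tokenization rule)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def count_words(text, wordlist):
    """Return the total word count and the per-term counts for one document."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    wanted = sorted({w.lower() for w in wordlist})
    per_word = {w: counts[w] for w in wanted}  # Counter yields 0 for absent terms
    return {
        "total_words": len(tokens),
        "wordlist_hits": sum(per_word.values()),
        "per_word": per_word,
    }


# Hypothetical input: file name -> text already extracted from the PDF.
documents = {
    "a.pdf": "Risk and return. Risk is high.",
    "b.pdf": "No relevant terms here.",
}
wordlist = ["risk", "return"]

results = {name: count_words(text, wordlist) for name, text in documents.items()}
for name, row in results.items():
    print(name, row["total_words"], row["wordlist_hits"], row["per_word"])
```

Matching here is case-insensitive and on whole tokens only, so a multi-word entry in the wordlist would never match; that is a simplification of this sketch.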
  • M700760
    M700760 New Altair Community Member