"pdf tokenization (?)"

Question

Hello guys,
I am totally new here and to RapidMiner!
I have an assignment to finish, so I don't have much time to explore RapidMiner. I'll post my question here and hope to find an answer. It might be trivial; I apologise for that.

I have several PDF files. I want to tokenize them, i.e. to see every word that occurs and how many times each one appears.
For example, let's assume that a PDF contains the word "process". I want to see how many times this word appears, and I want to do the same for all the words in the PDF file. Is tokenization what I need? If yes, how do I do it? If not, what do you propose?
Thank you in advance!
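
To make the goal concrete, here is a rough illustration of the counting described above, done outside RapidMiner. It is only a minimal Python sketch and assumes the PDF text has already been extracted into a plain string (the extraction step is not shown). The function names `tokenize` and `word_counts` are made up for this example, and treating a token as a lowercase run of letters is just one simple choice among many.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (runs of the letters a-z)."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(text):
    """Return a Counter mapping each token to how often it occurs."""
    return Counter(tokenize(text))


# Stand-in for text that was extracted from a PDF (assumed, not real data).
sample = "The process starts. Each process step feeds the next process."

counts = word_counts(sample)
print(counts["process"])       # occurrences of the word "process": 3
print(counts.most_common(3))   # the most frequent tokens with their counts
```

Running the same function over the text of each file would give one word-frequency table per PDF.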

margkw · Answer

Thanks! I will be back with more questions! :D

MariusHelf · Answer

Hi,

These are very important concepts that are rather easy to understand but hard to explain here in text form. I would like to point you to the video tutorials on our website; there is one complete section about text processing.

You'll find the link to the tutorials in the post linked in my signature.

Happy Mining!
  -Marius

margkw · Answer

It's me again! How can I insert the Tokenize operator inside Process Documents?

And what should the process output be?

Sorry for the stupid questions. I am completely new to this.