Text processing PDFs

wclaster New Altair Community Member
edited November 5 in Community Q&A
I am trying to build a word cloud from PDFs. Is there some sort of "demo" for this? Do I need to convert the PDFs to text first? I saw a video in which the presenter suggested converting them to .txt files and putting them in a separate folder (Text Processing on Rapid Miner - YouTube).
I tried with a process (see attached XML), but the output is gibberish (see attached image). Any suggestions? Thank you!
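
Whatever tool reads the files, a word cloud is drawn from word counts. Below is a minimal Python sketch of that counting step, assuming the PDF text has already been extracted as plain strings. The names `word_frequencies` and `top_words` and the short stopword list are hypothetical illustrations, not RapidMiner operators.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "it"}


def word_frequencies(text: str, min_length: int = 3) -> Counter:
    """Count words in one document: lowercase, split into alphabetic tokens,
    drop stopwords and tokens shorter than min_length, then count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS and len(t) >= min_length)


def top_words(texts: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Merge counts across several documents and return the n most frequent words."""
    total = Counter()
    for text in texts:
        total.update(word_frequencies(text))
    return total.most_common(n)
```

For example, `top_words(["Text mining turns text into data.", "Mining PDFs: text first, then the word cloud."], 2)` returns `[('text', 3), ('mining', 2)]`. Each (word, count) pair becomes one word in the cloud, sized by its count.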

Best Answer

  • MartinLiebig
    Altair Employee
    Hi,
    Did you use read_document to read the PDF? It has a setting for reading PDFs.

    Best,
    Martin
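
A likely reason for gibberish output is that PDF is a binary format. The file normally starts with a `%PDF-` header, and page text often sits inside Flate (zlib) compressed streams, so a reader that treats the file as plain text emits the compressed bytes as characters. The toy Python sketch below illustrates this. `toy_pdf` and the helper functions are hypothetical simplifications for illustration only; they do not describe how read_document works internally, and a real PDF needs a proper PDF parser.

```python
import zlib


def looks_like_pdf(data: bytes) -> bool:
    """Simple check: PDF files normally start with a header such as b"%PDF-1.7"."""
    return data.startswith(b"%PDF-")


def read_as_plain_text(data: bytes) -> str:
    """Roughly what a plain-text reader does: map raw bytes straight to characters."""
    return data.decode("latin-1")


def decode_first_stream(data: bytes) -> bytes:
    """Decompress the first Flate-compressed stream (toy parser, not a real PDF reader)."""
    start = data.index(b"stream\n") + len(b"stream\n")
    end = data.index(b"\nendstream")
    return zlib.decompress(data[start:end])


# Hypothetical, heavily simplified stand-in for a PDF page: the page text is
# stored inside a zlib-compressed stream, as is common in real PDFs.
page_text = b"BT (Word clouds from PDFs) Tj ET"
toy_pdf = b"%PDF-1.7\nstream\n" + zlib.compress(page_text) + b"\nendstream\n"

if __name__ == "__main__":
    print(looks_like_pdf(toy_pdf))                       # True
    print("Word clouds" in read_as_plain_text(toy_pdf))  # False: the text is compressed
    print(decode_first_stream(toy_pdf))                  # b'BT (Word clouds from PDFs) Tj ET'
```

This is why converting to .txt first, or telling the reader that the input is a PDF, fixes the output: text extraction must decode the PDF structure before any tokenizing happens.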

Answers

  • wclaster
    New Altair Community Member
    Thank you, Martin. That did it.