"Analytics with RapidMiner Rosette [getting started]"

User: "ty"
New Altair Community Member
Updated by Jocelyn

Hi,

I'm just getting started with RM for text analytics. Everything has gone well working with structured data, but I'm struggling with analysing text documents. Could anyone provide a process showing how to extract entities from a PDF or Word doc?

I've searched these forums and Google and the only solution that seems to work is converting the file into a txt file first, which isn't ideal.

Any help would be super appreciated.

    User: "Telcontar120"
    New Altair Community Member
    Accepted Answer

    I'm afraid that converting the files first is the easiest option available to you with existing operators. Another option would be to load the document text into a database such as MySQL first and then read it with the "Read Database" operator. But RapidMiner won't read text from Word docs or PDFs directly.
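
For the "convert the files first" route, the conversion can be scripted so it does not have to be done by hand for every document. Below is a minimal sketch, assuming the Word files are in .docx format (which is a zip archive containing word/document.xml) and using only the Python standard library. The function names docx_to_text and convert_folder are hypothetical helpers made up for this illustration, not part of RapidMiner. It ignores headers, footers and footnotes, and it does not handle PDFs, which would need a separate PDF text-extraction library.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used by word/document.xml inside a .docx archive
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the body text of a .docx file, one line per paragraph.

    Hypothetical helper: reads only word/document.xml, so headers,
    footers and footnotes are not included.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> run elements
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def convert_folder(src_dir, dst_dir):
    """Write one .txt file into dst_dir for every .docx file in src_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for docx_file in sorted(Path(src_dir).glob("*.docx")):
        out_file = dst / (docx_file.stem + ".txt")
        out_file.write_text(docx_to_text(docx_file), encoding="utf-8")
        written.append(out_file)
    return written
```

Calling convert_folder("docs", "docs_txt") (both folder names are placeholders) would leave a folder of plain-text files that can then be loaded into a RapidMiner process like any other txt input.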