Extract data from pdf files and perform text analysis

User: "Studentul_86"
New Altair Community Member
Updated by Jocelyn
Hello,

I'm a new user of RapidMiner, using the free educational version, for an academic paper I'm working on.
The problem is that I have not yet found a way to extract text from pdf files for text analysis in RapidMiner.
Can somebody advise me on a process for extracting text from multiple pdf files at once in RapidMiner, so that I can reach my goal of counting words?

Also, regarding sentiment analysis of texts, can somebody give me hints on free solutions for performing it in RapidMiner?

Thank you.
Best regards,

Valentin.

    User: "MartinLiebig"
    Altair Employee
    Accepted Answer
    Hi,
    The Read Document operator has an option to read PDFs. You want to combine it with a Loop Files operator.

    Best,
    Martin
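For readers who think in code, the "Loop Files around Read Document" pattern can be sketched roughly as below. This is not RapidMiner code, only an illustration of the idea in Python. The names `loop_files` and `read_document` are hypothetical, and the actual PDF text extraction is deliberately left as a function you pass in, since it depends on whichever PDF library you choose.

```python
from pathlib import Path
from typing import Callable, Dict


def loop_files(folder: str,
               read_document: Callable[[Path], str],
               pattern: str = "*.pdf") -> Dict[str, str]:
    """Apply read_document to every file in folder matching pattern.

    read_document is a hypothetical stand-in for the Read Document
    operator: it takes a file path and returns the extracted text.
    """
    texts = {}
    # sorted() keeps the processing order stable across runs
    for path in sorted(Path(folder).glob(pattern)):
        texts[path.name] = read_document(path)
    return texts
```

The result maps each file name to its text, which is roughly what the loop hands on to the text-processing step.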
    User: "lionelderkrikor"
    New Altair Community Member
    Accepted Answer
    Hi Vali,

    I'm not sure what you are looking for, so I propose 2 options based on Martin's idea:

     - Process 1 (in attached file): Read Document inside a Loop Files operator, then a Process Documents operator.
     - Process 2 (in attached file): Read Document inside a Loop Files operator, then a Combine Documents operator, then a Process Documents operator.

    Tell us if one of these processes answers your request. If not, can you elaborate on what you want to achieve?

    Regards,

    Lionel
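The practical difference between the two options can be illustrated outside RapidMiner with a small Python sketch. It assumes the text has already been extracted from each PDF, and it assumes that Process 1 yields word counts per document while Process 2 yields counts over the combined text. The file names, sample sentences, and the simple tokenizer are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters; a deliberately simple tokenizer
    return re.findall(r"[a-z']+", text.lower())


def counts_per_document(docs):
    # Analogue of Process 1: one word count per document
    return {name: Counter(tokenize(text)) for name, text in docs.items()}


def counts_combined(docs):
    # Analogue of Process 2: merge all texts first, then count once
    combined = " ".join(docs.values())
    return Counter(tokenize(combined))


# Hypothetical extracted texts, keyed by file name
docs = {
    "a.pdf": "Data mining finds patterns in data.",
    "b.pdf": "Text mining is data mining on text.",
}

per_doc = counts_per_document(docs)
total = counts_combined(docs)
print(per_doc["a.pdf"]["data"], total["data"])
```

Per-document counts are useful when you want to compare papers with each other; the combined count is enough if you only need overall word frequencies.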
    User: "lionelderkrikor"
    New Altair Community Member
    Accepted Answer
    Hi Vali,

    It seems that your Loop Files operator is not set correctly.
    Please import the second process (Loop_read_pdfs_documents.rmp) I shared in my previous post and, in the parameters of the Loop Files operator, set the path where your PDF files are stored.

    Regards,

    Lionel