"extract data from a table in a pdf-file"

currant New Altair Community Member
edited November 5 in Community Q&A
Hi All,

Is it possible to extract the data from a table in a PDF file with RM? If yes, how? Does anyone have experience with this? Do I need XPath experience?

Thanks in advance!

currant
Answers

  • MariusHelf New Altair Community Member
    Sorry, this is not possible with RapidMiner. However, the Text Processing extension provides operators that read PDF files as plain text. To download and install it, select "Update RapidMiner" from the Help menu.
    Depending on the content and formatting of the tables, you may be able to copy and paste them by hand from the PDF file into an Excel sheet and then use the Read Excel operator.
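
If the table comes out of the PDF as plain text, with columns separated by runs of spaces, a small script outside RapidMiner may be enough to turn it into CSV, which the Read CSV operator can then load. The following is only a minimal sketch under that assumption: the sample text, the function names, and the "two or more spaces" column rule are all hypothetical and will not fit every PDF layout.

```python
import csv
import io
import re

# Hypothetical sample: text as it might look after a PDF page has been
# read as plain text (assumption: columns are separated by 2+ spaces).
SAMPLE_TEXT = """\
Product        Units   Price
Red currant    12      3.50
Black currant  7       4.25
"""


def split_table_lines(text):
    """Split each non-empty line into cells on runs of 2+ whitespace or a tab."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, one table row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = split_table_lines(SAMPLE_TEXT)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Cells that contain single spaces, such as "Red currant", stay intact because only runs of two or more spaces count as a column break. Tables with empty cells or wrapped lines would need extra handling.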

    Cheers, Marius