Question regarding the feasibility of a table data extraction project with RM

Would it be feasible to extract tables from PDFs with RM? I realize the PDFs might need to be converted to another format first, but would it be possible for RM to run through the entire text of a financial report, identify the table data, and extract it into ExampleSets?

I am thinking of trying it out, but I would like to hear from more seasoned users whether they think it is reasonably feasible or whether there is a hard wall along the way that I am not yet seeing.


Accepted answers

kayman

There is an operator that does this: look for the PDF extension on the Marketplace.
It does a fairly good job of converting PDF tables into example sets, provided your tables are nicely structured.

If that is not the case, you can instead use the import document operator from the Text Processing extension and select PDF. This converts your PDF to plain text. Getting the table content out with the text operators is then feasible, but not as straightforward.
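
To give an idea of what the text-based route involves, here is a minimal Python sketch of that post-processing done outside RM. It is not an RM operator and not what the Text Processing extension does internally; it assumes that, after conversion to plain text, each table row ends up on a single line with columns separated by two or more spaces, which varies a lot between PDFs. The function names and the sample text are hypothetical.

```python
import re

# Assumption: columns in the converted text are separated by runs of two or
# more whitespace characters, while words inside a cell use single spaces.
COLUMN_GAP = re.compile(r"\s{2,}")


def split_row(line):
    """Split one line of converted text into cells (hypothetical helper)."""
    return COLUMN_GAP.split(line.strip())


def extract_table_blocks(text, min_columns=3):
    """Group consecutive lines with at least `min_columns` cells into tables.

    Lines with fewer cells (prose, headings, blank lines) end the current
    table. Returns a list of tables, each a list of rows (lists of strings).
    """
    tables, current = [], []
    for line in text.splitlines():
        cells = split_row(line) if line.strip() else []
        if len(cells) >= min_columns:
            current.append(cells)
        else:
            if len(current) >= 2:  # header plus at least one data row
                tables.append(current)
            current = []
    if len(current) >= 2:
        tables.append(current)
    return tables


def parse_amount(cell):
    """Convert figures like '1,234' or '(567)' to floats; return None otherwise."""
    s = cell.replace(",", "").strip()
    negative = s.startswith("(") and s.endswith(")")  # accounting-style negative
    s = s.strip("()")
    try:
        value = float(s)
    except ValueError:
        return None
    return -value if negative else value


# Hypothetical text as it might come out of a PDF-to-text conversion.
sample = """Consolidated income statement
Item            2023        2022
Revenue         1,200       1,100
Net result      (45)        30
Revenue grew strongly during the year."""

for table in extract_table_blocks(sample):
    for row in table[1:]:
        print(row[0], [parse_amount(c) for c in row[1:]])
```

The fixed-gap heuristic is the fragile part: tables whose columns collapse to single spaces, or that wrap across lines, will need a different rule, which is why this route is less straightforward than a dedicated table extractor.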

Finally, you could also use the Python Scripting extension. There are a few good Python libraries for extracting tables from PDFs, but try the first option (the PDF extension) first.
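
For the Python route, one commonly used library is pdfplumber (Camelot is another). The sketch below is an illustration under assumptions, not a tested RM setup: `load_tables` relies on pdfplumber's `open()` and `Page.extract_tables()`, which return each table as a list of rows whose cells are strings or None, and the hypothetical helper `table_to_records` turns one such raw table into a list of dicts keyed by the first row. Wiring this into the Execute Python operator (which, as far as we know, exchanges data with the process as pandas DataFrames) is not shown.

```python
def load_tables(pdf_path):
    """Return every table pdfplumber detects in the PDF, page by page.

    Requires the third-party pdfplumber package; it is imported lazily so the
    rest of this sketch runs without it.
    """
    import pdfplumber

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Each table is a list of rows; each row a list of str or None.
            tables.extend(page.extract_tables())
    return tables


def table_to_records(table):
    """Turn one raw table into dicts keyed by its header row (hypothetical).

    None cells become "", whitespace inside a cell (including line breaks)
    is collapsed, and fully empty rows are dropped.
    """
    if not table:
        return []

    def clean(cell):
        return " ".join((cell or "").split())

    header, *rows = [[clean(c) for c in row] for row in table]
    records = []
    for row in rows:
        if not any(row):
            continue
        records.append(dict(zip(header, row)))
    return records


# Hypothetical raw output shaped like one table from extract_tables().
raw = [
    ["Item", "2023", "2022"],
    ["Revenue", "1,200", None],
    [None, None, None],
    ["Net\nresult", "(45)", "30"],
]
print(table_to_records(raw))
```

Keeping the cleanup separate from the PDF loading makes it easy to swap pdfplumber for another extractor that produces the same list-of-rows shape.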

All comments


pblack476

@kayman Wow, that extension does it perfectly. Thanks very much.

sgenzer

Kudos to the RM Research team for that one.