Question regarding the feasibility of a Table data Extraction project with RM
pblack476
New Altair Community Member
Would it be feasible to extract tables from PDFs with RM? I realize the PDFs might have to be converted to another format first, but would it be possible to run through the entire text of a financial report, identify the table data, and extract it to ExampleSets?
I am thinking of trying it out, but I would like to hear from more seasoned people whether they think it is reasonably feasible or whether there is a hard wall along the way that I am not yet seeing.
Best Answer
There is an operator for this. Look for the PDF extension on the Marketplace.
It is fairly good at converting tables from a PDF into a dataset, provided your tables are nicely structured.
If that is not the case, you can also use the import document operator from the text extension and select PDF. This converts your PDF to plain text. Getting the table content from there with the text operators is feasible, but not so straightforward.
Finally, you could also use the Python extension. There are a few good libraries for table extraction from PDFs, but try option 1 first.
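To give an idea of what the plain-text route involves, here is a minimal Python sketch. It assumes the PDF has already been converted to text and that table columns are separated by runs of two or more spaces, which is common but not guaranteed for converted financial reports. The sample text and the helper names (`split_columns`, `extract_table`, `parse_amount`, `to_records`) are made up for illustration and are not part of RM or of any extension.

```python
import re

# Hypothetical sample of what a PDF-to-text conversion might produce.
SAMPLE_TEXT = """\
Consolidated Statement of Income
Amounts in thousands

Item                 2019       2018
Revenue             1,200      1,050
Cost of sales        (700)      (640)
Net income            310        275

See accompanying notes.
"""


def split_columns(line):
    """Split a line into cells on runs of two or more whitespace characters."""
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


def extract_table(text, min_columns=3):
    """Return the first run of at least two consecutive lines that all
    split into the same number of cells (at least min_columns)."""
    block = []
    for line in text.splitlines():
        cells = split_columns(line)
        is_row = len(cells) >= min_columns
        if is_row and (not block or len(cells) == len(block[0])):
            block.append(cells)
            continue
        if len(block) >= 2:
            break  # the table ended on the previous line
        block = [cells] if is_row else []
    return block if len(block) >= 2 else []


def parse_amount(cell):
    """Turn '1,200' into 1200.0 and accounting-style '(700)' into -700.0.
    Cells that are not numeric are returned unchanged."""
    negative = cell.startswith("(") and cell.endswith(")")
    digits = cell.strip("()").replace(",", "")
    try:
        value = float(digits)
    except ValueError:
        return cell
    return -value if negative else value


def to_records(table):
    """Treat the first row as the header and return one dict per data row."""
    header, *rows = table
    return [dict(zip(header, (parse_amount(c) for c in row))) for row in rows]


if __name__ == "__main__":
    for record in to_records(extract_table(SAMPLE_TEXT)):
        print(record)
```

The heuristic breaks down when cells are empty, when labels contain wide gaps, or when a table wraps across pages, which is why the dedicated operator is worth trying first.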
Answers
Kudos to the RM Research team for that one.