Question regarding the feasibility of a table data extraction project with RM

pblack476
pblack476 New Altair Community Member
edited November 2024 in Community Q&A
Would it be feasible to extract tables from PDFs with RM? I realize the PDFs might need to be converted to something else first, but would it be possible to run through the entire text of a financial report, identify the table data, and extract it to ExampleSets?

I am thinking of trying it out, but would like to hear from more seasoned people whether they think it is reasonably feasible or whether there is a hard wall along the way that I am not yet seeing.

Best Answer

  • kayman
    kayman New Altair Community Member
    Answer ✓
    There is an operator to do this. Look for the PDF extension on the Marketplace.
    It is fairly good at converting tables from PDF to datasets, provided your tables are nicely structured.

    If this is not the case, you can also use the import document operator from the text extension and select PDF. This will convert your PDF to plain text. Getting the table content out of that with the text operators is feasible, but not so straightforward.

    Finally, you could also use the Python extension. There are a few good libraries dealing with table extraction from PDF, but try option 1 first.
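
To give a feel for why the plain-text route (option 2) is "not so straightforward", here is a minimal Python sketch of the idea. It is not what the PDF or text extensions do internally. It assumes the converted text still separates table columns with two or more spaces (or a tab), which will not hold for every PDF, and the names `split_row` and `extract_tables` are made up for illustration.

```python
import re

# Assumption: columns are separated by a tab or by two or more spaces.
# Single spaces are treated as part of a cell (e.g. "Net income").
COLUMN_GAP = re.compile(r"\t|\s{2,}")


def split_row(line):
    """Split one line of text into non-empty, stripped cells."""
    return [cell.strip() for cell in COLUMN_GAP.split(line.strip()) if cell.strip()]


def extract_tables(text, min_columns=2, min_rows=2):
    """Group consecutive lines that share the same column count into tables.

    Returns a list of tables; each table is a list of rows, each row a list
    of cell strings. Lines with fewer than min_columns cells end a table.
    """
    tables, current = [], []
    for line in text.splitlines():
        cells = split_row(line)
        looks_like_row = len(cells) >= min_columns
        if looks_like_row and (not current or len(cells) == len(current[0])):
            current.append(cells)
            continue
        # The run of table-like lines ended; keep it only if it is long enough.
        if len(current) >= min_rows:
            tables.append(current)
        # A row with a different column count starts a new candidate table.
        current = [cells] if looks_like_row else []
    if len(current) >= min_rows:
        tables.append(current)
    return tables


if __name__ == "__main__":
    sample = (
        "Annual report 2023\n"
        "\n"
        "Item           2022     2023\n"
        "Revenue        1,200    1,350\n"
        "Net income     210      245\n"
        "\n"
        "Notes to the accounts follow.\n"
    )
    for table in extract_tables(sample):
        for row in table:
            print(row)
```

Real financial reports tend to break this quickly (wrapped cells, merged headers, footnote markers), which is a good reason to try option 1 first. If you did run something like this through the Python extension, the rows would still need to be turned into a pandas DataFrame to come back as an ExampleSet; that step is left out here.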

Answers

  • pblack476
    pblack476 New Altair Community Member
    @kayman Wow, that extension just does it perfectly. Thanks very much.
  • sgenzer
    sgenzer
    Altair Employee
    Kudos to the RM Research team for that one :)