Automated tabular PDF data extraction
Hi Team,
I am using the trial version of Monarch v.2020 (v.16) to explore extracting tabular data from PDF pages based on a keyword search. We are able to get the tabular data with this version of Monarch, but I have to load each PDF and repeat the same manual steps for every file. There is a huge number of PDF files as input, so doing the same thing again and again is tedious. Is there an automated way in the latest Monarch to extract tabular data from multiple PDF pages based on a keyword search, such as "ABC Statement"?
Please suggest and help.
Thanks & Regards,
Ashish Deshwal
Answers
How are you extracting the data from the PDF? Are you using templates or Table Extractor? If the PDF files are similar and you are looking to replicate what you are doing, then building out templates will work better for you than Table Extractor.
Once your workspace/model has been built, you can bring in multiple PDF files at the same time (limited by your system resources).
In either case, there is no automated way with just Monarch alone. We do have a companion product called Monarch Server - Automator that allows you to automate your workspaces and models. However, you cannot use Automator with Table Extractor.
Hi Chris,
Thanks for replying.
Yes, we are extracting tabular data using Table Extractor.
Table Extractor works well thanks to its built-in functionality for data extraction and cleansing. But, as you said, we cannot reuse it for repetitive work the way we could a model or template, so we need to look at the model approach.
We actually have PDF files from different vendors, each in its own format, so we would need to create a model/template for each vendor. I guess that will help when new files arrive from the same vendor.
One more thing: each PDF file has multiple pages, and we use keywords to find the relevant pages from which we need to extract tabular data. I am not sure whether that search can be incorporated into a model. Can you please suggest?
Also, the tabular data on these pages is not in a simple format, so building a model is quite challenging. Could you kindly share a video tutorial or link showing how a model is built on PDF tabular data?
Best Regards,
Ashish Deshwal
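For anyone reading later, here is a rough sketch of the keyword-based page search described above, done outside Monarch. It is not a Monarch feature or API. It assumes the text of each page has already been extracted into a list of strings by whatever PDF text-extraction tool you use, and the function names, file names, and sample data are all hypothetical.

```python
from typing import Dict, List


def find_pages_with_keyword(page_texts: List[str], keyword: str) -> List[int]:
    """Return 1-based page numbers whose text contains the keyword.

    Matching is case-insensitive and collapses runs of whitespace, since
    extracted PDF text often contains stray line breaks and extra spaces.
    """
    needle = " ".join(keyword.split()).casefold()
    hits = []
    for number, text in enumerate(page_texts, start=1):
        haystack = " ".join(text.split()).casefold()
        if needle in haystack:
            hits.append(number)
    return hits


def scan_documents(documents: Dict[str, List[str]], keyword: str) -> Dict[str, List[int]]:
    """Map each document name to its matching pages, skipping documents with no match."""
    results = {}
    for name, page_texts in documents.items():
        pages = find_pages_with_keyword(page_texts, keyword)
        if pages:
            results[name] = pages
    return results


if __name__ == "__main__":
    # Hypothetical page texts standing in for already-extracted PDF pages.
    sample = {
        "vendor_a.pdf": ["Cover page", "ABC  Statement\nItem 1 10.00", "Notes"],
        "vendor_b.pdf": ["Nothing relevant here"],
    }
    print(scan_documents(sample, "ABC Statement"))
```

The resulting page numbers could then be used to decide which pages to pass on for table extraction, so the manual search step does not have to be repeated for every file.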
Hi Ashish,
We haven't forgotten about your request here. We're going to share a link soon to a library of demonstrations and trainings, along with information on how to get certified in Monarch. Stay tuned, and thanks for your patience.
Best Regards,
Baba