Brand new to this and boy do I have questions
I am looking to extract information from "investment type" PDF statements. These will look similar to the statements you may receive from any of the large brokerage firms. Each statement has data that we would extract to Excel, but the data for one item may span more than one line (for example, the description of a bond or other investment, which usually includes extra details).
Also, would each monthly statement need to be passed three times in order to obtain the tables I need?
For example
1. Asset Allocation
2. Valuation (prior balance and balance this period)
3. Daily Activity
I could add
4. Portfolio holding etc.
Any ideas where to start learning this at a QUICK speed?
Thanks
FS
Answers
-
Hi Esdras,
I would like to share a link to a "How To" Video that may provide some assistance on what you may be looking to accomplish. How to Quickly Convert a PDF to Excel
-R
-
THANKS REBECCA
We have been able to extract some data. We have been trying to set the “traps” in advanced mode, as well as capture additional information from the same PDF without having to create an additional template, but it is challenging.
-
Hi Esdras,
I too am learning. I did find this other video from the Datawatch YouTube channel regarding Advanced Mode & Traps. It may provide some tips. Datawatch Monarch Personal Tutorial | 5 Adding Data from a Report - Advanced Mode - YouTube
-R
-
Hi Esdras and Rebecca,
When dealing with PDF files, a useful early question to consider is how well the content you need to extract from the PDF behaves (does it retain a "report structure" well or is it erratic?) and how consistent it is from one report to the next.
The answers are very likely to influence the way you approach the extraction decisions if you are seeking to create a model or workspace to be re-used with no (or at least very minimal) interaction time after time.
If the consistency of the PDF sources looks good, the next thing is to decide how the data structure "patterns" compare.
A document presented as a Statement most often does not have the same sort of repeating structure of content that a typical operational report will have. Therefore, although it still makes sense to check for a usable repeating pattern to model, the chances are that a few variations may be required.
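To illustrate the idea of several patterns living in one statement, here is a minimal sketch in plain Python (not Monarch, and not how Monarch works internally). It assumes the PDF text has already been extracted into a list of lines, and that each table starts with a heading line. The heading names are taken from the question above; their exact wording on a real statement, the function name split_sections and the sample lines are all hypothetical.

```python
# Section titles taken from the question; the exact wording on a real
# statement is an assumption and would need checking against the PDF.
HEADINGS = ("Asset Allocation", "Valuation", "Daily Activity")


def split_sections(lines, headings=HEADINGS):
    """Group text lines under the most recent section heading seen."""
    sections = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # A line that starts with a known heading opens a new section.
        matched = next(
            (h for h in headings if line.lower().startswith(h.lower())), None
        )
        if matched is not None:
            current = matched
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
        # Lines before the first heading are ignored.
    return sections


# Invented sample text, for illustration only.
sample = [
    "Account Statement",
    "Asset Allocation",
    "Equities      60%",
    "Fixed Income  40%",
    "Valuation",
    "Prior balance        100,000.00",
    "Balance this period  101,250.00",
    "Daily Activity",
    "03/15/2024  BUY  EXAMPLE BOND  10,000.00",
]
sections = split_sections(sample)
```

Once the lines are grouped this way, each section can be given its own parsing rule in a single pass over the text. Note that a detail line which happens to begin with one of the heading words would be mistaken for a heading, so a real statement may need stricter matching.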
Then we come to the question about "what is a record?"
In order to nicely present information onto pieces of paper in a format that humans like to read (e.g. Legal/Foolscap/A4) reports often use 2 or more lines to present data that computers (and Excel) would like to see in one line. If the data for a "Single detail record" is reported on more than one line it is likely that Monarch Classic may become the modelling tool of choice.
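As a rough illustration of the "single record on several lines" problem, here is a small Python sketch (again outside Monarch) that joins continuation lines back onto the line that starts the record. It assumes the text is already extracted, and that a record always begins with a date in MM/DD/YYYY form; that rule, the name merge_records and the sample lines (including the placeholder CUSIP) are invented for the example.

```python
import re

# Hypothetical rule: a new record starts with a date such as 03/15/2024.
# Any other non-blank line is treated as a continuation of the previous record.
RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{4}\s")


def merge_records(lines):
    """Join continuation lines onto the record line that precedes them."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if RECORD_START.match(line):
            records.append(line)
        elif records:
            records[-1] += " " + line
        # Lines before the first record start are ignored.
    return records


# Invented sample text, for illustration only.
sample = [
    "03/15/2024  BUY   US TREASURY NOTE        10,000.00",
    "            2.50% DUE 05/15/2030",
    "            CUSIP 000000XX0",
    "03/18/2024  DIV   EXAMPLE FUND                125.40",
]
records = merge_records(sample)
```

Here the three lines describing the bond end up as one record and the dividend line stays as a second record, which is the one-row-per-item shape Excel wants. On a real statement the rule that identifies the first line of a record is the part that needs the most care.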
Once you have "seen" the data, and assuming you know how it eventually needs to look wherever you are planning to send it after extraction, it becomes relatively easy to take the basic concepts and cover the best part of the functionality you need. But there are likely to be a few special tweaks required for something like a PDF-based Statement document (there always are in my experience). That is where the process can be a little more "interesting" when making it as efficient and effective as possible on a case-by-case basis.
I hope this helps. I think having to dive into PDF files early in your acquaintance with Monarch (or indeed any other PDF reading and representing application) can easily be baffling in some ways and sometimes completely baffling.
The worst examples often seem to be those that look like they come close to giving a good result straight from the start but have a few imperfections. Sometimes the small imperfections can be easily dealt with. Other times they are a symptom of a deeper problem with the way the PDF file is created and handles its internal data, and a more drastic approach is required. Spotting this at an early stage can help to avoid a lot of frustrating effort that ends up going nowhere really useful.
Very powerful and effective results are always possible, but in the more difficult cases there is no step-by-step guide that is specific yet works for all.
However, once you have a working model or workspace, you have a self-documenting set of tools that can be adapted and redeployed many times for future challenges.
I hope that helps a little.
For more incisive guidance it is almost invariably beneficial to be able to work with representative or substantially representative samples of the reports that need to be modelled. I recognise that this may raise privacy and security concerns but often such a very specific experience is the only way to identify how to ensure a successful outcome consistently if such a result is not forthcoming at an early stage of modelling.
Grant
-
That’s quite helpful and very detailed. The information is somewhat erratic and I don’t have an issue with “sharing” the document “clean of any privacy concerns” if that would be helpful.
Would this be done via direct message?
-
Esdras,
I think sharing a full document (highly recommended over screen captures, by the way) is by far the best way forward for a rapid development approach in all situations.
It usually saves a lot of time otherwise taken bouncing questions back and forth.
I'm not aware of a good way to make that happen via the community (although there may be one we could be told about).
Direct contact (and so limited exposure of any sort for the data) is to be preferred in my opinion.
Grant
-
Hello Grant & Esdras,
Let me look into creating a private area.
-Rebecca