Converting PDF with check boxes
When I convert a PDF that contains check boxes, the check boxes are rendered as double quotes. The problem is that I eventually export the data as a delimited file, and the quotes mess things up. Is there a way to have check boxes rendered as something else?
I'm using Monarch v10 Pro.
Answers
I don't recall having had to deal with check boxes in PDFs, but in theory whatever the PDF interpretation engine produces gives you some data to work with, just as if it had started out as a report or a text file.
To that end, Monarch provides tools to include or exclude specific fields or pieces of data during its own extraction. If selecting something you don't want proves unavoidable, there are also tools for manipulating the extracted data to tidy it up, alter it or, in this case perhaps, eliminate it.
To do that you may need a calculated field or two (it's difficult to be more specific without seeing the data) and one or more of the text-manipulation functions that Monarch provides.
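For illustration only, here is the clean-up logic such a calculated field would perform, sketched in Python rather than Monarch's own formula language. It assumes the double quote only ever appears where a check box was, and it replaces that character with a delimiter-safe marker ("X", a hypothetical choice) before writing delimited output. The names `CHECKBOX_GLYPH`, `clean_checkbox` and `to_delimited`, and the sample rows, are all hypothetical.

```python
import csv
import io

# Hypothetical: the character the PDF engine emits in place of a check box.
CHECKBOX_GLYPH = '"'


def clean_checkbox(value: str, replacement: str = "X") -> str:
    """Replace the check-box placeholder with a delimiter-safe marker.

    Assumption: the glyph only ever appears where a check box was, so a
    plain text replacement is safe (the same idea as a calculated field
    built with a text-replacement function).
    """
    return value.replace(CHECKBOX_GLYPH, replacement)


def to_delimited(rows, delimiter=","):
    """Clean every field, then write the rows as delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow([clean_checkbox(field) for field in row])
    return buf.getvalue()


# Hypothetical sample: a blank field stands for an unchecked box.
rows = [["Order 1001", '"', "Shipped"], ["Order 1002", "", "Pending"]]
print(to_delimited(rows, delimiter="|"), end="")
```

Because the replacement happens before export, the delimited writer never sees a quote character it has to escape, so the output columns stay aligned.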
HTH.
Grant
A quick further thought on this.
Although I don't think the PDF interpretation engines in Monarch have ever been intended to pick out every aspect of PDF functionality (they are, after all, only interested in text, not graphics), they may have been modified from time to time to handle new PDF features associated with text. Tick boxes, for example.
From what I have been able to work out, tick box functionality in PDFs may have arrived after Monarch V10 was released, which makes it unlikely to be a feature specifically addressed during extraction.
This also makes me wonder whether the displayed characters are simply substitutes for something that is unavailable in the chosen display font (i.e. a "box" of some sort).
If so, there might be some basis for trying other fonts to see whether different characters are offered. However, the more complete approach, which would avoid depending on variables such as the font, would be to find a way to map the unwanted sections or characters out of the extraction using the model definition.
Grant