What would make a PDF open with symbols and random letters rather than the actual data?

Leeann_21898
Leeann_21898 Altair Community Member
edited March 2021 in Community Q&A

I'm using Altair Monarch Classic, and twice recently when I tried to open PDF reports, random symbols and letters showed up instead of the data, and my models won't work. Has something changed in my settings to cause this?

Best Answer

  • CPorthouse
    CPorthouse
    Altair Employee
    edited February 2021 Answer ✓

    The text layer also might have been damaged somehow.  While searching through our support database I came across a few suggestions:

    A quick and easy way to check to see if any text actually exists in a PDF file is to open it in Adobe Acrobat and use the Find feature to search for some text you can plainly see on screen. If the text is not found, the text layer has been damaged or does not exist, in which case the document is most likely an image and is therefore unreadable by Monarch or Acrobat.

    Another test is to use the text extract tool in Acrobat. Copy some text and then paste it into Notepad. (Note: If the text extract tool fails to highlight any text when you left-click and drag over it, then the text you can see on screen is an image.) If the text you pasted into Notepad is not the same as the text you can see on the page of the PDF file, then the text layer is damaged.
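The copy-and-paste comparison above can be roughly automated. The following is a minimal sketch, not anything Monarch or Acrobat provides: the function name `text_layer_matches` and the 0.9 threshold are assumptions chosen for illustration. It compares text you typed from what you can see on the page with what the PDF viewer actually pasted.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and ignore case, since pasted PDF text
    # often differs from the page only in spacing and alignment.
    return " ".join(text.split()).casefold()


def text_layer_matches(visible: str, pasted: str, threshold: float = 0.9) -> bool:
    """Return True if the pasted text is close to what is visible on the page.

    `threshold` is an arbitrary illustrative cutoff, not a documented value.
    """
    if not pasted.strip():
        # Nothing could be copied: the "text" on screen is likely an image.
        return False
    ratio = SequenceMatcher(None, _normalize(visible), _normalize(pasted)).ratio()
    return ratio >= threshold
```

For example, `text_layer_matches("Invoice Total: 1,250.00", "Invoice  Total: 1,250.00")` tolerates the extra space, while an empty paste or a paste full of unrelated symbols fails the check, which would point to a missing or damaged text layer.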

    The following are common scenarios during which Monarch may not be able to import a particular PDF document, as well as some suggestions on handling them.

    Scanned PDF Files
    If a PDF file contains no text, it may actually be a scanned image or some other embedded image. A scanned image is a picture of a document, taken by a scanner, which is then embedded into a PDF document. Monarch cannot extract text from a picture. The only way to deal with images is to use OCR (optical character recognition) software to try to recognize and extract text from them. CAUTION: OCR software is NOT recommended for critical financial documents, because the extraction accuracy varies with each document and with the OCR software being used. Small recognition errors creep in very easily and may not be noticed until a review or audit of the data is performed.

    Damaged PDF Files
    Even if a PDF file appears correctly in Adobe Acrobat, the text layer may have been damaged beyond repair during the creation process, with the result that Monarch is unable to extract text from it. Adobe Acrobat is able to detect and repair many small errors in PDF documents, so opening the offending PDF file in Acrobat and using the File > Save As menu option to re-save it as a new PDF file may correct the problem.

    Text Extraction Prohibition
    When a PDF file is published, there are security options that can be specified to prevent the extraction of content from it. When you attempt to import a PDF document for which content extraction has been prohibited, Monarch will issue a message "Cannot import from PDF file because it does not allow text extraction". If this occurs, you will have to ask the publisher of the PDF file to republish it for you, and to allow content extraction when doing so.
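For background, the PDF specification stores these security options as a signed 32-bit integer (the `/P` entry of the encryption dictionary), in which bit 5 (counting from 1) permits copying or extracting text and graphics. The sketch below only decodes that flag from a `/P` value you have already obtained by other means. The function name is hypothetical, and whether Monarch consults exactly this bit is an assumption, not something stated in this thread.

```python
# Bit 5 (1-based) of the PDF permission flags: "copy or otherwise extract
# text and graphics". Expressed here as a 0-based shift.
COPY_EXTRACT_BIT = 1 << 4


def allows_text_extraction(p_value: int) -> bool:
    """Decode the copy/extract permission from a PDF /P value.

    `p_value` is the signed integer stored under /P in the encryption
    dictionary. Python's `&` treats negative ints as two's complement,
    so negative /P values can be tested directly.
    """
    return bool(p_value & COPY_EXTRACT_BIT)
```

A `/P` value of `-1` has every permission bit set, so extraction is allowed, whereas `-3904` has the copy/extract bit cleared. An unencrypted PDF has no `/P` entry at all, so this check only applies to files published with security options.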

Answers

  • Steve_Caiels
    Steve_Caiels
    Altair Employee
    edited February 2021

    Hi Leeann,

    Could they be image-based PDF files?  Normally, you would expect to see nothing in the Monarch window, but I guess it is possible that the encoding could have some 'hidden' formatting text that is not visible in the PDF viewer, but is visible to Monarch.

    Please try to select the text using your preferred PDF viewer, then copy and paste it into Notepad.  If the text is readable, even if the alignment is bad, then Monarch has a good chance of working with the file.

    If you see 'garbage' or nothing at all in Notepad, then I'm afraid Monarch will almost certainly not be able to extract the data.  In this case, you may be able to run the PDF through an OCR package such as ABBYY FineReader or one of the free online services to create a text-searchable PDF file.
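If you have many files to check, the "is it garbage?" judgment on the pasted text can be approximated with a simple heuristic. This is only a sketch under stated assumptions: the function name, the set of accepted punctuation, and the 0.8 cutoff are all illustrative choices, and text made of random but ordinary letters would still pass, so a visual check remains the real test.

```python
def looks_like_garbage(pasted: str, min_readable: float = 0.8) -> bool:
    """Heuristically flag pasted PDF text that is empty or mostly unreadable.

    `min_readable` is an arbitrary illustrative cutoff: the minimum share
    of non-space characters that must be letters, digits, or common
    report punctuation.
    """
    chars = [c for c in pasted if not c.isspace()]
    if not chars:
        # Nothing at all was pasted.
        return True
    common_punctuation = ".,:;-/$%()'\""
    readable = sum(1 for c in chars if c.isalnum() or c in common_punctuation)
    return readable / len(chars) < min_readable
```

Ordinary report text such as `"Account 1001  Balance 2,450.75"` is not flagged, while an empty paste or a run of replacement and control characters is.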

     

    Regards,

    Steve.

  • Leeann_21898
    Leeann_21898 Altair Community Member
    edited February 2021

    I tried those suggestions and it does look like it is an image file. The same report last month worked and now something has changed. Now to figure out who/what made the change. It's good to know I didn't mess up something on my Monarch though. 

    Thanks!

  • Steve_Caiels
    Steve_Caiels
    Altair Employee
    edited February 2021

    Hopefully, there will be an option that mentions "text searchable" or something along those lines in whatever application is creating the PDF.  If it is being created via a printer driver, anything that mentions 'PostScript' should be avoided.

  • Mahmoud
    Mahmoud
    Altair Employee
    edited February 2021

    What is the Monarch Classic bitness?

    I recently opened a PDF report in Monarch Classic v15.5 or 16.1, 32-bit.  When I changed the stretch to 7.2+ and the Snap Text setting, Classic opened the PDF report as if it's a text report.  When I tried it in Monarch Classic 64-bit with the same settings, it worked fine.

    If your Classic is 32-bit, try installing Monarch 64-bit and opening the PDF report.

    Note that if you have 32-bit MS Office on your PC, you cannot install 64-bit Monarch unless you first uninstall 32-bit Office and install 64-bit Office.

    If there is no Office on the PC, you can choose which bitness to use.

    Regards

    Mo  

  • Leeann_21898
    Leeann_21898 Altair Community Member
    edited February 2021

    I was able to do a Save As with text searchable and now it works so THANK YOU!

    It is 64 bit. I stretched it 7.2 and beyond and that didn't change the symbols. I couldn't figure out what snap text was.

  • Rebecca_Cronin
    Rebecca_Cronin
    Altair Employee
    edited March 2021

    Hi Leeann, 

     

    I noticed that you mentioned you "couldn't figure out what snap text was", and I wanted to include a link here to the online help guide that describes the various PDF options.

    Monarch Help Guide - PDF Options (Classic Mode)

    Monarch Help Guide - PDF Option (DPS Mode)