PDF encoding issue

limegreenman900
limegreenman900 New Altair Community Member
edited November 2024 in Community Q&A

Hi everyone,

 

I was trying to do the simplest thing possible: reading a PDF file into RM. I have done this several times before, but now I am stuck with what I suspect is an encoding issue.

After the "Read Document" operator (with "extract text only" and "use file extension as type" both checked), I inserted a breakpoint before any preprocessing of the text. However, I don't get any readable text out of my PDF. What I get instead looks like this:


¨ÉøC&13#s$ó/Y¢¬–¬³ÙÜìâì=ÙOsbsúrnåºç&#26;sOæ1óŠòvç=Ë�ËïÏŸ\ä»hÙ¢ó&#5;Ö&#5;ê‚#…¤Â¼Â�…³‹ã&#23;oZ<]&#20;TÔUt}‰`IÃ’sK­—V-ý¤˜Y,+>TB(É/ÙSòƒ,]6*›-•–¾W:#—È7Ë&#31;*¢&#21;&#3;Š&#7;Ê&#8;e¿ò^YDY&#127;Ù}U„j£êAyTù`ù#µD=¬þ¶"©b{ųÊôÊ&#15;+&#127;¬Ê¯: !kJ4Gµ&#28;m¥ötµ}uCõ%�—®K7Y&#19;V³©fFŸ¢ßY&#11;Õ.©=bàá?S&#23;ŒîƕƩºÈº‘ºçõyõ‡&#26;Ø
Ú†&#11;�ž�k&#26;ï5%4ý¦&#25;m–7Ÿlqlio™Z&#22;³lG+ÔZÚz²Í¹­³mzyâò]íÔöÊö?uøuôw|¿"&#127;űN»Îå�wW&®ÜÛe֥ﺱ*|ÕöÕèjõê‰5&#1;k¶¬yÝ­èþ¢Ç¯g°ç‡^yï&#23;kEk‡Öþ¸®lÝD_pß¶õÄõÚõ×7DmØÕÏîoê¿»1mãá&#1;l {àûMÅ›Î
&#6;&#14;nßLÝlÜ<9”úO

Does anyone have an idea where the problem is? I suspect it is an encoding issue.

 

If I open the PDF file and copy and paste the text into a Word file, there is no problem and the text is displayed correctly.


Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    You can change the encoding on the "Read Document" operator. Enable the advanced settings and an additional parameter box will show up in the parameter window. From there you can change the encoding.

  • limegreenman900
    limegreenman900 New Altair Community Member

    I am working with RM 5.3, and in the "Read Document" operator the encoding is set to "System" by default. That should automatically match the correct encoding, right?

  • MartinLiebig
    MartinLiebig
    Altair Employee

    Hi,

     

    Usually it does. If you have a UTF-encoded file on a Windows machine, it might not work, so I would give UTF-8 a try.

     

    ~Martin
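
The mismatch described in this reply, UTF-8 bytes being decoded with a Windows system charset, can be illustrated outside RapidMiner. The following is only a minimal Python sketch: the sample string is hypothetical and not taken from the PDF in question, and it assumes the Windows default is Windows-1252 (`cp1252`), which varies by locale.

```python
# Hypothetical sample text; NOT taken from the PDF discussed in this thread.
text = "Größe: 5 €"

# The bytes as they would be stored in a UTF-8 encoded file.
raw = text.encode("utf-8")

# Decoding with the matching charset round-trips cleanly.
decoded_ok = raw.decode("utf-8")

# Decoding the same bytes as Windows-1252 (assumed system default)
# produces mojibake: each non-ASCII character turns into two or three
# wrong characters, while plain ASCII characters stay intact.
decoded_wrong = raw.decode("cp1252")

print(decoded_ok)
print(decoded_wrong)
```

In a mismatch of this kind, the ASCII portion of the text normally stays readable and only accented or special characters break, which can help when judging whether the encoding parameter is the likely cause.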

  • limegreenman900
    limegreenman900 New Altair Community Member

    @mschmitz: I gave it a try with UTF, but it didn't work. I'll figure out another way; somehow it has to work.

    Nevertheless, thanks for your help.
