PDF encoding issue
Hi everyone,
I was trying to do the simplest thing possible: reading a PDF file into RM. I have done this several times before, but now I am stuck with what I suspect is an encoding issue.
After the "Read Document" operator (with "extract text only" and "use file extension as type" both ticked), I inserted a breakpoint before doing some preprocessing of the text. However, I don't get any readable text out of my PDF. What I get instead is something like:
¨ÉøC&13#s$ó/Y¢¬–¬³ÙÜìâì=ÙOsbsúrnåºçsOæ1óŠòvç=Ë�ËïÏŸ\ä»hÙ¢óÖê‚#…¤Â¼Â�…³‹ãoZ<]TÔUt}‰`IÃ’sK—V-ý¤˜Y,+>TB(É/ÙSòƒ,]6*›-•–¾W:#—È7Ë*¢ŠÊe¿ò^YDYÙ}U„j£êAyTù`ù#µD=¬þ¶"©b{ųÊôÊ+¬Ê¯: !kJ4Gµm¥ötµ}uCõ%�—®K7YV³©fFŸ¢ßYÕ.©=bàá?SŒîÆ•Æ©ºÈº‘ºçõyõ‡Ø
Ú†�ž�kï5%4ý¦m–7Ÿlqlio™Z³lG+ÔZÚz²Í¹³mzyâò]íÔöÊö?uøuôw|¿"űN»Îå�wW&®ÜÛe֥ﺱ*|ÕöÕèjõê‰5k¶¬yÝèþ¢Ç¯g°ç‡^yïkEk‡Öþ¸®lÝD_p߶õÄõÚõ×7DmØÕÏîoê¿»1mãál {àûMÅ›Î
nßLÝlÜ<9”úO
Does anyone have an idea where the problem is? I suspect it is an encoding issue.
If I open the PDF file and copy and paste the text into a Word file, there is no problem and the text is displayed correctly.
Answers
-
You can change the encoding on the "Read Document" operator. Just enable the advanced settings and a new parameter box will show up in the parameter window. From there you can change the encoding.
-
I am working with RM 5.3, where the encoding of the "Read Document" operator is set to "System" by default. This should automatically match the correct encoding, right?
-
Hi,
Usually it does. If you have a UTF-8 file on a Windows machine, though, it might not work, so I would give it a try with UTF-8.
~Martin
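To illustrate the kind of mismatch described above, here is a minimal Python sketch, run outside RapidMiner and not using its API. It assumes a Windows system default of cp1252 and a made-up sample string; the helper name decode_with is hypothetical. It only shows how UTF-8 bytes read with the wrong encoding turn into garbled characters, and it does not claim to reproduce the exact output from the original post.

```python
# Sketch: UTF-8 bytes decoded with a mismatched "system" encoding.
# Assumptions: cp1252 stands in for a Windows system default,
# and the sample text is invented for illustration.

def decode_with(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the given encoding, replacing undecodable bytes."""
    return raw.decode(encoding, errors="replace")

original = "Größe"                      # sample text with non-ASCII characters
raw = original.encode("utf-8")          # how the text is stored on disk

as_system = decode_with(raw, "cp1252")  # wrong guess: garbled characters
as_utf8 = decode_with(raw, "utf-8")     # correct guess: text is restored

print("decoded as cp1252:", as_system)
print("decoded as UTF-8: ", as_utf8)
```

If switching to the matching encoding does not restore the text, as reported later in this thread, the cause is probably something other than a plain text-encoding mismatch.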
-
@mschmitz: I gave it a try with UTF-8, but it didn't work. I'll figure out another way; somehow it has to work.
Nevertheless, thanks for your help.