Content read from a PDF file by the Read Document operator is out of order

User: "Lei"
New Altair Community Member
Updated by Jocelyn
I use the Read Document operator to read a PDF file, but the extracted content appears in a different order from the text in the PDF file itself.

<?xml version="1.0" encoding="UTF-8"?><process version="9.10.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.10.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="text:read_document" compatibility="9.3.001" expanded="true" height="68" name="Read Document" width="90" x="112" y="85">
        <parameter key="file" value="D:/water electrolysis/PDF electrolyte/0019178499_#14. 1-s2.0-S0925838817304759-main1.pdf"/>
        <parameter key="extract_text_only" value="true"/>
        <parameter key="use_file_extension_as_type" value="true"/>
        <parameter key="content_type" value="txt"/>
        <parameter key="encoding" value="SYSTEM"/>
      </operator>
      <connect from_op="Read Document" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Text extracted by the Read Document operator:


Original PDF file:



The order of the highlighted words in the two images above is completely different.
Could someone help me to fix this problem? Thank you very much.
