Read PDF Tables Extension - Need to

User: "miked"
New Altair Community Member
Updated by Jocelyn
Hello - I am trying to use the "Read PDF Tables" extension. I have successfully read my PDF, but it has been split into 21 different example sets. I would like to use the "Select" operator to choose the example sets that I need, but I am running into some issues. First, "Select" only lets you pick one example set, whereas I need to select 5. Second, not all of the example sets are the same: only 5 of the 21 sheets have the attribute headings that I actually need. Does anyone have ideas on how I can pull what I need from this set? I have been trying to use loops, but without success. Thanks!

    User: "miked"
    New Altair Community Member
    OP
    Accepted Answer
    Hi @sgenzer... Great, thank you. That definitely helps narrow down which example sets have the attributes that I need. Would I then just follow @varunm1's method of connecting n "Select" operators and appending the sets together? Is there a way to use a macro to count the example sets and have "Select" loop n times? If not, this should work for now, and I thank you both for your help.
    -Mike
    User: "sgenzer"
    Altair Employee
    Accepted Answer
    Hi @miked, if all the example sets are the same (or similar), I'd just drop an Append (Superset) at the end. Like this:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="pdf_table_extraction:pdfs2exampleset_operator" compatibility="0.2.001" expanded="true" height="68" name="Read PDF Tables" width="90" x="45" y="34">
            <parameter key="resource_type" value="file"/>
            <parameter key="attribute" value=""/>
            <parameter key="tune extraction criteria" value="false"/>
            <parameter key="discard tables with no rows" value="false"/>
            <parameter key="discard empty attributes" value="false"/>
            <parameter key="heuristic ratio for table content" value="0.65"/>
            <parameter key="tune edge detection criteria" value="false"/>
            <parameter key="grayscale intensity threshold" value="25"/>
            <parameter key="minimum width of horizontal edge" value="50"/>
            <parameter key="minimum height of vertical edge" value="10"/>
            <parameter key="maximum cell corner distance" value="10"/>
            <parameter key="required text lines for edge" value="4"/>
            <parameter key="required cells for table" value="4"/>
            <parameter key="point snap distance threshold" value="8.0"/>
            <parameter key="table padding amount" value="1.0"/>
            <parameter key="identical table overlap ratio" value="0.9"/>
          </operator>
          <operator activated="true" class="loop_collection" compatibility="9.6.000" expanded="true" height="82" name="Loop Collection" width="90" x="179" y="34">
            <parameter key="set_iteration_macro" value="false"/>
            <parameter key="macro_name" value="iteration"/>
            <parameter key="macro_start_value" value="1"/>
            <parameter key="unfold" value="false"/>
            <process expanded="true">
              <operator activated="true" class="select_attributes" compatibility="9.6.000" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="34">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <description align="center" color="transparent" colored="false" width="126">enter the attribute of example sets you want to keep</description>
              </operator>
              <operator activated="true" class="branch" compatibility="9.6.000" expanded="true" height="82" name="Branch" width="90" x="179" y="34">
                <parameter key="condition_type" value="min_attributes"/>
                <parameter key="condition_value" value="1"/>
                <parameter key="expression" value=""/>
                <parameter key="io_object" value="ANOVAMatrix"/>
                <parameter key="return_inner_output" value="true"/>
                <process expanded="true">
                  <connect from_port="condition" to_port="input 1"/>
                  <portSpacing port="source_condition" spacing="0"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="sink_input 1" spacing="0"/>
                  <portSpacing port="sink_input 2" spacing="0"/>
                  <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="132" y="13">keep the ExampleSet</description>
                </process>
                <process expanded="true">
                  <portSpacing port="source_condition" spacing="0"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="sink_input 1" spacing="0"/>
                  <portSpacing port="sink_input 2" spacing="0"/>
                  <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="162" y="13">do not keep the ExampleSet</description>
                </process>
                <description align="center" color="transparent" colored="false" width="126">branch to some minimum # of attributes (1?)</description>
              </operator>
              <connect from_port="single" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Branch" to_port="condition"/>
              <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
              <portSpacing port="source_single" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="operator_toolbox:advanced_append" compatibility="2.3.000" expanded="true" height="82" name="Append (Superset)" width="90" x="313" y="34"/>
          <connect from_op="Read PDF Tables" from_port="collection of pdf data tables as example sets" to_op="Loop Collection" to_port="collection"/>
          <connect from_op="Loop Collection" from_port="output 1" to_op="Append (Superset)" to_port="example set 1"/>
          <connect from_op="Append (Superset)" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
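To make the logic of the process above easier to follow, here is a minimal plain-Python analogue. It is a sketch, not RapidMiner's API: `Table`, `select_attributes`, `loop_and_branch` and `append_superset` are hypothetical names, and each ExampleSet is assumed to be a dict mapping column names to equally long lists of values. Two points carry over to the real process. First, Loop Collection visits every element of the collection, so no macro counting the example sets is needed. Second, the XML leaves Select Attributes on the `all` filter as a placeholder, so you would switch it to a subset of the attributes you need; the sketch does this with an explicit `keep` set. Missing cells are padded with `None`, standing in roughly for RapidMiner's missing values.

```python
from typing import Any, Dict, List, Set

# Hypothetical stand-in for an ExampleSet: column name -> column values.
# Assumes every column in one table has the same number of rows.
Table = Dict[str, List[Any]]


def select_attributes(table: Table, keep: Set[str]) -> Table:
    """Like Select Attributes with a subset filter: keep only the named columns."""
    return {name: values for name, values in table.items() if name in keep}


def loop_and_branch(tables: List[Table], keep: Set[str], min_attributes: int = 1) -> List[Table]:
    """Like Loop Collection + Branch (min_attributes): keep a table only if it
    still has at least `min_attributes` columns after selection.
    The loop visits every table, so no count is needed up front."""
    kept: List[Table] = []
    for table in tables:
        selected = select_attributes(table, keep)
        if len(selected) >= min_attributes:  # Branch "then" path: keep it
            kept.append(selected)
        # otherwise the "else" path: the table is dropped
    return kept


def append_superset(tables: List[Table]) -> Table:
    """Like Append (Superset): union of all columns, padding missing cells with None."""
    columns: List[str] = []
    for table in tables:
        for name in table:
            if name not in columns:
                columns.append(name)
    merged: Table = {name: [] for name in columns}
    for table in tables:
        n_rows = max((len(values) for values in table.values()), default=0)
        for name in columns:
            merged[name].extend(table.get(name, [None] * n_rows))
    return merged


tables = [
    {"Name": ["a", "b"], "Qty": [1, 2]},
    {"Page": [1]},  # a table without any of the wanted headings
    {"Name": ["c"], "Price": [9.5]},
]
kept = loop_and_branch(tables, keep={"Name", "Qty", "Price"})
merged = append_superset(kept)
print(merged)
# {'Name': ['a', 'b', 'c'], 'Qty': [1, 2, None], 'Price': [None, None, 9.5]}
```

Filtering and appending are kept as separate steps so that the filter condition (here, a minimum number of attributes) can be swapped out without touching the append.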
    



    User: "ey1"
    New Altair Community Member
    Accepted Answer
    Updated by ey1
    If you are still looking for a way to automate the filtering of the collection, consider the different condition types of the Branch operator in the process proposed by @sgenzer, such as the minimum or maximum number of attributes or examples. If you want to use attribute names, first check whether the Read PDF Tables operator gives you the attribute names you want in the output ExampleSet(s). This is not guaranteed, since it depends on the detection and extraction method, but if it does so once, it will do so every time. In that case, you can store the attribute names in macros and use a more complex expression in the Branch operator to keep only the ExampleSets with the desired attribute name(s). If they have exactly the same header structure, you can then Append them as @varunm1 suggested.
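The name-based filtering described above can be sketched in plain Python as well. This is only an illustration of the idea, not the Branch operator's expression syntax: `Table`, `has_header` and `filter_and_append` are hypothetical names. The sketch assumes that "same header structure" means the same set of attribute names, ignoring column order. Because every kept table then has identical columns, a plain append (rather than Append (Superset)) is enough. Where the RapidMiner test process logs a hint when the condition is not fulfilled, this sketch raises an exception instead.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical stand-in for an ExampleSet: column name -> column values.
Table = Dict[str, List[Any]]


def has_header(table: Table, required: Sequence[str]) -> bool:
    """Name-based condition: the table's attribute names must be exactly `required`
    (column order is ignored, which is an assumption of this sketch)."""
    return set(table) == set(required)


def filter_and_append(tables: List[Table], required: Sequence[str]) -> Table:
    """Keep only tables whose header matches `required`, then stack them
    like a plain Append (all kept tables share the same columns)."""
    matching = [t for t in tables if has_header(t, required)]
    if not matching:
        # The RapidMiner test process logs a hint here; this sketch raises instead.
        raise ValueError(f"no table has exactly the attributes {sorted(required)}")
    merged: Table = {name: [] for name in required}
    for table in matching:
        for name in required:
            merged[name].extend(table[name])
    return merged


tables = [
    {"Name": ["a"], "Qty": [1]},
    {"Page": [3]},
    {"Qty": [2], "Name": ["b"]},  # same names, different column order
]
print(filter_and_append(tables, ["Name", "Qty"]))
# {'Name': ['a', 'b'], 'Qty': [1, 2]}
```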
    I am attaching a test process for reference. It logs an error message as a hint if the condition is not fulfilled.
    Cheers,
    Edwin