[SOLVED] File Name for Output

P-rapid
edited November 5 in Community Q&A
Hello,

I've been using RapidMiner for a couple of days to extract text from .pdf files and export the analysis to Excel. Since I want to process many files at once, I implemented a Loop Examples operator. Within this loop I use an export routine to create one Excel file for each imported .pdf. Because I would like to reuse the .pdf file name for the Excel file, I added an Extract Macro operator with the macro type "data_value" and the loop's iteration macro as the example index, and I use the extracted macro as the file name. However, it looks like the Extract Macro operator is not iterating over the example number but is instead using the first .pdf name for all of the following files.

It would be great if you could tell me how to fix this. Please bear in mind that I am not yet an expert in RapidMiner.

Thanks,

Answers

  • "Use the force Luke"

    in other words

    "Post the XML P-rapid"
  • P-rapid
    Thanks for your reply, please find the code enclosed. The Loop Examples operator shown below follows a text extraction routine which creates the respective metadata. The metadata is accessible, since the output always shows the first file name for all of the extracted reports.
    <operator activated="true" class="loop_examples" compatibility="5.3.007" expanded="true" height="76" name="Loop Examples" width="90" x="45" y="300">
           <parameter key="iteration_macro" value="example_number"/>
           <process expanded="true">
             <operator activated="true" class="extract_macro" compatibility="5.3.007" expanded="true" height="60" name="Extract Macro" width="90" x="45" y="30">
               <parameter key="macro" value="%{example_number}"/>
               <parameter key="statistics" value="count"/>
               <parameter key="attribute_name" value="metadata_file"/>
               <list key="additional_macros"/>
             </operator>
             <operator activated="true" class="extract_macro" compatibility="5.3.007" expanded="true" height="60" name="Extract Macro (2)" width="90" x="179" y="30">
               <parameter key="macro" value="%{file_name}"/>
               <parameter key="macro_type" value="data_value"/>
               <parameter key="attribute_name" value="metadata_file"/>
               <parameter key="example_index" value="%{example_number}"/>
               <list key="additional_macros"/>
             </operator>
             <operator activated="true" class="execute_process" compatibility="5.3.007" expanded="true" height="76" name="Execute Process" width="90" x="447" y="165">
               <parameter key="process_location" value="Reporting"/>
               <parameter key="store_output" value="true"/>
               <parameter key="cache_process" value="false"/>
               <list key="macros">
                 <parameter key="file_name" value="%{file_name}"/>
                 <parameter key="example_number" value="%{example_number}"/>
               </list>
             </operator>
             <connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
             <connect from_op="Extract Macro" from_port="example set" to_op="Extract Macro (2)" to_port="example set"/>
             <connect from_op="Extract Macro (2)" from_port="example set" to_op="Execute Process"  to_port="input 1"/>
             <portSpacing port="source_example set" spacing="0"/>
             <portSpacing port="sink_example set" spacing="0"/>
             <portSpacing port="sink_output 1" spacing="0"/>
           </process>
         </operator>

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="reporting:generate_report" compatibility="5.3.000" expanded="true" height="76" name="Generate Report" width="90" x="45" y="30">
           <parameter key="report_name" value="Datei %{file_name}"/>
           <parameter key="format" value="Excel"/>
           <parameter key="excel_output_file" value="C:\Datei %{file_name}.xls"/>
         </operator>
         <operator activated="true" class="reporting:report" compatibility="5.3.000" expanded="true" height="60" name="Report" width="90" x="179" y="30">
           <parameter key="report_name" value="Datei %{file_name}"/>
           <parameter key="specified" value="true"/>
           <parameter key="reportable_type" value="Data Table"/>
           <parameter key="renderer_name" value="Data View"/>
           <list key="parameters">
             <parameter key="attribute_filter_type" value="subset"/>
             <parameter key="attributes" value="att_%{example_number}|id"/>
             <parameter key="use_except_expression" value="false"/>
             <parameter key="value_type" value="attribute_value"/>
             <parameter key="use_value_type_exception" value="false"/>
             <parameter key="except_value_type" value="time"/>
             <parameter key="block_type" value="attribute_block"/>
             <parameter key="use_block_type_exception" value="false"/>
             <parameter key="except_block_type" value="value_matrix_row_start"/>
             <parameter key="invert_selection" value="false"/>
             <parameter key="include_special_attributes" value="true"/>
             <parameter key="min_row" value="1"/>
             <parameter key="max_row" value="2147483647"/>
           </list>
         </operator>
         <connect from_port="input 1" to_op="Generate Report" to_port="through 1"/>
         <connect from_op="Generate Report" from_port="through 1" to_op="Report" to_port="reportable in"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="source_input 2" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
       </process>
     </operator>
    </process>
    Thanks a lot for your help.
  • Hello

    You need to use the name of the macro, not its value, in the "macro" parameter of the Extract Macro operator. In other words, remove the %{} around the names.

    At the moment, the first time the first Extract Macro operator is called, it creates a macro called "1" (the value of example_number at that point) and sets its value to the number of examples in the example set.

    The second Extract Macro operator creates a macro whose name is the value of the existing macro called file_name, and sets it to the value of the attribute called "metadata_file" for the current example within the example set. So if file_name is equal to fred.txt, a macro called fred.txt will be created and it will be set to whatever the attribute metadata_file holds at the nth position.
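
    As a rough sketch of the change described above, the second Extract Macro operator from the posted process might look like this once the %{} is removed from the macro name. Everything except the "macro" value is copied from the posted XML; the comments are added here for illustration. Note that "example_index" keeps its %{}, because there the value of the loop's iteration macro is wanted, not its name.

    ```xml
    <operator activated="true" class="extract_macro" compatibility="5.3.007" expanded="true" height="60" name="Extract Macro (2)" width="90" x="179" y="30">
      <!-- The "macro" parameter is the NAME of the macro to create, so no %{} here. -->
      <parameter key="macro" value="file_name"/>
      <parameter key="macro_type" value="data_value"/>
      <parameter key="attribute_name" value="metadata_file"/>
      <!-- Here the VALUE of the iteration macro is needed, so %{} stays. -->
      <parameter key="example_index" value="%{example_number}"/>
      <list key="additional_macros"/>
    </operator>
    ```

    The same change on the first Extract Macro would name its macro example_number, which is also the loop's iteration macro, so the example count would presumably overwrite the current index. If that operator is still needed, giving it a separate name (for instance a hypothetical "example_count") is probably safer; this is an assumption, not something tested against the posted process.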

    Macros are powerful but can be fiddly. If you use the macro view in the GUI, you will move into a whole new area of understanding.

    regards

    Andrew
  • P-rapid
    Hello,

    thanks a lot. It works perfectly and helps a lot!

    Regards