Problem with Loop Files

ripkars
ripkars New Altair Community Member
edited November 5 in Community Q&A
Hello everybody

I want to write a process that reads all CSV files in a directory and performs the same operation on each of them.

I have this problem with the Loop Files operator and its subprocess.

The Loop Files operator is configured like this:
Loop Files [Filter: '*.csv', Directory: /home/riccardo/Workspace/unrealtournament3-dmtm2010/Training Data, File Name Macro: file_name, File Path Macro: file_path, etc.] (The directory name contains a space; could that be the problem? I also tried renaming it to Training_Data, without success.)

Inside it I have put a Read CSV operator where the File Name is set to %{file_path} (and another operator just for the sake of connecting the output somewhere).

The error I get is:
Cannot create example set meta data: Could not read file 'null': /home/riccardo/file_path (No such file or directory)..

Shouldn't RapidMiner set the value at runtime for each of the CSV files in that directory?

Why is this process broken??

(Please answer me asap as I need to finish this work by today, 23:59 UTC +01:00)

Answers

  • ripkars
    ripkars New Altair Community Member
    I tried writing the same file out in XRFF format, but it is not working:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logfile" value="/home/riccardo/Workspace/unrealtournament3-dmtm2010/Processes/Log/Loop.log"/>
        <parameter key="resultfile" value="/home/riccardo/resulloop.res"/>
        <process expanded="true" height="632" width="1044">
          <operator activated="true" class="loop_files" expanded="true" height="60" name="Loop Files" width="90" x="108" y="92">
            <parameter key="directory" value="/home/riccardo/Workspace/unrealtournament3-dmtm2010/Training Data"/>
            <parameter key="filter" value="'*.csv'"/>
            <process expanded="true" height="650" width="1062">
              <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="246" y="75">
                <parameter key="file_name" value="%{file_path}.csv"/>
              </operator>
              <operator activated="true" class="write_xrff" expanded="true" height="60" name="Write XRFF" width="90" x="380" y="75">
                <parameter key="example_set_file" value="/home/riccardo/Workspace/unrealtournament3-dmtm2010/Training Data/pippo.xrff"/>
              </operator>
              <connect from_op="Read CSV" from_port="output" to_op="Write XRFF" to_port="input"/>
              <portSpacing port="source_in 1" spacing="0"/>
            </process>
          </operator>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
  • haddock
    haddock New Altair Community Member
    Hi there,

    The devil is always in the detail; in this case it was the regex '*.csv'. The following process logs my files:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.0" expanded="true" name="Process">
        <process expanded="true" height="632" width="1044">
          <operator activated="true" class="loop_files" compatibility="5.0.0" expanded="true" height="76" name="Loop Files" width="90" x="108" y="92">
            <parameter key="directory" value="C:\Documents and Settings\Alien\My Documents\rm_workspace"/>
            <parameter key="filter" value=".*csv"/>
            <parameter key="iterate_over_subdirs" value="true"/>
            <process expanded="true" height="296" width="705">
              <operator activated="true" class="provide_macro_as_log_value" compatibility="5.0.8" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="179" y="30">
                <parameter key="macro_name" value="file_path"/>
              </operator>
              <operator activated="true" class="provide_macro_as_log_value" compatibility="5.0.8" expanded="true" height="76" name="Provide Macro as Log Value (2)" width="90" x="380" y="30">
                <parameter key="macro_name" value="file_name"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.0.8" expanded="true" height="76" name="Log" width="90" x="585" y="30">
                <list key="log">
                  <parameter key="path" value="operator.Provide Macro as Log Value.value.macro_value"/>
                  <parameter key="name" value="operator.Provide Macro as Log Value (2).value.macro_value"/>
                </list>
              </operator>
              <connect from_port="in 1" to_op="Provide Macro as Log Value" to_port="through 1"/>
              <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Provide Macro as Log Value (2)" to_port="through 1"/>
              <connect from_op="Provide Macro as Log Value (2)" from_port="through 1" to_op="Log" to_port="through 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
            </process>
          </operator>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
    So now you have time for a splendid dinner as well!

    Ciao.
  • ripkars
    ripkars New Altair Community Member
    Thank you very much for your interest! Now it works!
  • haddock
    haddock New Altair Community Member
    Nice one! Have fun..
  • cherokee
    cherokee New Altair Community Member
    Hi!

    haddock, your solution is, as always, correct (and fast). Nevertheless, I'm a bit confused. Of course the regexp "*.csv" does not express what is intended, but isn't it also malformed? The star at the beginning is the problem: what is supposed to occur zero or more times? Shouldn't there be some kind of MalformedRegExpException (I'm not sure of the correct name right now)?

    Best regards,
    chero
  • haddock
    haddock New Altair Community Member
    Greets Chero,

    As I see it, *.csv would choke the parrot because, as you say, * has to follow what it can repeat, but '*.csv' (notice the single quotes) would not, since there the * repeats the opening quote. I use RegexBuddy for all this regex stuff (brill), about which I understand zippo!

    Ciao!

  • cherokee
    cherokee New Altair Community Member
    Hi haddock,

    Of course you are right. I missed the single quotes :-[

    Best regards,
    chero
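
The difference between the patterns discussed above can be checked directly. The following is a minimal sketch in Python, whose `re` module treats these particular patterns the same way as the Java-style regular expressions that the filter appears to expect; it is an analogy, not RapidMiner's own code. The helper names `compiles` and `matches_whole_name` are hypothetical, and the sketch assumes the filter has to match the whole file name.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if the pattern is a well-formed regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def matches_whole_name(pattern: str, file_name: str) -> bool:
    """Return True if the pattern matches the entire file name."""
    return re.fullmatch(pattern, file_name) is not None


# A leading * has nothing to repeat, so the bare glob is rejected outright.
print(compiles("*.csv"))                           # False
# With the quotes, * repeats the opening quote, so the pattern is well-formed...
print(compiles("'*.csv'"))                         # True
# ...but it requires a literal closing quote, so an ordinary file name fails.
print(matches_whole_name("'*.csv'", "data.csv"))   # False
# haddock's filter: any characters followed by "csv".
print(matches_whole_name(".*csv", "data.csv"))     # True
# A stricter variant that also insists on the dot before the extension.
print(matches_whole_name(r".*\.csv", "datacsv"))   # False
```

This is why the quoted filter produced no syntax error and yet selected no files: it is a valid regex that simply never matches a real file name.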
  • cthiel
    cthiel New Altair Community Member
    ripkars wrote:

    The error I get is:
    Cannot create example set meta data: Could not read file 'null': /home/riccardo/file_path (No such file or directory)..

    Shouldn't RapidMiner set the value at runtime for each of the CSV files in that directory?

    Coming back to the original post: why does RM not replace the name of the macro with its content?

    I'm running into this issue in plenty of places, see thread at
    http://rapid-i.com/rapidforum/index.php/topic,2304.0.html

    Oddly, my processes all run, but I get plenty of "Cannot create example set meta data" errors.

    I have been debugging this class of error for 5+ hours... Any help would be appreciated!

    Christian
  • land
    land New Altair Community Member
    Hi,
    as I already wrote in another thread: macros are only evaluated at run time, because their values are assigned only when the respective operators are executed. Unfortunately, their values simply can't be known at design time! Hence they can't be replaced with their values during meta data transformation, and this might result in errors.
    You can't solve anything without taking a look at the actual data...

    Greetings,
      Sebastian
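
The point about macros only having values at run time can be illustrated with a small sketch. This is not RapidMiner's implementation: `expand_macros` is a hypothetical helper, the file paths are made up, and leaving an unknown macro untouched is an assumption chosen for clarity (the error message quoted above suggests the real meta data check handles the unresolved name differently).

```python
import re

# Matches the %{name} macro syntax used in the process parameters.
MACRO = re.compile(r"%\{([^}]+)\}")


def expand_macros(text: str, macros: dict) -> str:
    """Replace each %{name} with its value; leave unknown macros untouched."""
    return MACRO.sub(lambda m: macros.get(m.group(1), m.group(0)), text)


parameter = "%{file_path}"

# Design time: the loop has not run yet, so no macro has a value.
print(expand_macros(parameter, {}))

# Run time: each iteration defines the macros before the inner operator runs.
for path in ["/data/a.csv", "/data/b.csv"]:  # hypothetical file paths
    macros = {"file_path": path, "file_name": path.rsplit("/", 1)[-1]}
    print(expand_macros(parameter, macros))
```

Under these assumptions the first call has nothing to substitute, while every loop iteration resolves the parameter to a concrete path. That matches the observation that a process can run correctly and still report meta data errors beforehand.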