[SOLVED] Unable to filter when using looped files operator

jb1376
jb1376 New Altair Community Member
edited November 2024 in Community Q&A
I'm trying to use the Loop Files operator to iterate over a directory and filter the file names by a six-digit string that represents the year and month (YYYYMM). I am able to assemble the full list of .csv files, but my attempts to filter the list using the regex option aren't working. The regular expression I'm using is _2015. The Edit Regular Expression window shows that it's valid, but it doesn't work correctly when the full process runs. The basic down and dirty is the following:

Wanted results:
filter: '_201512'
folder/subfolder/file_name_20151201.csv
folder/subfolder/file_name_20151202.csv
folder/subfolder/file_name_20151203.csv
...

What I'm getting:
folder/subfolder/file_name_20151201.csv
folder/subfolder/file_name_20151202.csv
folder/subfolder/file_name_20151203.csv
folder/subfolder/file_name_20160101.csv
folder/subfolder/file_name_20160102.csv
folder/subfolder/file_name_20160103.csv
...

I'm new to RapidMiner, so it's probably something simple. Here is the XML of what I've set up. The archiving and write operators work when I have no filter in place, but when I add a regex I get a zip file with nothing in it.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.0.000">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="loop_files" compatibility="6.4.000" expanded="true" height="82" name="Loop Files" width="90" x="112" y="187">
       <parameter key="directory" value="C:\Users\jb1376\Desktop\test\"/>
       <parameter key="recursive" value="true"/>
       <process expanded="true">
         <operator activated="true" class="open_file" compatibility="7.0.000" expanded="true" height="68" name="Open File" width="90" x="514" y="34">
           <parameter key="filename" value="%{file_path}"/>
         </operator>
         <connect from_op="Open File" from_port="file" to_port="out 1"/>
         <portSpacing port="source_file object" spacing="0"/>
         <portSpacing port="source_in 1" spacing="0"/>
         <portSpacing port="sink_out 1" spacing="0"/>
         <portSpacing port="sink_out 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="create_archive_file" compatibility="7.0.000" expanded="true" height="68" name="Create Archive File" width="90" x="246" y="34"/>
     <operator activated="true" class="add_entry_to_archive_file" compatibility="7.0.000" expanded="true" height="103" name="Add Entry to Archive File" width="90" x="447" y="136"/>
     <operator activated="true" class="write_file" compatibility="7.0.000" expanded="true" height="68" name="Write File" width="90" x="648" y="136">
       <parameter key="filename" value="C:\Users\jb1376\Desktop\201512.zip"/>
     </operator>
     <connect from_op="Loop Files" from_port="out 1" to_op="Add Entry to Archive File" to_port="file input 1"/>
     <connect from_op="Create Archive File" from_port="archive file" to_op="Add Entry to Archive File" to_port="archive file"/>
     <connect from_op="Add Entry to Archive File" from_port="archive file" to_op="Write File" to_port="file"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
   </process>
 </operator>
</process>
Any help is appreciated.

Answers

  • Marco_Boeck
    Marco_Boeck New Altair Community Member
    Hi,

    the filter only accepts a file if the regex matches the entire file name. In your case, the regex matches only part of the name, so no file passes the filter. I have modified your process so that the regex covers the whole file name, and now it should work:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.000-SNAPSHOT">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="loop_files" compatibility="6.4.000" expanded="true" height="82" name="Loop Files" width="90" x="112" y="187">
            <parameter key="directory" value="C:\Users\jb1376\Desktop\test\"/>
            <parameter key="filter" value=".*_201512.*\.csv"/>
            <parameter key="recursive" value="true"/>
            <process expanded="true">
              <operator activated="true" class="open_file" compatibility="7.1.000-SNAPSHOT" expanded="true" height="68" name="Open File" width="90" x="514" y="34">
                <parameter key="filename" value="%{file_path}"/>
              </operator>
              <connect from_op="Open File" from_port="file" to_port="out 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="create_archive_file" compatibility="7.1.000-SNAPSHOT" expanded="true" height="68" name="Create Archive File" width="90" x="246" y="34"/>
          <operator activated="true" class="add_entry_to_archive_file" compatibility="7.1.000-SNAPSHOT" expanded="true" height="103" name="Add Entry to Archive File" width="90" x="447" y="136"/>
          <operator activated="true" class="write_file" compatibility="7.1.000-SNAPSHOT" expanded="true" height="68" name="Write File" width="90" x="648" y="136">
            <parameter key="filename" value="C:\Users\jb1376\Desktop\201512.zip"/>
          </operator>
          <connect from_op="Loop Files" from_port="out 1" to_op="Add Entry to Archive File" to_port="file input 1"/>
          <connect from_op="Create Archive File" from_port="archive file" to_op="Add Entry to Archive File" to_port="archive file"/>
          <connect from_op="Add Entry to Archive File" from_port="archive file" to_op="Write File" to_port="file"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
    Regards,
    Marco
  • jb1376
    jb1376 New Altair Community Member
    Marco,
    Thanks for the fix. Ugh, I'm used to filtering by "contains" in other languages instead of by regex. Thanks again for your help.
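
For readers who hit the same problem, the difference between a "contains" check and the full-match behaviour described in the answer can be sketched outside RapidMiner. The snippet below is only an illustration in Python, not RapidMiner code: the helper names `full_match_filter` and `contains_filter` are made up for this example, the file names are the ones from the question, and it assumes the Loop Files filter behaves like a whole-string match, as the answer states. The final dot is escaped (`\.csv`) so that it matches only a literal dot.

```python
import re

# File names taken from the question (directory part left out).
file_names = [
    "file_name_20151201.csv",
    "file_name_20151202.csv",
    "file_name_20151203.csv",
    "file_name_20160101.csv",
    "file_name_20160102.csv",
    "file_name_20160103.csv",
]


def full_match_filter(pattern, names):
    """Keep only names that the pattern matches from first to last character."""
    return [name for name in names if re.fullmatch(pattern, name)]


def contains_filter(pattern, names):
    """Keep names in which the pattern occurs anywhere (a 'contains' check)."""
    return [name for name in names if re.search(pattern, name)]


# A bare fragment never matches a whole file name, so nothing passes.
partial = full_match_filter("_201512", file_names)

# The same fragment used as a 'contains' check finds the December files.
contains = contains_filter("_201512", file_names)

# Padding the fragment with .* on both sides makes it cover the whole name.
fixed = full_match_filter(r".*_201512.*\.csv", file_names)

print(partial)
print(contains)
print(fixed)
```

Under that assumption, the bare fragment selects nothing, which is consistent with the empty zip file described in the question, while the padded pattern selects the same three December files as the "contains" check.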
