🎉Community Raffle - Win $25

An exclusive raffle opportunity for active members like you! Complete your profile, answer questions and get your first accepted badge to enter the raffle.
Join and Win

"[SOLVED] Filtering examples containing multiple attributes with a blacklist"

User: "scepxko"
New Altair Community Member
Updated by Jocelyn
Hi everyone,

I'm stuck on something since several hours and can't definitely find the solution by myself.
I've searched for a solution on the forum without any success so far.

Here's the situation:

- The source file is an Excel file "input.xls" containing 4 columns/attributes:
ID ---- industry --- location ------- title
111 xyz Cayman Islands xyz
222 xyz Peru xyz
333 xyz USA xyz

- I want to remove the lines whose attribute "location" has a value contained in a blacklist "location-to-filter-out.xls".
This Excel file looks like
location2
Cayman
France
USA


- Then save the filtered file as "output.xls". Here it would look like:
ID ---- industry --- location ------- title
222 xyz Peru xyz

What I have done so far:

- In the code below, I have a "Loop Attributes" on "location"

- Inside it (subprocess), I have a "Read Excel" operator, extracting the "location2" values from the blacklist. And a second "Loop attributes" running on "location2" this time.

- Inside this second "Loop attributes" (sub-subprocess), I have the filtering condition with the operator "Filter Examples", with the condition class "attribute_value_filter".

The parameter string is: %{loop_attribute}=(.*?)%{loop_attribute2}(.*?)

Where the %{loop_attribute} is the original attribute "location" in the source file. For example "Cayman Islands".

And %{loop_attribute2} is a location blacklisted. For example "Cayman" (the blacklist contains only single words for the sake of simplicity. That's why I use "(.*?)" in the filtering condition.


Problem:
It does not work and I'm unable to find the flaw in my logic.

Maybe can somebody enlighten me?
Thanks in advance for your help.
Alex

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="read_excel" compatibility="5.3.015" expanded="true" height="60" name="Read Excel (3)" width="90" x="45" y="30">
       <parameter key="excel_file" value="E:\Rapidminer\example\input.xls"/>
       <parameter key="imported_cell_range" value="A1:G650"/>
       <parameter key="first_row_as_names" value="false"/>
       <list key="annotations">
         <parameter key="0" value="Name"/>
       </list>
       <list key="data_set_meta_data_information">
         <parameter key="0" value="ID.true.integer.attribute"/>
         <parameter key="1" value="industry.true.polynominal.attribute"/>
         <parameter key="2" value="location.true.polynominal.attribute"/>
         <parameter key="3" value="title.true.polynominal.attribute"/>
       </list>
     </operator>
     <operator activated="true" class="loop_attributes" compatibility="5.3.015" expanded="true" height="94" name="Loop Attributes" width="90" x="514" y="30">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="location"/>
       <process expanded="true">
         <operator activated="true" class="read_excel" compatibility="5.3.015" expanded="true" height="60" name="Read Excel (4)" width="90" x="246" y="165">
           <parameter key="excel_file" value="E:\Rapidminer\example\location-to-filter-out.xls"/>
           <parameter key="imported_cell_range" value="A1:E67"/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations">
             <parameter key="0" value="Name"/>
           </list>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="location2.true.polynominal.attribute"/>
           </list>
         </operator>
         <operator activated="true" class="loop_attributes" compatibility="5.3.015" expanded="true" height="76" name="Loop Attributes (2)" width="90" x="514" y="165">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="location2"/>
           <parameter key="iteration_macro" value="loop_attribute2"/>
           <process expanded="true">
             <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples (3)" width="90" x="447" y="30">
               <parameter key="condition_class" value="attribute_value_filter"/>
               <parameter key="parameter_string" value="%{loop_attribute}=(.*?)%{loop_attribute2}(.*?)"/>
               <parameter key="invert_filter" value="true"/>
             </operator>
             <connect from_port="example set" to_op="Filter Examples (3)" to_port="example set input"/>
             <connect from_op="Filter Examples (3)" from_port="example set output" to_port="example set"/>
             <portSpacing port="source_example set" spacing="0"/>
             <portSpacing port="sink_example set" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
           </process>
         </operator>
         <connect from_port="example set" to_port="example set"/>
         <connect from_op="Read Excel (4)" from_port="output" to_op="Loop Attributes (2)" to_port="example set"/>
         <connect from_op="Loop Attributes (2)" from_port="example set" to_port="result 1"/>
         <portSpacing port="source_example set" spacing="0"/>
         <portSpacing port="sink_example set" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="write_excel" compatibility="5.3.015" expanded="true" height="76" name="Write Excel" width="90" x="849" y="30">
       <parameter key="excel_file" value="E:\Rapidminer\example\output.xls"/>
     </operator>
     <connect from_op="Read Excel (3)" from_port="output" to_op="Loop Attributes" to_port="example set"/>
     <connect from_op="Loop Attributes" from_port="example set" to_op="Write Excel" to_port="input"/>
     <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>


Find more posts tagged with