"[SOLVED] Filter attributes against whitelist"

mataio
mataio New Altair Community Member
edited November 2024 in Community Q&A
Hello everybody,

I have an interesting problem which I could not solve on my own and hope someone can provide some help.

I have a table of data with several attributes and a whitelist of attribute names. Is there any possibility in RapidMiner to filter the attributes based on that list?

Thanks for your help in advance

Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Hello mataio,

    You can do this by keeping the whitelist in your repository or in a CSV/Excel file.

    You basically read it and use a Loop Values operator on the whitelist. I've created an example process on random data, together with a CSV file with two entries:

    one
    two
    Take care with the execution order: each Remember operator needs to be executed before its associated Recall operator.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.1.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.1.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="6.1.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="multi classification"/>
          </operator>
          <operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember" width="90" x="179" y="30">
            <parameter key="name" value="DataSet"/>
          </operator>
          <operator activated="true" class="subprocess" compatibility="6.1.000" expanded="true" height="76" name="Create Empty" width="90" x="313" y="30">
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples (2)" width="90" x="45" y="30">
                <parameter key="condition_class" value="all"/>
                <parameter key="invert_filter" value="true"/>
                <list key="filters_list"/>
              </operator>
              <operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember (3)" width="90" x="179" y="30">
                <parameter key="name" value="ResultingSample"/>
              </operator>
              <connect from_port="in 1" to_op="Filter Examples (2)" to_port="example set input"/>
              <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Remember (3)" to_port="store"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="read_csv" compatibility="6.1.000" expanded="true" height="60" name="Read CSV" width="90" x="447" y="120">
            <parameter key="csv_file" value="C:\Users\Martin\Rapidforum\List"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="att1.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="loop_values" compatibility="6.1.000" expanded="true" height="76" name="Loop Values" width="90" x="581" y="120">
            <parameter key="attribute" value="att1"/>
            <process expanded="true">
              <operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall" width="90" x="313" y="120">
                <parameter key="name" value="DataSet"/>
                <parameter key="remove_from_store" value="false"/>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples" width="90" x="447" y="120">
                <parameter key="parameter_string" value="label=%{loop_value}"/>
                <parameter key="condition_class" value="attribute_value_filter"/>
                <list key="filters_list"/>
              </operator>
              <operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember (2)" width="90" x="581" y="120">
                <parameter key="name" value="ResultingSample"/>
              </operator>
              <connect from_port="example set" to_port="out 1"/>
              <connect from_op="Recall" from_port="result" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Remember (2)" to_port="store"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall (2)" width="90" x="715" y="120">
            <parameter key="name" value="DataSet"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Remember" to_port="store"/>
          <connect from_op="Remember" from_port="stored" to_op="Create Empty" to_port="in 1"/>
          <connect from_op="Create Empty" from_port="out 1" to_port="result 1"/>
          <connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Recall (2)" from_port="result" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

  • mataio
    mataio New Altair Community Member
    Thank you for your reply, but I'm looking for something else: my whitelist contains the names of the attributes I want to keep, and the rest should be removed. I don't have a specific attribute of type name.

    Basically, is it possible to use the operator Select Attributes instead of Filter Examples in the loop with the following parameters?
    - filter type: regular expression (?)
    - regular expression: something like attribute_name=%{loop_value}
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Hi,

    Yes, this is basically one way to go. If you have a pattern for what to filter, e.g. everything that starts with "att", you can use a simple regex for filtering. There are several tutorials around.
    Otherwise you can simply use "single" in Select Attributes and invert the selection. Attached is a process which should help you.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.1.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.1.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="6.1.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="multi classification"/>
          </operator>
          <operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember" width="90" x="179" y="30">
            <parameter key="name" value="DataSet"/>
          </operator>
          <operator activated="true" class="read_csv" compatibility="6.1.000" expanded="true" height="60" name="Read CSV" width="90" x="447" y="120">
            <parameter key="csv_file" value="C:\Users\Martin\Rapidforum\List"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="att1.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="loop_values" compatibility="6.1.000" expanded="true" height="76" name="Loop Values" width="90" x="581" y="120">
            <parameter key="attribute" value="att1"/>
            <process expanded="true">
              <operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall" width="90" x="313" y="120">
                <parameter key="name" value="DataSet"/>
                <parameter key="remove_from_store" value="false"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="6.1.000" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="120">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="%{loop_value}"/>
                <parameter key="invert_selection" value="true"/>
              </operator>
              <operator activated="false" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples" width="90" x="514" y="390">
                <parameter key="parameter_string" value="label=%{loop_value}"/>
                <parameter key="condition_class" value="attribute_value_filter"/>
                <list key="filters_list"/>
              </operator>
              <operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember (2)" width="90" x="581" y="120">
                <parameter key="name" value="DataSet"/>
              </operator>
              <connect from_port="example set" to_port="out 1"/>
              <connect from_op="Recall" from_port="result" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Remember (2)" to_port="store"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall (2)" width="90" x="715" y="120">
            <parameter key="name" value="DataSet"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Remember" to_port="store"/>
          <connect from_op="Remember" from_port="stored" to_port="result 1"/>
          <connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Recall (2)" from_port="result" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
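
    For the regular-expression route mentioned above, a minimal sketch could look like the fragment below. It assumes the whitelist holds the hypothetical names att1 and att2, joined by hand with "|"; as far as I know, the expression has to match the whole attribute name. Note that the loop in the process above, as posted, drops each listed attribute in turn because invert_selection is true. This fragment instead keeps only the listed regular attributes.

    ```xml
    <!-- Sketch only: att1 and att2 are assumed whitelist entries, not taken from the original CSV -->
    <operator activated="true" class="select_attributes" compatibility="6.1.000" expanded="true" name="Select Attributes">
      <!-- match attribute names against a regular expression instead of a single name -->
      <parameter key="attribute_filter_type" value="regular_expression"/>
      <!-- one alternative per whitelisted attribute name -->
      <parameter key="regular_expression" value="att1|att2"/>
      <!-- leave the selection uninverted so the matching attributes are the ones kept -->
      <parameter key="invert_selection" value="false"/>
    </operator>
    ```

    This operator would take the place of the Loop Values block and be connected directly to the data set. The whitelist is then maintained in the expression itself rather than read from the CSV file.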

  • mataio
    mataio New Altair Community Member
    Thank you so much, worked perfectly :)
