"[SOLVED] Filter attributes against whitelist"

mataio
New Altair Community Member
Hello everybody,
I have an interesting problem which I could not solve on my own and hope someone can provide some help.
I have a table of data with several attributes and a whitelist of attribute names. Is there any possibility in RapidMiner to filter the attributes based on that list?
Thanks for your help in advance
Answers
Hello mataio,
you can do this using a whitelist stored in your repository, a CSV file, an Excel sheet, etc.
You basically read it and use a Loop Values operator on the whitelist. I've created an example process on random data, using a CSV file with two entries:
one
two
Take care of the execution order: each Remember operator needs to be executed before its associated Recall operator.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="6.1.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="multi classification"/>
</operator>
<operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember" width="90" x="179" y="30">
<parameter key="name" value="DataSet"/>
</operator>
<operator activated="true" class="subprocess" compatibility="6.1.000" expanded="true" height="76" name="Create Empty" width="90" x="313" y="30">
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples (2)" width="90" x="45" y="30">
<parameter key="condition_class" value="all"/>
<parameter key="invert_filter" value="true"/>
<list key="filters_list"/>
</operator>
<operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember (3)" width="90" x="179" y="30">
<parameter key="name" value="ResultingSample"/>
</operator>
<connect from_port="in 1" to_op="Filter Examples (2)" to_port="example set input"/>
<connect from_op="Filter Examples (2)" from_port="example set output" to_op="Remember (3)" to_port="store"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="read_csv" compatibility="6.1.000" expanded="true" height="60" name="Read CSV" width="90" x="447" y="120">
<parameter key="csv_file" value="C:\Users\Martin\Rapidforum\List"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="att1.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="loop_values" compatibility="6.1.000" expanded="true" height="76" name="Loop Values" width="90" x="581" y="120">
<parameter key="attribute" value="att1"/>
<process expanded="true">
<operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall" width="90" x="313" y="120">
<parameter key="name" value="DataSet"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples" width="90" x="447" y="120">
<parameter key="parameter_string" value="label=%{loop_value}"/>
<parameter key="condition_class" value="attribute_value_filter"/>
<list key="filters_list"/>
</operator>
<operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember (2)" width="90" x="581" y="120">
<parameter key="name" value="ResultingSample"/>
</operator>
<connect from_port="example set" to_port="out 1"/>
<connect from_op="Recall" from_port="result" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Remember (2)" to_port="store"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall (2)" width="90" x="715" y="120">
<parameter key="name" value="DataSet"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Remember" to_port="store"/>
<connect from_op="Remember" from_port="stored" to_op="Create Empty" to_port="in 1"/>
<connect from_op="Create Empty" from_port="out 1" to_port="result 1"/>
<connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
<connect from_op="Recall (2)" from_port="result" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Thank you for your reply, but I'm looking for something else: my whitelist contains the names of the attributes I want to keep, and the rest should be removed. I don't have a specific attribute of type name.
Basically, is it possible to use the operator Select Attributes instead of Filter Examples in the loop with the following parameters?
- filter type: regular expression (?)
- regular expression: something like attribute_name=%{loop_value}
Hi,
Yes, this is basically one way to go. If you have a pattern for what to filter, e.g. everything that starts with "att", you can use a simple regex for filtering. There are several tutorials around.
Otherwise you can simply use "single" as the attribute filter type in Select Attributes and invert the selection. Attached is a process which should help you:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="6.1.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="multi classification"/>
</operator>
<operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember" width="90" x="179" y="30">
<parameter key="name" value="DataSet"/>
</operator>
<operator activated="true" class="read_csv" compatibility="6.1.000" expanded="true" height="60" name="Read CSV" width="90" x="447" y="120">
<parameter key="csv_file" value="C:\Users\Martin\Rapidforum\List"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="att1.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="loop_values" compatibility="6.1.000" expanded="true" height="76" name="Loop Values" width="90" x="581" y="120">
<parameter key="attribute" value="att1"/>
<process expanded="true">
<operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall" width="90" x="313" y="120">
<parameter key="name" value="DataSet"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="6.1.000" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="120">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="%{loop_value}"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="false" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples" width="90" x="514" y="390">
<parameter key="parameter_string" value="label=%{loop_value}"/>
<parameter key="condition_class" value="attribute_value_filter"/>
<list key="filters_list"/>
</operator>
<operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember (2)" width="90" x="581" y="120">
<parameter key="name" value="DataSet"/>
</operator>
<connect from_port="example set" to_port="out 1"/>
<connect from_op="Recall" from_port="result" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Remember (2)" to_port="store"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall (2)" width="90" x="715" y="120">
<parameter key="name" value="DataSet"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Remember" to_port="store"/>
<connect from_op="Remember" from_port="stored" to_port="result 1"/>
<connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
<connect from_op="Recall (2)" from_port="result" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
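For the regex route mentioned above, a whole whitelist can probably also be handled in a single Select Attributes step without a loop. This is only a minimal sketch: att1 and att2 are placeholder names (not the entries of the CSV file), they are joined by hand with |, and the parameter keys are assumed to match the 6.1 Select Attributes operator used in the process above.

```xml
<!-- Sketch only: keeps the attributes whose names match the whitelist regex. -->
<!-- att1 and att2 are placeholder names; replace them with the real whitelist entries. -->
<operator activated="true" class="select_attributes" compatibility="6.1.000" expanded="true" name="Select Attributes">
  <parameter key="attribute_filter_type" value="regular_expression"/>
  <!-- One alternative per whitelisted name, separated by | -->
  <parameter key="regular_expression" value="att1|att2"/>
  <!-- invert_selection is not set: matching attributes are kept, the rest are removed -->
</operator>
```

For the "starts with att" case the expression would be att.* instead. As far as I know, special attributes such as the label are kept regardless of the filter unless "include special attributes" is enabled.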
Thank you so much, worked perfectly.