"[SOLVED] Filter attributes against whitelist"
Hello everybody,
I have an interesting problem that I could not solve on my own, and I hope someone can help.
I have a table of data with several attributes and a whitelist of attribute names. Is there any way in RapidMiner to filter the attributes based on that list?
Thanks in advance for your help.
Thank you for your reply, but I'm looking for something else. My whitelist contains the names of the attributes I want to keep; the rest should be removed. I don't have a specific attribute of type name.
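The behaviour asked for here is to keep only the attributes whose names appear in the whitelist and drop the rest. The sketch below shows that logic outside RapidMiner. The table is modelled as a Python dict that maps attribute names to value lists. `keep_whitelisted` and the example data are hypothetical and only illustrate the set-membership test.

```python
def keep_whitelisted(table: dict[str, list], whitelist: list[str]) -> dict[str, list]:
    """Return a copy of `table` with only the columns named in `whitelist`.

    Column order from the original table is preserved; whitelist entries
    that are not present in the table are simply ignored.
    """
    allowed = set(whitelist)  # set lookup keeps the membership test O(1)
    return {name: values for name, values in table.items() if name in allowed}


if __name__ == "__main__":
    # Hypothetical example data with four regular attributes.
    table = {
        "att1": [0.1, 0.4],
        "att2": [1.5, 2.0],
        "att3": [7.0, 3.2],
        "att4": [0.0, 9.9],
    }
    print(keep_whitelisted(table, ["att1", "att3"]))  # {'att1': [0.1, 0.4], 'att3': [7.0, 3.2]}
```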
Basically, is it possible to use the Select Attributes operator instead of Filter Examples in the loop, with the following parameters?
- filter type: regular expression (?)
- regular expression: something like attribute_name=%{loop_value}
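A regular-expression attribute filter tests its pattern against the attribute names themselves. A prefix like `attribute_name=` is therefore probably unnecessary. The loop value alone may do, or all whitelist entries joined into one alternation such as `att1|att3`.

The Python sketch below shows only that matching logic. It assumes the filter requires the whole name to match, which is why it uses `fullmatch`: that way `att10` is not kept when only `att1` is listed. `whitelist_regex` and `select_by_regex` are hypothetical helpers, not RapidMiner API.

```python
import re


def whitelist_regex(whitelist: list[str]) -> str:
    """Join whitelist entries into one alternation such as 'att1|att3'.

    re.escape() makes names containing regex metacharacters
    (e.g. 'price.eur') match only themselves, literally.
    """
    return "|".join(re.escape(name) for name in whitelist)


def select_by_regex(attribute_names: list[str], pattern: str) -> list[str]:
    """Keep the names that the pattern matches in full, not just as a prefix."""
    compiled = re.compile(pattern)
    return [name for name in attribute_names if compiled.fullmatch(name)]


if __name__ == "__main__":
    pattern = whitelist_regex(["att1", "att3"])
    print(pattern)  # att1|att3
    print(select_by_regex(["att1", "att2", "att3", "att10"], pattern))  # ['att1', 'att3']
```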
Hi,
Yes, this is basically one way to go, if you have a pattern for what to filter. For example, for everything that starts with "att" you can use a simple regex. There are several tutorials around.
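For the pattern case, the sketch below mimics a regular-expression attribute filter in plain Python. The pattern `att.*` keeps every attribute whose name starts with "att", and inverting the selection drops those names instead. It assumes the pattern must match the whole attribute name. `filter_regex` is a hypothetical helper, not a RapidMiner function.

```python
import re


def filter_regex(names: list[str], pattern: str, invert: bool = False) -> list[str]:
    """Keep names the pattern matches in full; with invert=True, drop them instead."""
    compiled = re.compile(pattern)
    return [name for name in names if (compiled.fullmatch(name) is not None) != invert]


if __name__ == "__main__":
    names = ["att1", "att2", "id", "score"]
    print(filter_regex(names, "att.*"))               # ['att1', 'att2']
    print(filter_regex(names, "att.*", invert=True))  # ['id', 'score']
```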
Otherwise, you can simply use the "single" filter type in Select Attributes and invert the selection. Attached is a process that should help you:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="6.1.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="multi classification"/>
</operator>
<operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember" width="90" x="179" y="30">
<parameter key="name" value="DataSet"/>
</operator>
<operator activated="true" class="read_csv" compatibility="6.1.000" expanded="true" height="60" name="Read CSV" width="90" x="447" y="120">
<parameter key="csv_file" value="C:\Users\Martin\Rapidforum\List"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="att1.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="loop_values" compatibility="6.1.000" expanded="true" height="76" name="Loop Values" width="90" x="581" y="120">
<parameter key="attribute" value="att1"/>
<process expanded="true">
<operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall" width="90" x="313" y="120">
<parameter key="name" value="DataSet"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="6.1.000" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="120">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="%{loop_value}"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="false" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples" width="90" x="514" y="390">
<parameter key="parameter_string" value="label=%{loop_value}"/>
<parameter key="condition_class" value="attribute_value_filter"/>
<list key="filters_list"/>
</operator>
<operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember (2)" width="90" x="581" y="120">
<parameter key="name" value="DataSet"/>
</operator>
<connect from_port="example set" to_port="out 1"/>
<connect from_op="Recall" from_port="result" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Remember (2)" to_port="store"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall (2)" width="90" x="715" y="120">
<parameter key="name" value="DataSet"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Remember" to_port="store"/>
<connect from_op="Remember" from_port="stored" to_port="result 1"/>
<connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
<connect from_op="Recall (2)" from_port="result" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
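The loop in the process above can be traced with a small Python sketch. A dict stands in for the Remember/Recall store, and a second dict stands in for the example set; both stand-ins are hypothetical.

Read literally, the configured filter (`attribute_filter_type` = single, `invert_selection` = true) removes the attribute named by `%{loop_value}` on each pass. Remember (2) then writes the result back under `DataSet`, so the removals accumulate. After the loop, the attributes listed in the CSV are gone and the others remain. `loop_values_invert_single` is a hypothetical name.

```python
def loop_values_invert_single(store: dict, key: str, loop_values: list[str]) -> None:
    """Trace the Loop Values subprocess of the attached process (sketch)."""
    for loop_value in loop_values:
        # Recall with remove_from_store=false: read the current example set.
        example_set = store[key]
        # Select Attributes, filter type "single", invert_selection=true:
        # keep every attribute except the one named by the loop value.
        example_set = {
            name: values for name, values in example_set.items() if name != loop_value
        }
        # Remember (2): overwrite the stored example set under the same key.
        store[key] = example_set


if __name__ == "__main__":
    # Hypothetical data: three attributes; the CSV list holds two names.
    store = {"DataSet": {"att1": [0.3], "att2": [1.7], "att3": [4.2]}}
    loop_values_invert_single(store, "DataSet", ["att1", "att3"])
    print(store["DataSet"])  # {'att2': [1.7]}
```

If the CSV is meant to list the attributes to keep, this setup gives the opposite result, so the filter inside the loop would need to be changed accordingly.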
You can do this with a whitelist stored in your repository, a CSV or Excel file, or similar.
Basically, you read it in and apply a Loop Values operator to the whitelist. I've created an example process on random data, with a CSV file containing two entries. Pay attention to the execution order: the Remember operators need to be executed before their associated Recall operators.
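The execution-order warning comes down to one rule: a Recall can only return something a Remember has already stored under the same name. The Python sketch below models that with a hypothetical `ObjectStore` class; it is not RapidMiner's implementation. The `remove_from_store` argument mirrors the Recall parameter of the same name in the process XML.

```python
class ObjectStore:
    """Hypothetical stand-in for the store shared by Remember and Recall."""

    def __init__(self) -> None:
        self._objects: dict[str, object] = {}

    def remember(self, name: str, obj: object) -> None:
        # Store (or overwrite) an object under the given name.
        self._objects[name] = obj

    def recall(self, name: str, remove_from_store: bool = False) -> object:
        # Fails if nothing has been remembered under this name yet.
        if name not in self._objects:
            raise KeyError(name)
        if remove_from_store:
            return self._objects.pop(name)
        return self._objects[name]


if __name__ == "__main__":
    store = ObjectStore()
    try:
        store.recall("DataSet")  # Recall executed before Remember: nothing stored yet
    except KeyError:
        print("Recall failed: run Remember first")
    store.remember("DataSet", {"att1": [0.5]})
    print(store.recall("DataSet"))  # {'att1': [0.5]}
```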