Filtering rows and columns
Find more posts tagged with
Sort by:
1 - 11 of
111
Hi Marius,
At first I want to say THANK YOU at this time for your fast and competent help with all my issues the last days!
I'm really trying to find answers in this forum and the documentation/wiki first.
I checked the "filter examples" operator you mentioned. Especially "attribute_value_filter" shows manifold capabilities to remove rows.
Though, I still found no solution to remove all rows but the last one with this operator.
Hi there,
You can count the examples into a macro and used that as your range start and end, like this.
You can count the examples into a macro and used that as your range start and end, like this.
HTH
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
<process expanded="true" height="349" width="681">
<operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="9" y="6"/>
<operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
<parameter key="macro" value="exs"/>
</operator>
<operator activated="true" class="filter_example_range" compatibility="5.2.003" expanded="true" height="76" name="Filter Example Range" width="90" x="331" y="23">
<parameter key="first_example" value="%{exs}"/>
<parameter key="last_example" value="%{exs}"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_op="Filter Example Range" to_port="example set input"/>
<connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Hi haddock,
Thanks for sharing. Indeed this solution works quite effective.
To be a somewhat more flexible in my process I tried another setup which can provide the last x rows of an example set by doing calculation with the macro's value. Unfortunatelly this doesn't work in the "filter example" operator (e.g. "%{exs}-5" as "first example" value).
Oh dear... I thougt it was so simple - now it drives me nuts

Sincerely
Sachs
Sorry, but I still don't get it. Tried the following process but without success... and still not even a clue how to make it 
This process delivers x times the (same) last row and provides a collection instead of a single example set...
Bye for now
Sachs

This process delivers x times the (same) last row and provides a collection instead of a single example set...
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
<process expanded="true" height="450" width="647">
<operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
<operator activated="true" class="loop" compatibility="5.2.003" expanded="true" height="76" name="Loop" width="90" x="179" y="30">
<parameter key="set_iteration_macro" value="true"/>
<parameter key="iterations" value="5"/>
<process expanded="true" height="450" width="647">
<operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="45" y="30">
<parameter key="macro" value="%{abc}"/>
</operator>
<operator activated="true" class="filter_example_range" compatibility="5.2.003" expanded="true" height="76" name="Filter Example Range" width="90" x="179" y="30">
<parameter key="first_example" value="%{abc}"/>
<parameter key="last_example" value="%{abc}"/>
</operator>
<connect from_port="input 1" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_op="Filter Example Range" to_port="example set input"/>
<connect from_op="Filter Example Range" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Loop" to_port="input 1"/>
<connect from_op="Loop" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Bye for now
Sachs
Hi,
I should have been more explicit about the count. The last example we know; if you take five from the number of examples
you'll filter to six examples ( think about the loop count ), so you need to add one back, like this.
I should have been more explicit about the count. The last example we know; if you take five from the number of examples
you'll filter to six examples ( think about the loop count ), so you need to add one back, like this.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>The more you use RM, the more those pesky macros show up!
<process version="5.2.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
<process expanded="true" height="450" width="748">
<operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
<operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
<parameter key="macro" value="%{last}"/>
</operator>
<operator activated="true" class="generate_macro" compatibility="5.2.003" expanded="true" height="76" name="Generate Macro" width="90" x="380" y="30">
<list key="function_descriptions">
<parameter key="first" value="%{last}-5+1"/>
</list>
</operator>
<operator activated="true" class="filter_example_range" compatibility="5.2.003" expanded="true" height="76" name="Filter Example Range" width="90" x="648" y="30">
<parameter key="first_example" value="%{first}"/>
<parameter key="last_example" value="%{last}"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
<connect from_op="Generate Macro" from_port="through 1" to_op="Filter Example Range" to_port="example set input"/>
<connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Indeedy!
The biggest mistake that I see being made on this Forum is that people underestimate RapidMiner, it is a deep and mature environment with many man years behind it. And these are seriously qualified enthusiastic man years, so the chances are that most likely tasks are covered, as you say if only you know how. It's a big toolkit, and it takes time.
Stick with it!
The biggest mistake that I see being made on this Forum is that people underestimate RapidMiner, it is a deep and mature environment with many man years behind it. And these are seriously qualified enthusiastic man years, so the chances are that most likely tasks are covered, as you say if only you know how. It's a big toolkit, and it takes time.
Stick with it!
Hi marasma,
I would not necessarily call this a similar problem, but try the process below. It calculates the sum of each column using Aggregate with a default aggregation, then cleans the filenames to match the original name, then selects attribute which some up to at least 0 in this example, and finally the filtered data is used as reference data to filter the original data in Reorder Attributes.
I propose to use breakpoints after each operator to understand what's going on.
Best regards,
Marius
Here is a link to this process in the Community Repository.
I would not necessarily call this a similar problem, but try the process below. It calculates the sum of each column using Aggregate with a default aggregation, then cleans the filenames to match the original name, then selects attribute which some up to at least 0 in this example, and finally the filtered data is used as reference data to filter the original data in Reorder Attributes.
I propose to use breakpoints after each operator to understand what's going on.
Best regards,
Marius
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.0.008">
<context>
<input/>
<output/>
<macros>
<macro>
<key>threshold</key>
<value>0</value>
</macro>
</macros>
</context>
<operator activated="true" class="process" compatibility="6.0.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="6.0.008" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30"/>
<operator activated="true" class="aggregate" compatibility="6.0.008" expanded="true" height="76" name="Aggregate" width="90" x="246" y="120">
<parameter key="use_default_aggregation" value="true"/>
<parameter key="default_aggregation_function" value="sum"/>
<list key="aggregation_attributes"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="6.0.008" expanded="true" height="76" name="Rename by Replacing" width="90" x="380" y="30">
<parameter key="replace_what" value=".*\((.*)\)"/>
<parameter key="replace_by" value="$1"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="6.0.008" expanded="true" height="76" name="Select Attributes" width="90" x="514" y="30">
<parameter key="attribute_filter_type" value="numeric_value_filter"/>
<parameter key="numeric_condition" value="> %{threshold}"/>
</operator>
<operator activated="true" class="order_attributes" compatibility="6.0.008" expanded="true" height="76" name="Reorder Attributes" width="90" x="648" y="165">
<parameter key="sort_mode" value="reference data"/>
<parameter key="handle_unmatched" value="remove"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
<connect from_op="Aggregate" from_port="original" to_op="Reorder Attributes" to_port="example set input"/>
<connect from_op="Rename by Replacing" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Reorder Attributes" to_port="reference_data"/>
<connect from_op="Reorder Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Here is a link to this process in the Community Repository.
Sort by:
1 - 1 of
11
There is Filter Examples for examples (rows), and Select Attributes for attributes (columns).



Best, Marius



Best, Marius
Best, Marius