"Filtering based on attribute values"
frankie
New Altair Community Member
Hello,
While this might be a simple question, I have to ask: what is the best way to filter a dataset based on a subset of attribute values?
For example, let's say that I have a dataset with 3 attributes:
[tt]
attr1 range: [1,10]
attr2 range: [1,20]
attr3 range: [1,30]
[/tt]
and that I want to filter out those examples that have either
[tt]attr1 > 9 OR attr2 > 18 OR attr3 < 5[/tt].
Can these "outliers" be filtered with one operator? How?
Thanks!
Answers
-
Er... you can use 'Filter Examples'. Here is some stuff from the help, followed by an example:
Please note that you can define a logical OR of several conditions with || and a logical AND of two conditions with two ampersands (condition1 && condition2), or simply by applying several ExampleFilter operators in a row. Please note also that for nominal attributes you can define a regular expression for the value in the equal and not-equal checks.
To filter all examples (i.e. rows) where an attribute "att" has a missing value, use the expression "att = ?" or, for the opposite check, "att != ?". Note that for nominal values the question mark must be escaped ("\?") because, as noted above, a regular expression is expected in this case.
For "unknown_attributes" the parameter string must be empty. This filter removes all examples containing attributes that have missing or illegal values. For "unknown_label" the parameter string must also be empty. This filter removes all examples with an unknown label value.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
    <process expanded="true" height="370" width="346">
      <!-- Source of the example data to be filtered -->
      <operator activated="true" class="generate_data" compatibility="5.1.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="165"/>
      <!-- Match rows where either condition holds; invert_filter="true" removes the matches and keeps the rest -->
      <operator activated="true" class="filter_examples" compatibility="5.1.003" expanded="true" height="76" name="Filter Examples" width="90" x="246" y="210">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="att1 &lt; 0 || att2 &gt; 0"/>
        <parameter key="invert_filter" value="true"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
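Applied to the thresholds in the original question, the same operator could be configured roughly as sketched below. This is only a fragment, not a full process: the attribute names attr1, attr2 and attr3 are taken from the question, the remaining parameters simply mirror the example process above, and it assumes the input example set really contains those three numeric attributes.

```xml
<!-- Sketch only: attr1, attr2, attr3 are the attribute names from the question -->
<operator activated="true" class="filter_examples" compatibility="5.1.003" expanded="true" name="Filter Examples">
  <parameter key="condition_class" value="attribute_value_filter"/>
  <!-- The three conditions are chained with a logical OR; angle brackets are escaped for XML -->
  <parameter key="parameter_string" value="attr1 &gt; 9 || attr2 &gt; 18 || attr3 &lt; 5"/>
  <!-- Inverting the filter removes the matching "outliers" and keeps everything else -->
  <parameter key="invert_filter" value="true"/>
</operator>
```

Leaving invert_filter at false would instead keep only the rows that match the condition, which can be handy for inspecting what is about to be dropped.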
-
Thanks, and sorry for not reading the entire help file. I just thought the "Filter Examples" operator looked too simple with only one input field, hence I disregarded it...
-
Easily done, and it is not the world's raciest read ;D
Good weekend!
-
Hi,
Well, if anybody finds a better description of what it does, just edit it on the wiki... I'm very open to any phrases of literary value that still help readers understand what an operator does.
Greetings,
Sebastian