"Filter Top K samples"
mataio
New Altair Community Member
Hello everybody,
I have a question about filtering examples. I would like to keep only the examples in the top 10% of attribute X. I know I can use the "Filter Examples" operator, but as far as I know it only accepts a static threshold as a filter, such as X>=1.
Does anybody know a way to tackle my problem?
Thanks in advance
Answers
Hi there,
you can combine Sort, Generate ID and Filter Examples to extract the top k examples by attribute X. If you want the top k %, you need to either provide the sample size yourself or extract it using Aggregate and Extract Macro.
Attached is an example process that selects the top 3 values of a1 in the Iris dataset:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="6.1.000" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="120">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="sort" compatibility="6.1.000" expanded="true" height="76" name="Sort" width="90" x="246" y="120">
<parameter key="attribute_name" value="a1"/>
<parameter key="sorting_direction" value="decreasing"/>
</operator>
<operator activated="true" class="generate_id" compatibility="6.1.000" expanded="true" height="76" name="Generate ID" width="90" x="380" y="120"/>
<operator activated="true" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples" width="90" x="581" y="120">
<list key="filters_list">
<parameter key="filters_entry_key" value="id.lt.4"/>
</list>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
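For the top k % case, the only extra step is turning the percentage into a row count before filtering on the ID. Here is a minimal sketch of that logic in plain Python, outside RapidMiner. The function name `top_percent`, the sample values, and the choice to round k up are all assumptions made for illustration; they are not part of the process above.

```python
import math

def top_percent(rows, key, percent):
    """Return the top `percent` % of rows, ranked by `key` in descending order."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Sort descending so the largest values come first (the Sort step).
    ranked = sorted(rows, key=lambda r: r[key], reverse=True)
    # Turn the percentage into a row count k; rounding up is an assumption.
    k = math.ceil(len(ranked) * percent / 100)
    # Keep the rows whose 1-based rank is <= k (Generate ID + Filter Examples).
    return ranked[:k]

# Made-up values for illustration only, not the actual Iris data.
rows = [{"a1": v} for v in [5.1, 4.9, 7.9, 6.3, 5.8, 7.7, 4.3, 6.9, 5.0, 7.2]]
print([r["a1"] for r in top_percent(rows, "a1", 30)])
```

Note that this cuts strictly at rank k, so examples tied with the k-th value are dropped; if ties matter, you would filter on the value of the k-th example instead.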