newbie: Group by operator
Qingqiu
New Altair Community Member
hi,
I have an example set with an attribute that labels the examples with different bins (e.g. 1, 2, 3, ..., 10), and I want to divide my dataset into 10 subsets according to the bin index. I tried the Groupby operator, but the resulting example set is the same as the original. I also tried the SplittedExampleSet function, but still got the same result. Any suggestions? Thank you for any help!
Best Regards
Answers
Hi Qingqiu,
maybe this is not the best way to solve your problem, but it's a simple one. You could use a "Multiply" operator combined with "Filter Examples" operators to get specific subsets. This way, every subset has to be set up by hand in the process. If you have a larger number of subsets, you could perhaps create the groups automatically inside a loop. Here is a small example:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
<process expanded="true" height="386" width="480">
<operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="number_examples" value="20"/>
<parameter key="number_of_attributes" value="2"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.0.8" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="30">
<list key="function_descriptions">
<parameter key="greatest_att" value="if(att2 > att1, 2, 1)"/>
</list>
</operator>
<operator activated="true" class="multiply" compatibility="5.0.8" expanded="true" height="94" name="Multiply" width="90" x="179" y="210"/>
<operator activated="true" class="filter_examples" compatibility="5.0.8" expanded="true" height="76" name="Filter Examples (2)" width="90" x="313" y="300">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="greatest_att = 2"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="5.0.8" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="210">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="greatest_att = 1"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
<connect from_op="Filter Examples (2)" from_port="example set output" to_port="result 2"/>
<connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="180"/>
<portSpacing port="sink_result 2" spacing="72"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
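For readers who want to see the logic outside the process designer, here is a minimal Python sketch of what the Multiply plus Filter Examples combination computes: one subset per distinct value of the grouping attribute. It is only an illustration, not RapidMiner API. The helper name `split_by_bin` and the three sample rows are hypothetical; only the attribute names `att1`, `att2` and `greatest_att` are taken from the process above.

```python
from collections import defaultdict


def split_by_bin(rows, bin_key):
    """Partition rows (dicts) into one list per distinct value of bin_key."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[bin_key]].append(row)
    return dict(subsets)


# Hypothetical sample data, not taken from the original thread.
rows = [
    {"att1": 0.5, "att2": 1.5},
    {"att1": 2.0, "att2": 1.0},
    {"att1": -1.0, "att2": 3.0},
]

# Same rule as the Generate Attributes step: 2 if att2 > att1, otherwise 1.
for row in rows:
    row["greatest_att"] = 2 if row["att2"] > row["att1"] else 1

subsets = split_by_bin(rows, "greatest_att")
for value in sorted(subsets):
    print(value, len(subsets[value]))
```

Because the subsets are built from whatever values occur in the data, the same function would handle ten bins without listing each filter by hand, which is the idea behind doing the split in a loop.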
Regards,
Matthias
Hi Matthias,
Thank you so much for your help! :) It works, and it is really simple. I focused too much on the Groupby operator and did not even know there was a Loop Values operator... Thanks again!
Best Regards
Qingqiu