[SOLVED] Balancing data - pull with undelete possible ?
fras
New Altair Community Member
Hi,
Say we have data consisting of 1000 examples of class 0 and 50 examples of class 1.
Using the "Sample" operator I can resample class 0 down to, e.g., 800.
But I would also like to resample class 1 up to, e.g., 100, so I have to blow it up somehow,
which is also called "pull with undelete" (i.e. sampling with replacement).
Is this possible?
Thx, Frank
Answers
I would use bootstrapping. Take a look at the following process and let me know if it helps.
I'm using the Golf dataset that comes with RapidMiner. It has two classes: yes (9 obs) and no (5 obs). The process samples yes down to 8 and bootstraps no up to 8, so I end up with a new dataset that has yes (8 obs) and no (8 obs). That's exactly what you want.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<process expanded="true" height="341" width="815">
<operator activated="true" class="retrieve" compatibility="5.2.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="165">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.2.006" expanded="true" height="94" name="Multiply" width="90" x="256" y="137"/>
<operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples" width="90" x="447" y="30">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="Play = yes"/>
</operator>
<operator activated="true" class="sample" compatibility="5.2.006" expanded="true" height="76" name="Sample" width="90" x="581" y="30">
<parameter key="sample_size" value="8"/>
<list key="sample_size_per_class"/>
<list key="sample_ratio_per_class"/>
<list key="sample_probability_per_class"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples (2)" width="90" x="447" y="210">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="Play = no"/>
</operator>
<operator activated="true" class="sample_bootstrapping" compatibility="5.2.006" expanded="true" height="76" name="Sample (Bootstrapping)" width="90" x="593" y="209">
<parameter key="sample" value="absolute"/>
<parameter key="sample_size" value="8"/>
</operator>
<operator activated="true" class="append" compatibility="5.2.006" expanded="true" height="94" name="Append" width="90" x="773" y="109"/>
<connect from_op="Retrieve" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Sample" to_port="example set input"/>
<connect from_op="Sample" from_port="example set output" to_op="Append" to_port="example set 1"/>
<connect from_op="Filter Examples (2)" from_port="example set output" to_op="Sample (Bootstrapping)" to_port="example set input"/>
<connect from_op="Sample (Bootstrapping)" from_port="example set output" to_op="Append" to_port="example set 2"/>
<connect from_op="Append" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
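For readers who want to see the same idea outside RapidMiner, here is a minimal Python sketch of the split, resample, and append pattern used in the process above. It is only an illustration under stated assumptions: the function name `rebalance`, its parameters `label_of` and `targets`, and the toy data (1000 rows of class 0, 50 rows of class 1, as in the question) are hypothetical and are not part of RapidMiner. Classes that are larger than their target are sampled without replacement; classes that are smaller are sampled with replacement, which is the bootstrapping step.

```python
import random
from collections import Counter


def rebalance(rows, label_of, targets, seed=0):
    """Resample each class to its target size (hypothetical helper).

    Classes with at least `target` rows are sampled without replacement.
    Smaller classes are sampled with replacement (bootstrapping), so some
    rows will appear more than once in the result.
    """
    rng = random.Random(seed)

    # Split step: group rows by class, like the two "Filter Examples" operators.
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    balanced = []
    for label, group in by_class.items():
        n = targets[label]
        if n <= len(group):
            picked = rng.sample(group, n)      # without replacement
        else:
            picked = rng.choices(group, k=n)   # with replacement
        balanced.extend(picked)                # append step

    rng.shuffle(balanced)
    return balanced


# Toy data matching the question: 1000 rows of class 0 and 50 rows of class 1.
data = [(i, 0) for i in range(1000)] + [(i, 1) for i in range(1000, 1050)]

result = rebalance(data, label_of=lambda row: row[1], targets={0: 800, 1: 100})
counts = Counter(label for _, label in result)
print(counts[0], counts[1])  # prints: 800 100
```

Note one difference from the process above: "Sample (Bootstrapping)" always draws with replacement, whereas this sketch only does so when a class has to grow.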
Yes, that's it. Separating via "Filter Examples" and finally "Append" was not on my list...