"Imbalanced data: label weights or over/undersample"
Bulkington
New Altair Community Member
Hi all,
i have to work with an imbalanced dataset for classification. So I want to try to oversample the minority class or to undersample the majority class. According to this earlier post there is no possibility in RM to generate a fixed label distribution through sampling but the same effect can be simulated by label weights:
http://rapid-i.com/rapidforum/index.php/topic,106.0.html
Now my questions:
1. Where can I find the operator EqualLabelWeighting mentioned in the post? Maybe I'm acting dumb but I just can't find it. btw: I'm using RM 5.1.002
2. Since the above mentioned post is more than two years old: I suppose there is still no possibility to actually oversample or undersample minority/majority classes?
I appreciate your help!
Thanks.
i have to work with an imbalanced dataset for classification. So I want to try to oversample the minority class or to undersample the majority class. According to this earlier post there is no possibility in RM to generate a fixed label distribution through sampling but the same effect can be simulated by label weights:
http://rapid-i.com/rapidforum/index.php/topic,106.0.html
Now my questions:
1. Where can I find the operator EqualLabelWeighting mentioned in the post? Maybe I'm acting dumb but I just can't find it. btw: I'm using RM 5.1.002
2. Since the above mentioned post is more than two years old: I suppose there is still no possibility to actually oversample or undersample minority/majority classes?
I appreciate your help!
Thanks.
0
Answers
-
Hi,
the operator this post referred to the operator that is now called Generate Weight (Straticifaction). But you can do a more fine grained sampling by using a process like this:<?xml version="1.0" encoding="UTF-8" standalone="no"?>
Greetings,
<process version="5.1.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
<process expanded="true" height="491" width="788">
<operator activated="true" class="retrieve" compatibility="5.1.003" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Sonar"/>
</operator>
<operator activated="true" class="loop_values" compatibility="5.1.003" expanded="true" height="76" name="Loop Values" width="90" x="179" y="30">
<parameter key="attribute" value="class"/>
<parameter key="iteration_macro" value="class"/>
<process expanded="true" height="509" width="806">
<operator activated="true" class="filter_examples" compatibility="5.1.003" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="class=%{class}"/>
</operator>
<operator activated="true" class="sample_bootstrapping" compatibility="5.1.003" expanded="true" height="76" name="Sample (Bootstrapping)" width="90" x="246" y="30">
<parameter key="sample" value="absolute"/>
<parameter key="sample_size" value="200"/>
</operator>
<connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Sample (Bootstrapping)" to_port="example set input"/>
<connect from_op="Sample (Bootstrapping)" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="5.1.003" expanded="true" height="76" name="Append" width="90" x="313" y="30"/>
<connect from_op="Retrieve" from_port="output" to_op="Loop Values" to_port="example set"/>
<connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
<connect from_op="Append" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Sebastian0