"Imbalanced data: label weights or over/undersample"

Bulkington
edited November 5 in Community Q&A
Hi all,

I have to work with an imbalanced dataset for classification, so I want to try oversampling the minority class or undersampling the majority class. According to this earlier post, RM has no way to generate a fixed label distribution through sampling, but the same effect can be simulated with label weights:

http://rapid-i.com/rapidforum/index.php/topic,106.0.html

Now my questions:

1. Where can I find the operator EqualLabelWeighting mentioned in the post? Maybe I'm missing something obvious, but I just can't find it. By the way, I'm using RM 5.1.002.
2. The post mentioned above is more than two years old. Is there still no way to actually oversample the minority class or undersample the majority class?

I appreciate your help!

Thanks.

Answers

  • land
    Hi,
    the operator that post referred to is now called Generate Weight (Stratification). But you can do more fine-grained sampling with a process like this one, which loops over the label values, draws a bootstrap sample of 200 examples from each class, and appends the results:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
        <process expanded="true" height="491" width="788">
          <operator activated="true" class="retrieve" compatibility="5.1.003" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="loop_values" compatibility="5.1.003" expanded="true" height="76" name="Loop Values" width="90" x="179" y="30">
            <parameter key="attribute" value="class"/>
            <parameter key="iteration_macro" value="class"/>
            <process expanded="true" height="509" width="806">
              <operator activated="true" class="filter_examples" compatibility="5.1.003" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="class=%{class}"/>
              </operator>
              <operator activated="true" class="sample_bootstrapping" compatibility="5.1.003" expanded="true" height="76" name="Sample (Bootstrapping)" width="90" x="246" y="30">
                <parameter key="sample" value="absolute"/>
                <parameter key="sample_size" value="200"/>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Sample (Bootstrapping)" to_port="example set input"/>
              <connect from_op="Sample (Bootstrapping)" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="5.1.003" expanded="true" height="76" name="Append" width="90" x="313" y="30"/>
          <connect from_op="Retrieve" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
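    For readers outside RapidMiner, the same idea can be sketched in a few lines of Python. This is only an illustration of the logic in the process above (split by label, sample each class with replacement to a fixed size, append), not an export of it. The function name `resample_per_class`, its parameters, and the toy data are made up for this example.

```python
import random
from collections import Counter, defaultdict


def resample_per_class(rows, label_of, n_per_class, seed=None):
    """Draw n_per_class rows with replacement from every class and concatenate.

    rows        -- list of examples (any objects)
    label_of    -- function returning the class label of a row
    n_per_class -- absolute bootstrap sample size per class (200 in the process above)
    """
    rng = random.Random(seed)

    # Equivalent of Loop Values + Filter Examples: group rows by label value.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    # Equivalent of Sample (Bootstrapping) + Append: sample each group with
    # replacement, then merge the per-class samples into one list.
    balanced = []
    for members in by_label.values():
        balanced.extend(rng.choices(members, k=n_per_class))
    return balanced


# Hypothetical toy data: 90 majority rows and 10 minority rows.
toy_rows = [(i, "majority") for i in range(90)] + [(i, "minority") for i in range(10)]
toy_balanced = resample_per_class(toy_rows, lambda r: r[1], n_per_class=200, seed=0)
toy_counts = Counter(label for _, label in toy_balanced)
print(toy_counts)
```
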
    Greetings,
    Sebastian