Dealing with Imbalanced Data
earmijo
New Altair Community Member
I'm studying the consequences of imbalanced data. I'm trying to replicate some earlier papers on the topic (e.g. Japkowicz 2002).
This is what I need to do, but I'm stuck:
1) Take the original dataset
2) Split it according to the value of the label (call the two new example sets Common and Rare).
3) Resample (bootstrap) the Rare ExampleSet until it has the same size as the Common ExampleSet.
4) Join the resampled Rare with the old Common.
I can do it outside Rapid-I, but I was wondering if it can be done with a few operators.
Thanks in advance for any help,
\E
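For readers who want to see the four steps spelled out outside RapidMiner, here is a minimal Python sketch. It assumes the data is a list of dicts with exactly two label values; the function name `oversample_rare` and the `label_key` parameter are made up for illustration and are not part of any RapidMiner API.

```python
import random
from collections import Counter


def oversample_rare(rows, label_key="label", seed=0):
    """Bootstrap the rare class until it is as large as the common class.

    Assumption: `rows` is a list of dicts and the label takes exactly two values.
    """
    rng = random.Random(seed)
    counts = Counter(row[label_key] for row in rows)
    if len(counts) != 2:
        raise ValueError("expected exactly two label values")

    # Step 2: split by label value into Common (larger) and Rare (smaller).
    (common_label, _), (rare_label, _) = counts.most_common(2)
    common = [row for row in rows if row[label_key] == common_label]
    rare = [row for row in rows if row[label_key] == rare_label]

    # Step 3: resample Rare with replacement until it matches Common in size.
    resampled_rare = rng.choices(rare, k=len(common))

    # Step 4: join the resampled Rare with the untouched Common.
    return common + resampled_rare
```

Calling `oversample_rare(rows)` on a set with 90 examples of one label and 10 of the other returns 180 rows, 90 per label. Sampling with replacement means rare examples are repeated, so the result is only as varied as the original rare class.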
Answers
Almost immediately after posting my question, I found a way to do it. It is not very elegant, and I'm sure it is not very useful if the dataset is huge, but it works fine for me. It is an example of oversampling the small class. I'll share it with you:
<operator name="Root" class="Process" expanded="yes">
    <!-- Generate the example data -->
    <operator name="ChurnReductionExampleSetGenerator" class="ChurnReductionExampleSetGenerator">
    </operator>
    <!-- Duplicate the ExampleSet so each class can be processed separately -->
    <operator name="IOMultiplier" class="IOMultiplier">
        <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <!-- Take one copy and keep only the small class (label = terminate) -->
    <operator name="IOSelector" class="IOSelector">
        <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <operator name="ExampleFilter" class="ExampleFilter">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="label = terminate"/>
    </operator>
    <!-- Oversample the small class; 13.28 is specific to this data, adjust it for yours -->
    <operator name="Bootstrapping" class="Bootstrapping">
        <parameter key="sample_ratio" value="13.28"/>
    </operator>
    <!-- Take the other copy and keep only the large class (label = ok) -->
    <operator name="IOSelector (2)" class="IOSelector">
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="select_which" value="2"/>
    </operator>
    <operator name="ExampleFilter (2)" class="ExampleFilter">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="label = ok"/>
    </operator>
    <!-- Merge the oversampled small class with the large class -->
    <operator name="ExampleSetMerge" class="ExampleSetMerge">
    </operator>
</operator>
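The post does not say how the `sample_ratio` of 13.28 was chosen. A plausible reading, offered here only as an assumption, is that it is the size of the large class divided by the size of the small class, so that bootstrapping brings the small class up to the same size. A small Python sketch of that calculation, with the hypothetical helper name `bootstrap_sample_ratio`:

```python
def bootstrap_sample_ratio(n_common, n_rare):
    """Ratio by which the rare class must grow to match the common class.

    Assumption: the bootstrapped set has about sample_ratio * n_rare examples.
    """
    if n_rare <= 0:
        raise ValueError("the rare class must contain at least one example")
    return n_common / n_rare


# Hypothetical class sizes, chosen only so the numbers are easy to follow.
ratio = bootstrap_sample_ratio(n_common=1328, n_rare=100)
print(round(ratio, 2))
```

With these made-up counts the ratio rounds to 13.28. For a different dataset, count the examples per label first and plug the result into the Bootstrapping operator.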
Actually, this issue has already been covered several times, once even by me.
http://rapid-i.com/rapidforum/index.php/topic,1246.msg4786.html#msg4786