How to balance examples?
Axel
New Altair Community Member
Hello everybody,
I have a classification problem with two classes, and one of those classes is in large excess in my data set.
I would like to use roughly equal numbers of the two classes for my learner, so I wonder whether
there is a way to select only a subset of the examples from the class that is in excess?
I looked at the Sampling operator, but that samples the same fraction from all classes.
Many thanks,
axel
Answers
Hi there Axel.
There is probably a much smarter way of doing this, but I'm too wrecked to think of it ;D, so you'll have to make do with the following...
You'd better test it as well, as I haven't!
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="simple non linear classification"/>
</operator>
<operator name="Count All Examples" class="DataMacroDefinition">
<parameter key="macro" value="Total"/>
</operator>
<operator name="Change label to more tractable attribute" class="ChangeAttributeRole">
<parameter key="name" value="label"/>
</operator>
<operator name="Sort examples" class="Sorting">
<parameter key="attribute_name" value="label"/>
</operator>
<operator name="Take a copy for later" class="IOMultiplier">
<parameter key="io_object" value="ExampleSet"/>
</operator>
<operator name="Remove positives" class="ExampleFilter">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="label=positive"/>
<parameter key="invert_filter" value="true"/>
</operator>
<operator name="Count Negatives" class="DataMacroDefinition">
<parameter key="macro" value="Neg"/>
</operator>
<operator name="Calculate Positives" class="MacroConstruction">
<list key="function_descriptions">
<parameter key="Pos" value="%{Total}-%{Neg}"/>
</list>
<parameter key="use_standard_constants" value="false"/>
</operator>
<operator name="Compute First &amp; Last deletions" class="MacroConstruction">
<list key="function_descriptions">
<parameter key="First" value="if(%{Neg}>=%{Pos},1,2*%{Neg}+1)"/>
<parameter key="Last" value="if(%{Neg}>=%{Pos},%{Total}-2*%{Pos},%{Total})"/>
</list>
</operator>
<operator name="Restore copy, trash other" class="IOSelector">
<parameter key="io_object" value="ExampleSet"/>
<parameter key="select_which" value="2"/>
<parameter key="delete_others" value="true"/>
</operator>
<operator name="Filter to equalise" class="ExampleRangeFilter">
<parameter key="first_example" value="%{First}"/>
<parameter key="last_example" value="%{Last}"/>
<parameter key="invert_filter" value="true"/>
</operator>
<operator name="Restore label" class="ChangeAttributeRole">
<parameter key="name" value="label"/>
<parameter key="target_role" value="label"/>
</operator>
</operator>
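For anyone who wants to check the First/Last arithmetic outside RapidMiner, here is a minimal Python sketch of what the two macros work out. It is not part of the process above; the function names `deletion_range` and `balance` are made up for illustration, and it assumes the examples are sorted so that all negatives come before all positives (which is what sorting on the label should give you).

```python
def deletion_range(neg, pos):
    """Return the 1-based inclusive (first, last) range of rows to delete.

    Mirrors the First/Last macros: rows are assumed to be sorted with the
    `neg` negative examples first, followed by the `pos` positive examples.
    """
    total = neg + pos
    if neg >= pos:
        # Negatives are in excess: drop the first (neg - pos) rows.
        return 1, total - 2 * pos
    # Positives are in excess: drop everything after the first 2 * neg rows.
    return 2 * neg + 1, total


def balance(sorted_labels):
    """Keep only the rows outside the deletion range (hypothetical helper)."""
    neg = sorted_labels.count("negative")
    pos = len(sorted_labels) - neg
    first, last = deletion_range(neg, pos)
    return [
        label
        for row, label in enumerate(sorted_labels, start=1)
        if not (first <= row <= last)
    ]


if __name__ == "__main__":
    labels = ["negative"] * 7 + ["positive"] * 3
    print(deletion_range(7, 3))  # rows 1..4 are removed
    print(balance(labels))       # 3 negatives followed by 3 positives
```

Note that when both classes are already the same size the range comes out as (1, 0), i.e. empty. The sketch simply deletes nothing in that case; I have not checked how the range filter in the process reacts to it.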
Have fun...
Hi,
if your learner supports weighted examples, you could use the equal label weighting operator. It gives every label the same total amount of weight.
But I guess we should add some sort of balancing operator in the future...
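To illustrate the weighting idea, here is a rough Python sketch of how such per-example weights could be computed. This is not RapidMiner's implementation: the function name `equal_label_weights` and the choice to make the weights sum to the number of examples are assumptions made for the example.

```python
from collections import Counter


def equal_label_weights(labels):
    """Return one weight per example so that every class gets the same total.

    Assumption for this sketch: the weights sum to len(labels), so a
    perfectly balanced data set ends up with a weight of 1.0 per example.
    """
    counts = Counter(labels)
    weight_per_class = len(labels) / len(counts)
    return [weight_per_class / counts[label] for label in labels]


if __name__ == "__main__":
    labels = ["negative"] * 6 + ["positive"] * 2
    weights = equal_label_weights(labels)
    for label in ("negative", "positive"):
        class_total = sum(w for w, l in zip(weights, labels) if l == label)
        print(label, class_total)
```

Examples from the rare class get a larger weight each, so no examples have to be thrown away, but it only helps if the learner actually uses the weights.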
Greetings,
Sebastian
Wow Haddock,
that's not very nice, but it works!
Many thanks,
Axel
P.S. But I think RapidMiner really needs a special operator for this...
Hi, I cannot get this code to run on RapidMiner 5. I need help.
Thanks
Alejandro
Hi,
well, I think you either have to install RM 4.x, load the process there, store it and import the file, or take another valid RapidMiner 4.x process file, insert the code there and import it with RapidMiner 5.0.
Or you simply build the process manually from scratch...
Greetings,
Sebastian