"to ask about data sampling"
m_r_nour
New Altair Community Member
Hi all
I have an unbalanced dataset. One class has 500 times as many examples as the other groups,
and I want to resample so that every group has the same number of samples.
How can I do that?
I tried the sampling techniques, but all of them just resample while preserving the ratio of samples between the groups.
Thank you for your consideration and time in advance
Regards
REZA
Answers
Hi,
Which RapidMiner version are you using?
Greetings,
Sebastian
Version 4.6.
To clarify, I want to repeat this balanced sampling several times and average the performance results, to get the overall performance of this method.
thanks
Regards
REZA
Hi,
I think there are several possibilities you could use:
If you are going to use a learner that supports example weights, you could use the EqualLabelWeighting operator. It does not sample the examples at all, but equalizes the total weight assigned to each label. That might be even better, because no examples are lost.
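To make the weighting idea concrete, here is a minimal sketch in plain Python, outside RapidMiner. It only illustrates the principle described above (every label ends up with the same total weight); it is not how EqualLabelWeighting is implemented, and the function name `equal_label_weights` and its `total_weight` argument are made up for this illustration.

```python
from collections import Counter


def equal_label_weights(labels, total_weight=1.0):
    """Return one weight per example so that each label's weights sum to the same value.

    Hypothetical helper for illustration only, not a RapidMiner API.
    """
    counts = Counter(labels)
    # Every label gets an equal share of the total weight ...
    share_per_label = total_weight / len(counts)
    # ... and that share is split evenly among the label's examples.
    return [share_per_label / counts[label] for label in labels]


# A 500:1 imbalance, as in the question.
labels = ["a"] * 500 + ["b"]
weights = equal_label_weights(labels)
```

With this toy data the single "b" example carries as much weight as all 500 "a" examples together, and nothing is thrown away.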
Another possibility would be to split the example set by label value and sample each subset down to the same size. After that, merge all the subsets and voilà: you have a balanced example set.
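The split, sample and merge steps can be sketched in plain Python like this. It is an illustration of the idea under my own assumptions, not a RapidMiner process: `balanced_sample`, `label_of`, `per_class` and `seed` are hypothetical names, and I assume each class is sampled without replacement down to the size of the smallest class unless a smaller `per_class` is given.

```python
import random
from collections import defaultdict


def balanced_sample(examples, label_of, per_class=None, seed=None):
    """Split examples by label, sample each group to the same size, and merge.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Step 1: split the examples into one group per label value.
    groups = defaultdict(list)
    for example in examples:
        groups[label_of(example)].append(example)
    if not groups:
        return []

    # Step 2: the common size is limited by the smallest group.
    size = min(len(group) for group in groups.values())
    if per_class is not None:
        size = min(size, per_class)

    # Step 3: sample each group without replacement and merge the results.
    merged = []
    for group in groups.values():
        merged.extend(rng.sample(group, size))
    rng.shuffle(merged)
    return merged


# Toy data: 500 majority examples and 2 minority examples.
data = [("x%d" % i, "majority") for i in range(500)]
data += [("y0", "minority"), ("y1", "minority")]
subset = balanced_sample(data, label_of=lambda ex: ex[1], seed=0)
```

Calling it again with a different `seed` gives a different balanced subset, which is what you would need for repeating the sampling several times and averaging the performance.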
If this becomes unwieldy because you have too many label values, you might use the ValueIterator together with an IOStorer and an IORetriever...
OK, that seems rather complex. Here's how it would work; I hope it helps you understand what I'm suggesting:
<operator name="Root" class="Process" expanded="yes">
  <!-- Generate a sample data set to demonstrate the process -->
  <operator name="ExampleSetGenerator" class="ExampleSetGenerator" breakpoints="after">
    <parameter key="target_function" value="polynomial classification"/>
    <parameter key="number_examples" value="1000"/>
  </operator>
  <!-- Loop over every value of the label attribute -->
  <operator name="ValueIterator" class="ValueIterator" expanded="yes">
    <parameter key="attribute" value="label"/>
    <!-- Keep only the examples with the current label value -->
    <operator name="ExampleFilter" class="ExampleFilter">
      <parameter key="condition_class" value="attribute_value_filter"/>
      <parameter key="parameter_string" value="label = %{loop_value}"/>
    </operator>
    <!-- Sample this subset to a fixed size -->
    <operator name="AbsoluteSampling" class="AbsoluteSampling">
    </operator>
    <!-- Merge with the subsets stored so far; nothing is stored yet on the first pass -->
    <operator name="Only do if already stored" class="ExceptionHandling" expanded="yes">
      <operator name="Retrieve" class="IORetriever">
        <parameter key="name" value="SetStorage"/>
        <parameter key="io_object" value="ExampleSet"/>
      </operator>
      <operator name="ExampleSetMerge" class="ExampleSetMerge">
      </operator>
    </operator>
    <operator name="In every case: Store" class="IOStorer">
      <parameter key="name" value="SetStorage"/>
      <parameter key="io_object" value="ExampleSet"/>
    </operator>
  </operator>
  <!-- Consume the remaining ExampleSet, then retrieve the merged, balanced set -->
  <operator name="IOConsumer" class="IOConsumer">
    <parameter key="io_object" value="ExampleSet"/>
  </operator>
  <operator name="Final Retrieve" class="IORetriever">
    <parameter key="name" value="SetStorage"/>
    <parameter key="io_object" value="ExampleSet"/>
  </operator>
</operator>
Greetings,
Sebastian
Thanks