"Sample operators"
bkruger
New Altair Community Member
Hi,
I have been using the "sample" operator to reduce the size of my input data. I have had limited success in achieving what I want and would appreciate any help. My main issue is that I don't know which specific operator to use, and I cannot find documentation explaining each one.
I have data with 4 labels with representation of each as A = 60%, B = 20%, C = 15% and D = 5%.
Irrespective of the statistical significance, I want to include at least all 5% of D and an equivalent portion of A, B and C.
How do I do this?
Thanks
B
Answers
Hello bkruger
If I understand you correctly, then you want to sample the data so that every class has the same number of examples. Beware that this changes some properties of the data, so a model trained on this subset but applied to data with the original class distribution may be biased.
How to:
RapidMiner is like Lego: no single operator achieves this, but a combination of several does.
Here are the bricks:
- Filter Examples
- Multiply
- Join
- Sample
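To make the logic behind these bricks concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. It assumes the data fits in memory as a list of rows; the function name `balanced_sample`, the `label_of` accessor and the toy rows are made up for illustration and are not part of RapidMiner. It splits the data by label, keeps the rarest class whole, and draws the same number of rows from every other class.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(rows, label_of, seed=0):
    """Downsample every class to the size of the rarest class."""
    rng = random.Random(seed)

    # Roughly the "Multiply" + "Filter Examples" step: one group per label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    if not by_label:
        return []

    # The rarest class is kept whole; its size is the per-class quota.
    quota = min(len(group) for group in by_label.values())

    # Roughly the "Sample" step, followed by recombining the groups:
    # draw `quota` rows without replacement from each class.
    sample = []
    for label in sorted(by_label):
        sample.extend(rng.sample(by_label[label], quota))
    return sample


# Toy data with the class distribution from the question (100 rows).
rows = (
    [("A", i) for i in range(60)]
    + [("B", i) for i in range(20)]
    + [("C", i) for i in range(15)]
    + [("D", i) for i in range(5)]
)
subset = balanced_sample(rows, label_of=lambda row: row[0])
counts = Counter(label for label, _ in subset)
print(counts)
```

With this distribution each class ends up with 5 rows, so the balanced subset is 20% of the original data and contains every D row.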
Hope this was helpful,
steffen
Hi,
Since this is a common task, I added an example process on myExperiment.
Search for "Change Class Distribution of Your Training Data Set by Filtering and Sampling" in the myExperiment View to download the process.
http://www.myexperiment.org/workflows/1775.html
See http://rapid-i.com/component/option,com_myblog/show,Video-on-RapidMiner-Community-Extension-myExperiment-.html/Itemid,172/lang,en/ for more on myExperiment.
See also the "Same Number of Examples per Class" process here on myExperiment http://www.myexperiment.org/workflows/1315.html for a more sophisticated/generic solution.
Ciao Sebastian