"Sample operators"
bkruger
Hi,
I have been using the "Sample" operator to reduce the size of my input data, but I have had limited success in achieving what I want and would appreciate any help. My main issue is that I don't know which specific operator to use, and I cannot find documentation explaining each one.
My data has 4 labels, distributed as A = 60%, B = 20%, C = 15% and D = 5%.
Irrespective of statistical significance, I want to include all of D (the full 5%) and an equally sized portion of each of A, B and C.
How do I do this?
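The requirement comes down to a keep fraction per label: keep all of D, and from every other label keep only as many examples as D has. A minimal Python sketch of that arithmetic, using the percentages from the question (the names `distribution` and `keep_fractions` are purely illustrative, not RapidMiner parameters):

```python
# Class distribution from the question, as fractions of the full data set.
distribution = {"A": 0.60, "B": 0.20, "C": 0.15, "D": 0.05}


def keep_fractions(dist):
    """Fraction of each label to keep so every label ends up as large as the smallest one."""
    smallest = min(dist.values())
    return {label: smallest / share for label, share in dist.items()}


fractions = keep_fractions(distribution)
for label, fraction in fractions.items():
    print(f"{label}: keep {fraction:.1%}")
```

So roughly 8.3% of A, 25% of B, 33.3% of C and 100% of D. With, say, 10,000 examples in total, that is 500 examples per label and 2,000 in the reduced set.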
Thanks
B
Tags: AI Studio, Sampling
steffen
Hello bkruger
If I understand you correctly, then you want to sample the data so that every class has the same number of examples. Beware that this changes some properties of the data: a model trained on this subset but applied to data with the original class distribution may be biased.
How to:
RapidMiner is like Lego: there is no single operator that does this, but a combination of several achieves it.
Here are the bricks:
- Filter Examples
- Multiply
- Join
- Sample
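Outside RapidMiner, the same idea (split by label, sample each part down, combine the parts again) can be sketched in plain Python. This is only an illustration of the technique, not the RapidMiner process itself; the function name `balance_by_downsampling` and the dict-based example records are assumptions made for the sketch. Note that no copying step is needed here, since a Python list can simply be iterated once and grouped.

```python
import random
from collections import defaultdict


def balance_by_downsampling(examples, label_of, seed=None):
    """Hypothetical helper: return a subset with the same number of examples per label.

    Every label is randomly sampled (without replacement) down to the size
    of the smallest label, so the minority label is kept completely.
    """
    rng = random.Random(seed)

    # Split the examples into one group per label (the filtering step).
    groups = defaultdict(list)
    for ex in examples:
        groups[label_of(ex)].append(ex)
    if not groups:
        return []

    # Size of the minority label, e.g. all of D in the question.
    n = min(len(group) for group in groups.values())

    # Sample n examples from each group and concatenate the results.
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], n))
    return balanced


# Usage: 100 records with the 60/20/15/5 split from the question.
data = [{"id": i, "label": lab} for i, lab in enumerate("A" * 60 + "B" * 20 + "C" * 15 + "D" * 5)]
subset = balance_by_downsampling(data, lambda ex: ex["label"], seed=42)
```

The result holds 5 examples of each label (20 in total), including every D example. Fixing `seed` only makes the random selection repeatable.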
Hope this was helpful,
steffen
SebastianLoh
Hi,
since this is a common task, I added an example process on myExperiment.
Search for "Change Class Distribution of Your Training Data Set by Filtering and Sampling" in the myExperiment View to download the process.
http://www.myexperiment.org/workflows/1775.html
See http://rapid-i.com/component/option,com_myblog/show,Video-on-RapidMiner-Community-Extension-myExperiment-.html/Itemid,172/lang,en/ for the myExperiment stuff.
See also the "Same Number of Examples per Class" process here on myExperiment, http://www.myexperiment.org/workflows/1315.html, for a more sophisticated/generic solution.
Ciao Sebastian