"Sample operators"

New Altair Community Member

Jan 21, 2011

Updated Nov 5, 2024 by Jocelyn

Hi,

I have been using the "Sample" operator to reduce the size of my input data, but I have had limited success in achieving what I want and would appreciate any help. My main issue is that I don't know which specific operator to use, and I cannot find documentation explaining each one.

My data has 4 class labels, distributed as follows: A = 60%, B = 20%, C = 15% and D = 5%.

Regardless of statistical significance, I want to include all of the D examples (the 5%) and an equivalent number of examples from each of A, B and C.

How do I do this?

Thanks
B
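To make the target concrete, here is a small worked calculation in Python. It assumes a hypothetical data set of 1,000 examples with the 60/20/15/5 split described above; the function name `balanced_target_size` is illustrative only. The rarest class fixes the per-class sample size, and the ratio column shows what fraction of each class would have to be kept.

```python
from collections import Counter

def balanced_target_size(labels):
    """Per-class sample size needed so every class matches the rarest one."""
    return min(Counter(labels).values())

# Hypothetical data: 1,000 examples with the 60/20/15/5 class split.
labels = ["A"] * 600 + ["B"] * 200 + ["C"] * 150 + ["D"] * 50

counts = Counter(labels)
per_class = balanced_target_size(labels)
total = per_class * len(counts)

# Fraction of each class to keep (D is kept in full).
ratios = {label: round(per_class / n, 3) for label, n in sorted(counts.items())}

print(per_class)  # 50
print(total)      # 200
print(ratios)     # {'A': 0.083, 'B': 0.25, 'C': 0.333, 'D': 1.0}
```

So the balanced set would contain 50 examples per class, 200 in total, or 20% of the original data.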


steffen

New Altair Community Member

Jan 21, 2011

Hello bkruger

If I understand you correctly, then you want to sample the data so that every class has the same number of examples. Beware that this changes the class distribution of the data, so a model trained on this subset but applied to data with the original distribution may be biased.

How to:
RapidMiner is like Lego: there is no single operator that does this, but a combination of several will.
Here are the bricks:
- Filter Examples
- Multiply
- Join
- Sample
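For readers who want to see the logic behind those bricks outside RapidMiner, here is a minimal Python sketch. It is not RapidMiner's implementation: the function `balance_by_downsampling`, the `(label, value)` tuple layout, and the fixed seed are assumptions for illustration. Grouping by label plays the role of Filter Examples (Multiply is only needed in RapidMiner to feed one copy of the data to each filter), random sampling without replacement plays the role of Sample, and concatenating the subsets recombines them.

```python
import random
from collections import Counter, defaultdict

def balance_by_downsampling(examples, label_of, seed=42):
    """Hypothetical helper: downsample every class to the size of the rarest class."""
    # Filter step: split the examples into one group per class label.
    by_class = defaultdict(list)
    for ex in examples:
        by_class[label_of(ex)].append(ex)

    # The rarest class determines how many examples each class keeps.
    n = min(len(group) for group in by_class.values())

    # Sample step: draw n examples without replacement from each class,
    # then combine the per-class subsets into one list.
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], n))
    return balanced

# Hypothetical data with the 60/20/15/5 split from the question.
data = ([("A", i) for i in range(600)] + [("B", i) for i in range(200)]
        + [("C", i) for i in range(150)] + [("D", i) for i in range(50)])

balanced = balance_by_downsampling(data, label_of=lambda ex: ex[0])
print(dict(Counter(label for label, _ in balanced)))
# {'A': 50, 'B': 50, 'C': 50, 'D': 50}
```

Because the rarest class is sampled at its full size, every D example ends up in the result, which matches the original requirement.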

hope this was helpful,

steffen

SebastianLoh

New Altair Community Member

Jan 21, 2011

Hi,

since this is a common task, I added an example process on myExperiment.

Search for "Change Class Distribution of Your Training Data Set by Filtering and Sampling" in the myExperiment View to download the process.

http://www.myexperiment.org/workflows/1775.html

See http://rapid-i.com/component/option,com_myblog/show,Video-on-RapidMiner-Community-Extension-myExperiment-.html/Itemid,172/lang,en/ for a video on the myExperiment community extension.

See also the "Same Number of Examples per Class" process here on myExperiment http://www.myexperiment.org/workflows/1315.html for a more sophisticated/generic solution.

Ciao Sebastian
