Partitioning the DataSet into N samples
John_De_Jong
New Altair Community Member
Is there a preprocessing filter in RapidMiner that takes a whole data set and creates N samples with the same class distribution as the original data set?
An example:
I have a data set of 1 million instances with two classes. I want to split it into 50 sub-samples of 20K instances each, i.e. sample1, sample2, ..., sample50 each have size 20K. When I run the filter, I should get 50 samples of 20K instances each, where the samples do not overlap (each instance from the 1 million appears in exactly one sample) and each sample has the same label balance as the full data set. For example, if label1 makes up 90% and label2 10% of the original data, each 20K sample should contain 18K instances of label1 and 2K of label2.
Any help would be appreciated.
John
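To make the requirement above concrete, here is a minimal sketch in plain Python of the split being asked for. It is not a RapidMiner operator or API; the function name stratified_partition, its parameters, and the label names are hypothetical and used only for illustration. It groups row indices by label, shuffles each group, and deals the rows out round-robin so that every sample receives an equal share of each class.

```python
import random
from collections import defaultdict


def stratified_partition(labels, n_samples, seed=0):
    """Split row indices into n_samples disjoint groups that keep the label ratio.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    rng = random.Random(seed)

    # Collect the row indices belonging to each label.
    by_class = defaultdict(list)
    for row, label in enumerate(labels):
        by_class[label].append(row)

    parts = [[] for _ in range(n_samples)]
    for rows in by_class.values():
        rng.shuffle(rows)
        # Deal the shuffled rows round-robin, so each sample gets
        # (almost) the same number of rows from this class.
        for i in range(n_samples):
            parts[i].extend(rows[i::n_samples])
    return parts


# The example from the post: 1 million rows, 90% label1 and 10% label2.
labels = ["label1"] * 900_000 + ["label2"] * 100_000
samples = stratified_partition(labels, n_samples=50)

first = samples[0]
print(len(samples), len(first), sum(labels[r] == "label2" for r in first))
```

Each entry of samples is a list of row indices into the original data, so no row is shared between samples. When a class size is not divisible by the number of samples, the per-sample counts for that class differ by at most one.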