Getting training and testing sets from KennardStoneSampling operator

pengie
edited November 5 in Community Q&A
Hi,

I have a dataset that I wish to split into a training set and a testing set. I wish to use the KennardStoneSampling operator, but it seems that it will only provide me with the training set. How do I get the remaining compounds, the ones that were not selected, to use as the testing set?

Answers

  • land
    Hi,
    If I understood you correctly, you want to use the sampling algorithm for something it is not intended for. If you have one dataset, you can sample it with KennardStoneSampling so that a smaller, evenly distributed sample remains. In other words, it selects some examples from the input set and returns them as the output set. If you want to split your ExampleSet into a training set and a test set, you should use the SimpleValidation operator; take a look at the operator description to understand how it works. If you then want to test a classifier's performance in combination with the sampling, it is probably best to sample the training data but not the test data.
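The "split first, then sample only the training part" idea can be sketched in plain Python (this is not RapidMiner code). The function name `holdout_then_sample` and its parameters are hypothetical; the random holdout split is only an assumption meant to mimic the general idea of a split validation, not the operator's actual implementation, and `sample_train` stands in for any sampler such as Kennard and Stone.

```python
import random
from typing import Callable, List, Tuple


def holdout_then_sample(
    n_examples: int,
    train_ratio: float,
    sample_train: Callable[[List[int]], List[int]],
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: random holdout split, sampler applied to training rows only."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    cut = round(n_examples * train_ratio)
    train, test = indices[:cut], indices[cut:]
    # Only the training part is thinned out by the sampler;
    # the test part is returned untouched so it stays an unbiased hold-out.
    return sample_train(train), test


# Usage with a trivial stand-in sampler that keeps every other training row.
train_rows, test_rows = holdout_then_sample(10, 0.7, lambda idx: idx[::2])
print(train_rows, test_rows)
```

Passing the sampler in as a function keeps the split logic independent of the sampling method, so the same hold-out can be reused to compare different samplers.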


    Greetings,
      Sebastian
  • pengie
    Thanks for the reply. I was hoping that I had missed some operator, but it seems that RapidMiner does not have the functionality I want.

    Basically, in my field of research, one method of deriving a training set and a testing set from a dataset is the Kennard and Stone algorithm. The algorithm selects a set of evenly distributed objects that can serve as the training set. The remaining, unselected objects are less spread out than the selected ones but lie close to them. Hence, these objects are useful as a testing set to gauge the performance of the model.
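The split described above can be sketched in plain Python (not RapidMiner code). This is a minimal sketch assuming Euclidean distance and the classic variant that starts from the two most distant objects; the function name `kennard_stone_split` is hypothetical. Unlike a sampler that only outputs the selected rows, it also returns the indices that were never selected, which become the testing set.

```python
import math
from typing import List, Sequence, Tuple


def kennard_stone_split(
    points: Sequence[Sequence[float]], n_train: int
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: return (training indices, testing indices).

    Selected rows form the training set; all unselected rows form the testing set.
    """
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of points")

    # Pairwise Euclidean distances between all rows (O(n^2) memory, fine for small sets).
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    # Step 1: start with the two rows that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in (first, second)]

    # For every unselected row, track the distance to its nearest selected row.
    nearest = {i: min(dist[i][first], dist[i][second]) for i in remaining}

    # Step 2: repeatedly add the row whose nearest selected neighbour is farthest away.
    while len(selected) < n_train:
        best = max(remaining, key=lambda i: nearest[i])
        selected.append(best)
        remaining.remove(best)
        for i in remaining:
            nearest[i] = min(nearest[i], dist[i][best])

    # Whatever was never selected is the testing set.
    return selected, remaining


points = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]]
train, test = kennard_stone_split(points, 3)
print(train, test)
```

With these points and `n_train=3`, rows 0 and 5 (the farthest pair) are picked first, then row 3, so rows 1, 2 and 4 are left over as the testing set.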

    I guess I have to look at the source code of the KennardStoneSampling operator and see how I can modify it to behave like the SimpleValidation operator.