Combining (classified) example sets

kubat New Altair Community Member
edited November 5 in Community Q&A
Hello,

I have a pretty straightforward classification task, and I'm experimenting with a variety of classifiers (thanks for making this so easy!).

Unfortunately, because of overlap in the features of my training data, I cannot use straight cross-validation: if I did, some data from my training set would leak into the test set. So I've created five splits of my data, training and test pairs which have no overlap. I've set up five replicated model learning and application steps, so now I have the classified output of these five models.

Here is my question: What block can I use to merge the resulting example sets so I can have one overall performance measure? Using the "Append" set operation doesn't work because the attributes aren't matched (is this because the example sets include both real and categorical attributes?).

Cheers,
Rony
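
A minimal sketch of the pooling idea in the question, written in plain Python rather than as a RapidMiner process. Each fold's classified output is reduced to the attributes that all folds share, the rows are appended, and one accuracy is computed over the pooled rows. The row layout, the attribute names `label` and `prediction`, and the helpers `append_common` and `accuracy` are assumptions for illustration; none of them are RapidMiner APIs.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def append_common(example_sets: Sequence[List[Row]], keep: Sequence[str]) -> List[Row]:
    """Append example sets, keeping only the attributes named in `keep`.

    A `fold` attribute records which split each row came from.
    """
    merged: List[Row] = []
    for fold, rows in enumerate(example_sets):
        for row in rows:
            missing = [name for name in keep if name not in row]
            if missing:
                raise KeyError(f"fold {fold} lacks attributes {missing}")
            reduced = {name: row[name] for name in keep}
            reduced["fold"] = fold
            merged.append(reduced)
    return merged


def accuracy(rows: List[Row], label: str = "label", prediction: str = "prediction") -> float:
    """Fraction of rows whose prediction equals the true label."""
    if not rows:
        raise ValueError("cannot compute accuracy of an empty example set")
    correct = sum(1 for row in rows if row[label] == row[prediction])
    return correct / len(rows)


# Hypothetical outputs of two folds whose extra attributes do not match.
fold_0 = [
    {"label": "a", "prediction": "a", "x": 1.5},
    {"label": "b", "prediction": "a", "x": 0.2},
]
fold_1 = [
    {"label": "b", "prediction": "b", "confidence": 0.9},
    {"label": "a", "prediction": "a", "confidence": 0.7},
]

pooled = append_common([fold_0, fold_1], keep=["label", "prediction"])
print(accuracy(pooled))  # 3 of 4 pooled rows are correct: 0.75
```

Pooling gives every held-out example equal weight. Averaging the five per-fold accuracies instead would weight each fold equally, which gives a different number when the held-out sets differ in size.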

Answers

  • earmijo
    earmijo New Altair Community Member
    Can you use batch cross validation? This way you have full control over how the data is split.
  • kubat
    kubat New Altair Community Member
    Alas, this is not really possible because of the nature of my data.

    I have about 2000 training examples which encode features from periods of time that sometimes overlap. I have some custom code which chooses randomized training sets, then removes from the held-out testing set any examples which overlap temporally with the training set (and hence share some of its features). Keeping track of the indices of overlapping training data would be hell.

    It seems like this is something that would come up often, and I even found a class in the javadoc: com.rapidminer.operator.preprocessing.join.ExampleSetMerge, but I can't find the corresponding block in the GUI.

    Cheers,
    Rony
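
The splitting procedure described in the reply above can be sketched as follows. This is an illustration under stated assumptions, not the poster's actual code: each example is assumed to cover a half-open time interval `(start, end)`, and the names `overlaps` and `split_without_overlap` are hypothetical. A random subset becomes the training set, and a held-out example is kept for testing only if its interval overlaps no training interval.

```python
import random
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end), assumed half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share any stretch of time."""
    return a[0] < b[1] and b[0] < a[1]


def split_without_overlap(
    intervals: Sequence[Interval], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train indices, test indices) with no temporal overlap between them."""
    rng = random.Random(seed)
    indices = list(range(len(intervals)))
    rng.shuffle(indices)
    n_train = int(round(train_fraction * len(indices)))
    train = sorted(indices[:n_train])
    # Drop held-out examples that overlap any training example in time.
    test = sorted(
        i
        for i in indices[n_train:]
        if not any(overlaps(intervals[i], intervals[j]) for j in train)
    )
    return train, test


# Hypothetical example periods; some of them overlap.
intervals = [(0, 10), (5, 15), (20, 30), (25, 35), (40, 50), (60, 70)]
train_idx, test_idx = split_without_overlap(intervals, train_fraction=0.5, seed=1)
print(train_idx, test_idx)
```

Because the function returns index lists, each split can be stored as a per-example fold number, which avoids tracking the overlapping indices by hand. The overlap check compares every held-out example with every training example, which is fine for a couple of thousand examples.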