Hi everyone,
I'm new to data mining and RapidMiner, and I'm having some difficulty figuring out how to set up an experiment my researcher friend has told me about.
I need to classify records, for which I'm currently using Nearest Neighbor. There are seven possible labels for each record, and each of the 19 attributes in a record is a nominal value. I've been told the data is far too noisy to classify into seven distinct sets in one go, so what I should do instead is run a binary split for each label ("Is Label 1 / Is Not Label 1", "Is Label 2 / Is Not Label 2", and so on) and then take the result with the highest confidence as the actual label.
E.g., if "Is Label 1" has a confidence of 70% and "Is Label 2" has a confidence of 90%, I should use Label 2.
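To check that I've understood the scheme, here is a minimal sketch of it in plain Python (not RapidMiner). Everything in it is my own assumption for illustration: the toy records, the function names, and the choice to score each binary question with a simple nearest-neighbour vote over mismatching nominal attributes. Any learner that outputs a confidence could be swapped in for `binary_confidence`.

```python
def hamming(a, b):
    """Number of nominal attributes on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))


def binary_confidence(train, target_label, record, k=3):
    """Confidence that `record` "is target_label": the share of its k
    nearest training records (by Hamming distance) carrying that label."""
    nearest = sorted(train, key=lambda row: hamming(row[0], record))[:k]
    hits = sum(1 for _, label in nearest if label == target_label)
    return hits / len(nearest)


def one_vs_rest_predict(train, labels, record, k=3):
    """Ask one "is X / is not X" question per label and keep the label
    whose question was answered "yes" with the highest confidence."""
    scores = {lab: binary_confidence(train, lab, record, k) for lab in labels}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data invented for illustration: (attributes, label)
train = [
    (("red", "small", "round"), "Label 1"),
    (("red", "small", "square"), "Label 1"),
    (("blue", "large", "round"), "Label 2"),
    (("blue", "large", "square"), "Label 2"),
    (("blue", "small", "square"), "Label 2"),
]
labels = ["Label 1", "Label 2"]

best, scores = one_vs_rest_predict(train, labels, ("blue", "large", "round"))
print(best, scores)
```

With my real data there would be seven entries in `labels` and 19 attributes per record, but the selection step (take the label with the highest confidence) would be the same.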
I have no idea how to set up this experiment. I don't believe Nearest Neighbor is suitable for this, but I don't know which learner to use instead. Nor do I know how to set up RapidMiner to run several experiments and choose the best output.
When I asked my friend what I should do, he came back with "I use a very expensive software package with proprietary algorithms, so I'm not sure how you would do it".
Does anyone have any ideas?
Thanks in advance!