"advice on which clustering/classification operators to use"
I'm looking for some recommendations on which operators might be best for my task. My task is as follows: I have an example set which consists of a text field and a label. There are four possible values for the label field. Each text field has already been assigned a label by a human being. The catch is a concern that labels are either not being assigned carefully or are purposely being assigned incorrectly.
I went through all of the normal document processing (tokenizing, filtering out stop words, etc.). My first thought was to use k-NN to see how well the predicted labels match the pre-assigned labels, and then perhaps create an exception set of instances where k-NN suggests the text might be mislabeled. However, I'm not crazy about the lack of output/diagnostics from k-NN. I would prefer some additional information about how certain the algorithm is about each label it assigns.
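To make the k-NN idea concrete, here is a rough sketch in plain Python (outside RapidMiner, so none of these names are RapidMiner operators). It assumes whitespace tokenization, cosine similarity on raw token counts, and a leave-one-out vote, and it uses the share of neighbors agreeing with the prediction as a crude certainty score. The function names, `k`, and the `min_confidence` threshold are all made up for illustration.

```python
from collections import Counter
import math


def cosine(a, b):
    """Cosine similarity between two token-count dictionaries."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_confidence(vectors, labels, index, k=3):
    """Leave-one-out k-NN for item `index`.

    Returns (predicted_label, share of the nearest neighbors voting for it).
    """
    sims = [(cosine(vectors[index], v), labels[j])
            for j, v in enumerate(vectors) if j != index]
    sims.sort(key=lambda pair: pair[0], reverse=True)
    top = sims[:k]
    votes = Counter(label for _, label in top)
    predicted, count = votes.most_common(1)[0]
    return predicted, count / len(top)


def suspect_labels(texts, labels, k=3, min_confidence=0.6):
    """List items whose assigned label disagrees with a confident k-NN vote.

    Each entry is (index, assigned_label, predicted_label, confidence).
    """
    vectors = [Counter(t.lower().split()) for t in texts]
    suspects = []
    for i, assigned in enumerate(labels):
        predicted, conf = knn_confidence(vectors, labels, i, k)
        if predicted != assigned and conf >= min_confidence:
            suspects.append((i, assigned, predicted, conf))
    return suspects
```

Calling `suspect_labels(texts, labels)` would give the exception set I described, with a confidence attached to each entry. A vote share over only `k` neighbors is a coarse measure, which is part of why I'd like something more informative.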
So, I started to look at some unsupervised methods. I tried k-means, but it doesn't seem to offer much more in diagnostics or output than k-NN. I'm also looking at the Expectation Maximization Clustering operator, but it seems to hang and never complete. It sounds like some sort of fuzzy clustering is what I want, but there don't seem to be any operators like that in RapidMiner right now.
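To illustrate the kind of output I mean by fuzzy clustering, here is a minimal sketch of fuzzy c-means in plain Python (again outside RapidMiner). It assumes the documents have already been turned into numeric vectors, that the starting centers are supplied by hand, and that the fuzzifier `m` is 2; the function names are hypothetical. Each point gets a membership degree per cluster instead of a single hard assignment.

```python
import math


def membership_row(point, centers, m):
    """Membership degrees of one point across all clusters (they sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The point sits exactly on a center: give that center full membership.
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Run a fixed number of fuzzy c-means updates.

    Returns (centers, memberships), where memberships[i][c] is the degree
    to which point i belongs to cluster c.
    """
    centers = [list(c) for c in centers]
    dims = len(points[0])
    for _ in range(iterations):
        memberships = [membership_row(p, centers, m) for p in points]
        for c in range(len(centers)):
            weights = [row[c] ** m for row in memberships]
            total = sum(weights)
            if total > 0:
                # New center: mean of the points weighted by membership^m.
                centers[c] = [
                    sum(w * p[d] for w, p in zip(weights, points)) / total
                    for d in range(dims)
                ]
    memberships = [membership_row(p, centers, m) for p in points]
    return centers, memberships
```

A point with memberships near 0.5/0.5 is one the clustering is unsure about, while one near 1.0/0.0 is a confident assignment. The resulting centers also describe each cluster, which is the other piece of information I'd like to get.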
So, are there any operators or extensions that offer fuzzy clustering or something similar? What I'm looking for is either a supervised method that returns some info on the certainty of each label assignment, or an unsupervised method that provides info on the certainty of each assignment, plus info on the characteristics of each cluster.
Any help would be much appreciated, thanks in advance!