Hello all,
The Cluster Count Performance operator returns very odd values. I decided to look at the code to see what was going on, and I noticed these lines in the file 'ClusterNumberEvaluator.java' at about line 90:
for (int i = 0; i < model.getNumberOfClusters(); i++)
numItems = +model.getCluster(i).getNumberOfExamples();
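As far as I can tell, '= +' is parsed as an assignment of a unary plus rather than as '+=', so numItems is overwritten on every iteration and ends up as the number of examples in the last cluster instead of the total across all clusters.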
This value is then used later in this line:
PerformanceCriterion pc = new EstimatedPerformance("Number of clusters", 1.0 - (((double) model.getNumberOfClusters()) / ((double) numItems)), 1, false);
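This leads to weird values, because the cluster count is divided by the size of the last cluster rather than by the total number of examples. I think changing the line to numItems += model.getCluster(i).getNumberOfExamples(); should fix it.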
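To illustrate the difference, here is a minimal sketch with plain int arrays standing in for the real cluster model. The class name, method names and the cluster sizes are made up for illustration and are not from the actual source; only the two loop bodies and the 1.0 - clusters/items formula mirror the lines quoted above.

```java
// Sketch only: an int[] of cluster sizes stands in for the cluster model.
// ClusterCountSketch and its methods are hypothetical names.
class ClusterCountSketch {

    // Mirrors the quoted loop: "= +" assigns a unary-plus value,
    // so each iteration overwrites numItems instead of adding to it.
    static int overwrittenTotal(int[] clusterSizes) {
        int numItems = 0;
        for (int i = 0; i < clusterSizes.length; i++)
            numItems = +clusterSizes[i];
        return numItems;
    }

    // The proposed fix: "+=" accumulates the sizes of all clusters.
    static int summedTotal(int[] clusterSizes) {
        int numItems = 0;
        for (int i = 0; i < clusterSizes.length; i++)
            numItems += clusterSizes[i];
        return numItems;
    }

    // Same arithmetic as the value passed to EstimatedPerformance.
    static double criterion(int numberOfClusters, int numItems) {
        return 1.0 - (((double) numberOfClusters) / ((double) numItems));
    }

    public static void main(String[] args) {
        int[] sizes = {50, 30, 2}; // made-up cluster sizes

        // Last cluster only: 1.0 - 3/2, prints -0.5
        System.out.println(criterion(sizes.length, overwrittenTotal(sizes)));

        // All 82 examples: 1.0 - 3/82, roughly 0.963
        System.out.println(criterion(sizes.length, summedTotal(sizes)));
    }
}
```

With a small last cluster the original line can even produce a negative value, which would match the odd results I am seeing.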
Anyway, my question: before I embark on it, would it be possible to use the Groovy scripting operator to calculate this myself?
Regards,
Andrew