Hi Everybody,
Since I'm neither a mathematician nor a computer scientist, the answers to the following questions might be quite simple, but I'm still a little confused about the clustering algorithms in RM:
A) Is it normal behaviour for the Kmeans algorithm to need much more time (at least 10x) if the "add characterization" option is switched on?

B) Is DBscan the only density-based algorithm currently implemented in RM?
C) As far as I understand, the Kmeans algorithm should be capable of producing clusters of different cardinality. However, in my datasets the output clusters differ only slightly in their cardinality: the largest cluster is at most 5 or 6 times the size of the smallest one. Is this more likely to be a characteristic of the dataset or an artefact of the algorithm?
D) Using the ClusterCentroidEvaluator, the output indicates negative average distances. Is that possible, or should I just ignore the sign?
E) Is there a performance vector for evaluating the pairwise similarity / overlap between clusters produced by Kmeans? Can I manipulate the output of Kmeans in a way that the ClusterDensityEvaluator and the ItemDistributionEvaluator accept it as an input?
F) Is there any particular reason why the Ward method is not implemented as a clustering algorithm in hierarchical cluster models? (It is still used quite often in the publications in my discipline.)
Best
Norbert