"Average Distance within Cluster"
Pinguicula
New Altair Community Member
Sorry,
There was a premature comment here, which dissolved into mist after some further literature review. Unfortunately, I am unable to remove my message.
Best Norbert
Answers
Hi Norbert,
I am no clustering expert myself, but as far as I can see from the source code, the calculation is roughly done as in the following pseudocode:
count = 0;
sum = 0.0;
for each cluster C do {
    for each object O in C do {
        distance = getDistanceFromCentroid(C, O);
        sum = sum + v * v;
        count++;
    }
}
result = sum / count;
double divisionFactor = 1.0;
if (getParameterAsBoolean(PARAMETER_NORMALIZE))
    divisionFactor = es.getAttributes().size();
result = result / divisionFactor;
Hope that helps. Maybe you did not take the normalization with the number of attributes into account?
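For anyone who wants to check the numbers by hand, here is a minimal runnable sketch of the same calculation in Python. It is not the RapidMiner code. It assumes that `v` in the pseudocode stands for `distance`, that the centroid is the attribute-wise mean, and that the distance is Euclidean. The function name `mean_squared_centroid_distance` and its `normalize` argument are made up for illustration.

```python
import math


def mean_squared_centroid_distance(clusters, normalize=False):
    """Average squared distance of every object to its own cluster centroid.

    clusters: list of clusters, each a non-empty list of equal-length tuples.
    Assumption: Euclidean distance and mean centroids (not confirmed by the
    original source code).
    """
    total = 0.0
    count = 0
    n_attributes = len(clusters[0][0])
    for points in clusters:
        # Centroid as the attribute-wise mean of the cluster members.
        centroid = [sum(column) / len(points) for column in zip(*points)]
        for point in points:
            distance = math.dist(point, centroid)
            total += distance * distance  # assumes v == distance
            count += 1
    result = total / count
    if normalize:
        # Mirrors the division by the number of attributes in the pseudocode.
        result /= n_attributes
    return result


clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(5.0, 5.0), (5.0, 7.0)],
]
plain = mean_squared_centroid_distance(clusters)
normalized = mean_squared_centroid_distance(clusters, normalize=True)
```

In this toy data set every object lies at distance 1 from its centroid, so the plain value is 1.0 and the value normalized by the two attributes is 0.5.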
Cheers,
Ingo
Hi Ingo,
Your answer goes some way towards resolving my problem.
If my assumption is correct and v in your pseudocode is equivalent to distance, then the feature labelled "average distance within cluster" is actually the variance of the data points within the cluster. It has little in common (exaggerating somewhat) with the average distance within cluster used, e.g., in the calculation of the Silhouette coefficient (Kaufman & Rousseeuw, 1990).
By the way, the Silhouette coefficient or the Hopkins statistic would be nice features in the next RM release.
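To make the difference concrete, here is a small Python sketch comparing the two quantities on a single cluster. It assumes Euclidean distance and, as above, that v equals the distance to the centroid. Both function names are made up for illustration, and the second one computes only the within-cluster term a(i) of the Silhouette coefficient, averaged over the cluster, not the full coefficient.

```python
import math


def mean_squared_centroid_distance(points):
    """Mean of squared Euclidean distances to the cluster mean (variance-like)."""
    centroid = [sum(column) / len(points) for column in zip(*points)]
    return sum(math.dist(p, centroid) ** 2 for p in points) / len(points)


def mean_intra_cluster_distance(points):
    """Mean over all members i of a(i), the average distance from i to the
    other members of the same cluster (needs at least two members)."""
    a_values = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        a_values.append(sum(math.dist(p, q) for q in others) / len(others))
    return sum(a_values) / len(a_values)


cluster = [(0.0,), (3.0,), (6.0,)]
variance_like = mean_squared_centroid_distance(cluster)
silhouette_style = mean_intra_cluster_distance(cluster)
```

For the three one-dimensional points 0, 3 and 6, the first measure is 6.0 (squared units) while the second is 4.0 (original units), so the two are not interchangeable.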
Best
Norbert