"Average Distance within Cluster"

Pinguicula
Pinguicula New Altair Community Member
edited November 2024 in Community Q&A
Sorry,

this was a premature comment that dissolved into mist after some further literature review. Unfortunately, I am unable to remove my message.

Best, Norbert

Answers

  • IngoRM
    IngoRM New Altair Community Member
    Hi Norbert,

    I am no clustering expert myself, but as far as I can see from the source code, the calculation is done roughly as in the following pseudocode:

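    count = 0;   // number of objects over all clusters
    sum = 0.0;   // running sum of squared distances

    for each cluster C do {
        for each object O in C do {
            distance = getDistanceFromCentroid(C, O);
            sum = sum + v * v;   // v is not defined in this sketch; presumably v = distance
            count++;
        }
    }

    // mean of the squared values over all objects
    result = sum / count;

    // optionally divide by the number of attributes
    double divisionFactor = 1.0;
    if (getParameterAsBoolean(PARAMETER_NORMALIZE))
        divisionFactor = es.getAttributes().size();

    result = result / divisionFactor;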

    Hope that helps. Maybe you did not take the normalization with the number of attributes into account?
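
    For readers who want to reproduce the number by hand, the pseudocode above can be sketched in Python roughly as follows. This is only a sketch under assumptions, not the actual implementation: it assumes Euclidean distance, takes the centroid to be the attribute-wise mean of each cluster, and reads `v` as the distance. The function name `avg_within_cluster_distance` and the `normalize` argument are hypothetical stand-ins for the operator and its normalize parameter.

```python
import math


def avg_within_cluster_distance(clusters, normalize=False):
    """Mean squared distance of every object to its own cluster centroid.

    clusters: list of clusters, each a non-empty list of equal-length
    numeric tuples. Assumes Euclidean distance and mean centroids.
    """
    total = 0.0
    count = 0
    n_attributes = 0
    for points in clusters:
        n_attributes = len(points[0])
        # Attribute-wise mean of the cluster (assumed centroid definition).
        centroid = [sum(col) / len(points) for col in zip(*points)]
        for p in points:
            d = math.dist(p, centroid)
            total += d * d
            count += 1
    if count == 0:
        raise ValueError("no objects to evaluate")
    result = total / count
    if normalize:
        # Mirrors the division by the number of attributes.
        result /= n_attributes
    return result


clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 10.0), (10.0, 14.0)],
]
plain = avg_within_cluster_distance(clusters)
normalized = avg_within_cluster_distance(clusters, normalize=True)
```

    With two attributes, the normalized value in this toy example is exactly half of the plain one, which is the kind of discrepancy the normalization would explain.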

    Cheers,
    Ingo
  • Pinguicula
    Pinguicula New Altair Community Member
    Hi Ingo,

    Your answer goes some way toward resolving my problem.

    If my assumption is correct and v in your pseudocode is equivalent to distance, then the measure labelled "average distance within cluster" is actually the variance of the data points within the cluster. Exaggerating a little ;), it has little in common with the average distance within a cluster as used, e.g., in the calculation of the Silhouette coefficient (Kaufman & Rousseeuw, 1990).
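
    To make the difference concrete, here is a small Python sketch that computes both quantities for one toy cluster. It is an illustration under assumptions (Euclidean distance, centroid as attribute-wise mean), and both function names are hypothetical. The second function follows the usual definition of a(i) in the Silhouette coefficient: the average distance from object i to all other objects in the same cluster.

```python
import math


def mean_squared_distance_to_centroid(points):
    # Variance-like measure: average squared distance to the mean centroid.
    centroid = [sum(col) / len(points) for col in zip(*points)]
    return sum(math.dist(p, centroid) ** 2 for p in points) / len(points)


def mean_intra_cluster_distance(points, i):
    # Silhouette-style a(i): average distance from points[i] to the
    # other members of the same cluster (needs at least two points).
    others = [q for j, q in enumerate(points) if j != i]
    return sum(math.dist(points[i], q) for q in others) / len(others)


cluster = [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0)]
msd = mean_squared_distance_to_centroid(cluster)
a_values = [mean_intra_cluster_distance(cluster, i) for i in range(len(cluster))]
```

    For this cluster the first measure is 32/3 in squared units, while the a(i) values are 6, 4 and 6 in plain distance units, so the two are not interchangeable.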

    By the way, the Silhouette coefficient or the Hopkins statistic would be nice features in the next RM release.

    Best

    Norbert
