Distance to cluster centre for every data point

kypexin New Altair Community Member
edited November 5 in Community Q&A

Hi guys

I'm not a big expert in clustering and couldn't find a suitable solution on the forum, so here's my question.

When I perform clustering, is there a simple RapidMiner way to obtain the exact distance from each and every example in the dataset to each cluster centre?

For example, if I have cluster1 and cluster2, and cluster1 contains examples v1, v2 and v3, how can I find out which of v1, v2, v3 is closest to the cluster1 centre (the most representative example) and which is farthest from it (the least representative example)?

Thank you :)

Answers

  • MartinLiebig
    Altair Employee
    Answer ✓

    Hi,

    Can't you use Extract Cluster Centroids followed by Cross Distance?
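    A minimal Python sketch of what that combination computes, assuming Euclidean distance and centroids taken as the per-cluster mean (as in k-Means). The toy data and the helper functions are hypothetical stand-ins for the RapidMiner operators, not their implementation. The output rows follow the layout kypexin describes in the next post: cluster as "request", example as "document", plus the distance.

```python
import math

# Hypothetical toy example set: example id -> (feature vector, assigned cluster).
examples = {
    "v1": ((1.0, 1.0), "cluster_0"),
    "v2": ((1.5, 2.0), "cluster_0"),
    "v3": ((3.0, 4.0), "cluster_0"),
    "v4": ((8.0, 8.0), "cluster_1"),
    "v5": ((9.0, 7.0), "cluster_1"),
}


def extract_centroids(data):
    """Return the mean feature vector of each cluster (assumed centroid definition)."""
    sums, counts = {}, {}
    for vec, cluster in data.values():
        acc = sums.setdefault(cluster, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[cluster] = counts.get(cluster, 0) + 1
    return {c: tuple(s / counts[c] for s in acc) for c, acc in sums.items()}


def cross_distances(data, centroids):
    """One (request=cluster, document=example, distance) row per centroid/example pair."""
    return [
        (cluster, ex_id, math.dist(vec, centre))  # Euclidean distance (Python 3.8+)
        for cluster, centre in centroids.items()
        for ex_id, (vec, _) in data.items()
    ]


centroids = extract_centroids(examples)
rows = cross_distances(examples, centroids)
for row in rows:
    print(row)
```

    Note that this yields one row per example and centroid, so the table grows as examples × clusters.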

    BR,

    Martin

  • kypexin
    kypexin New Altair Community Member

    Hi @mschmitz

    Yes, I can :) This seems to be a solution, though not a very obvious one.

    But this way, I guess, I am getting the indexes of the examples (document column) for each cluster number (request column), correct?
    So I would then need to match these indexes back to the original examples if I want the individual distances and not only the min / max?

    [Screenshot 2018-06-21 11.16.28.png]

    [Screenshot 2018-06-21 11.11.49.png]
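    A minimal sketch of that matching step, assuming the cross-distance output is in long format (cluster as request, example id as document) and that the original example set keeps an id and the assigned cluster for every example. The rows, values and helper name are hypothetical. Looking up the (assigned cluster, example) pair gives each example's distance to its own centroid.

```python
# Hypothetical cross-distance rows: (request = cluster, document = example id, distance).
cross = [
    ("cluster_0", "v1", 1.50), ("cluster_0", "v2", 0.45), ("cluster_0", "v3", 2.00),
    ("cluster_1", "v1", 9.70), ("cluster_1", "v2", 8.90), ("cluster_1", "v3", 6.40),
]

# Hypothetical original example set: example id -> assigned cluster.
assigned = {"v1": "cluster_0", "v2": "cluster_0", "v3": "cluster_0"}


def distance_to_own_cluster(cross_rows, assignment):
    """Join the distance rows back to the examples on (cluster, example id)."""
    lookup = {(cluster, ex_id): dist for cluster, ex_id, dist in cross_rows}
    return {ex_id: lookup[(cluster, ex_id)] for ex_id, cluster in assignment.items()}


print(distance_to_own_cluster(cross, assigned))
```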

  • MartinLiebig
    Altair Employee

    Hi,

    Well, you get the distance to each centroid, so you would need to add an Aggregate afterwards to figure out the closest cluster centroid.
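    The aggregation amounts to grouping the distance rows by example and keeping the minimum. A minimal Python sketch with hypothetical rows in the same (request = cluster, document = example, distance) layout; unlike a plain minimum aggregate, it also remembers which cluster produced that minimum.

```python
# Hypothetical cross-distance rows: (request = cluster, document = example id, distance).
cross = [
    ("cluster_0", "v1", 1.50), ("cluster_0", "v2", 0.45), ("cluster_0", "v3", 7.00),
    ("cluster_1", "v1", 9.70), ("cluster_1", "v2", 8.90), ("cluster_1", "v3", 6.40),
]


def closest_centroid(cross_rows):
    """Group rows by example and keep the cluster with the smallest distance."""
    best = {}
    for cluster, ex_id, dist in cross_rows:
        if ex_id not in best or dist < best[ex_id][1]:
            best[ex_id] = (cluster, dist)
    return best


for ex_id, (cluster, dist) in sorted(closest_centroid(cross).items()):
    print(ex_id, cluster, dist)
```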

    Cheers,

    Martin

  • kypexin
    kypexin New Altair Community Member

    Clear, @mschmitz.

    But is there a reason these distances are not included in the default output example set of the clustering operators?

  • MartinLiebig
    Altair Employee

    @kypexin,

    Do you mean all distances or only the lowest?

    Including all distances would increase memory usage quite a lot. I can see some reason to output the distance to the assigned cluster as a kind of "confidence". Is that what you are asking for?

    BR,

    Martin

  • kypexin
    kypexin New Altair Community Member

    @mschmitz not ALL distances but, as you said, only the distance from each example to its 'parent' cluster. And yes, this could serve as an analogue of a confidence value.
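    With that per-example distance available, the original question (the most and least representative member of each cluster) reduces to a minimum and maximum within each cluster. A minimal sketch with hypothetical values, assuming a smaller distance to the own centroid means a more representative example; the helper name is illustrative only.

```python
# Hypothetical distance of each example to its own (assigned) cluster centroid.
own_distance = {
    "v1": ("cluster_0", 1.50),
    "v2": ("cluster_0", 0.45),
    "v3": ("cluster_0", 2.00),
    "v4": ("cluster_1", 0.71),
    "v5": ("cluster_1", 0.90),
}


def representatives(own):
    """Return (most representative, least representative) example id per cluster."""
    members = {}
    for ex_id, (cluster, dist) in own.items():
        members.setdefault(cluster, []).append((dist, ex_id))
    # Tuples compare by distance first; equal distances fall back to the example id.
    return {c: (min(m)[1], max(m)[1]) for c, m in members.items()}


print(representatives(own_distance))
```

    The tie-break on example id is an arbitrary choice made only to keep the result deterministic.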

  • MartinLiebig
    Altair Employee

    @kypexin

    Good question, especially because k-Means, at least, explicitly calculates this number... @sebastian_land wrote it, so maybe he knows?

    And maybe @sgenzer can make a ticket out of this :)

    BR,

    Martin

  • kypexin
    kypexin New Altair Community Member

    @mschmitz

    OK, nice. Seems I have just contributed a small idea :)

  • sgenzer
    Altair Employee

    I certainly can. This is a feature request, not a bug, correct?