Distance to cluster centre for every data point

Hi guys

I'm not a big expert in clustering and couldn't find a suitable solution on the forum, so here's my question.

When I perform clustering, is there a simple RapidMiner way to obtain the exact distances to each cluster centre for each and every example in the dataset?

For example, if I have cluster1 and cluster2, and cluster1 contains examples v1, v2, v3, how could I find out which of v1, v2, v3 is the closest to the cluster1 centre (the most representative example) and which is the farthest from it (the least representative example)?

Thank you

Accepted answers

MartinLiebig

Hi,

Can't you do Extract Cluster Centroids + Cross Distance?

BR,

Martin
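For readers outside RapidMiner, the idea behind the suggestion above can be sketched in plain Python: take the centroids, then compute the distance from every example to every centroid. This is only an illustration of the concept, not the operators' actual implementation. The toy data, the `cross_distance` helper and the choice of Euclidean distance are all assumptions made for the example.

```python
import math

# Hypothetical toy data: three examples and two cluster centroids (2 attributes each).
examples = {"v1": (1.0, 1.0), "v2": (1.5, 2.0), "v3": (8.0, 8.0)}
centroids = {"cluster1": (1.0, 1.5), "cluster2": (8.0, 9.0)}


def cross_distance(examples, centroids):
    """Return one row per (example, centroid) pair with its Euclidean distance."""
    rows = []
    for example_id, point in examples.items():
        for cluster_id, centre in centroids.items():
            rows.append(
                {
                    "example": example_id,
                    "cluster": cluster_id,
                    "distance": math.dist(point, centre),  # assumed measure: Euclidean
                }
            )
    return rows


rows = cross_distance(examples, centroids)
for row in rows:
    print(row["example"], row["cluster"], round(row["distance"], 3))
```

The result is a long table with one row per example/centroid pair, so every example keeps its own identifier next to each distance.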

All comments

kypexin

Hi @mschmitz

Yes, I can. This seems to be a solution, though not a very obvious one.

But this way I guess I am getting the indexes of the examples (document column) for each cluster number (request column), correct?
So I will then need to somehow match these indexes with the original examples if I want the individual distances and not only the min / max?

Screenshot 2018-06-21 11.16.28.png

Screenshot 2018-06-21 11.11.49.png

MartinLiebig

Hi,

Well, you get the distance to each centroid, so you would need to add an Aggregate afterwards to figure out the closest cluster centroid.

Cheers,

Martin
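The aggregation step described above can be sketched in plain Python as a minimum over each example's distances, followed by a per-cluster lookup of the closest and farthest member. This is a minimal sketch under stated assumptions: the `rows` table and its distance values are made up for illustration, and the names `nearest`, `members` and `summary` are hypothetical.

```python
from collections import defaultdict

# Hypothetical long table of (example, cluster, distance) rows,
# i.e. one row per example/centroid pair. The values are invented.
rows = [
    ("v1", "cluster1", 0.5), ("v1", "cluster2", 10.6),
    ("v2", "cluster1", 0.7), ("v2", "cluster2", 9.6),
    ("v3", "cluster1", 9.5), ("v3", "cluster2", 1.0),
]

# Step 1: per example, keep the centroid with the smallest distance.
nearest = {}
for example_id, cluster_id, distance in rows:
    if example_id not in nearest or distance < nearest[example_id][1]:
        nearest[example_id] = (cluster_id, distance)

# Step 2: group examples by their nearest cluster.
members = defaultdict(list)
for example_id, (cluster_id, distance) in nearest.items():
    members[cluster_id].append((distance, example_id))

# Step 3: the closest member is the most representative, the farthest the least.
summary = {
    cluster_id: {
        "most_representative": min(pairs)[1],
        "least_representative": max(pairs)[1],
    }
    for cluster_id, pairs in members.items()
}

print(nearest)
print(summary)
```

Keeping the example identifier in every row avoids having to match indexes back to the original examples afterwards.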

kypexin

Clear @mschmitz

But is there a reason these distances were not included in the default output example set for clustering operators?

MartinLiebig

@kypexin,

Do you mean all distances or only the lowest?

Storing all distances would increase memory use quite a lot. I can see some reason to get the distance to the assigned cluster as a kind of "confidence". Is that what you are asking for?

BR,

Martin

kypexin

@mschmitz not ALL distances but, as you said, for each example only the distance to its 'parent' cluster. And yes, this can serve as an analogue of a confidence parameter.

MartinLiebig

@kypexin

Good question, especially because at least k-means explicitly calculates that number... @sebastian_land wrote it, so maybe he knows?

And maybe @sgenzer can make a ticket out of this.

BR,

Martin

kypexin

@mschmitz

OK, nice. Seems I have just thrown in a little idea.

sgenzer

I certainly can. This is a feature request, not a bug - correct?