Distance to cluster centre for every data point
Hi guys
I'm not a big expert in clustering and couldn't find a suitable solution on the forum, so here's the question.
When I perform clustering, is there a simple RapidMiner way to obtain the exact distances to each cluster centre for each and every example in the dataset?
For example, if I have cluster1 and cluster2, and cluster1 contains examples v1, v2, v3, how could I find out which of v1, v2, v3 is closest to the cluster1 centre (the most representative example) or farthest from it (the least representative example)?
Thank you
Answers
Hi,
Can't you do Extract Cluster Centroids + Cross Distance?
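For anyone who wants to see what that combination boils down to, here is a rough sketch in plain Python rather than RapidMiner. It is not the operators' actual implementation: the function name `cross_distance`, the toy points and the choice of Euclidean distance are all assumptions made for illustration. It simply computes the distance from every example to every centroid, one row per pair.

```python
import math

def cross_distance(examples, centroids):
    """Return one row per (example, centroid) pair with its Euclidean distance."""
    rows = []
    for ex_id, point in examples.items():
        for c_id, centre in centroids.items():
            rows.append({
                "example": ex_id,
                "cluster": c_id,
                "distance": math.dist(point, centre),
            })
    return rows

# Hypothetical toy data: three examples and two already-extracted centroids.
examples = {"v1": (1.0, 1.0), "v2": (2.0, 1.0), "v3": (4.0, 5.0)}
centroids = {"cluster1": (1.0, 1.0), "cluster2": (4.0, 1.0)}

distances = cross_distance(examples, centroids)
for row in distances:
    print(row)
```

Keeping the example id in every row is what makes it possible to match distances back to the original examples later.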
BR,
Martin
Hi @mschmitz
Yes, I can. This seems to be a solution, though not a very obvious one.
But this way I guess I am getting indexes of examples (document column) for each cluster number (request column), correct?
So I will then need to somehow match these indexes with the original examples if I want individual distances and not only min / max?
Hi,
Well, you get the distance to each centroid, so you would need to apply an aggregate afterwards to figure out the closest cluster centroid.
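The aggregation step can be sketched like this in plain Python, outside RapidMiner. Everything here is hypothetical (the helper `nearest_centroid`, the toy points, Euclidean distance): take the minimum distance per example to get its closest centroid, then take the min and max within each cluster to get the most and least representative examples.

```python
import math
from collections import defaultdict

# Hypothetical toy data: four examples and two cluster centroids.
examples = {"v1": (1.0, 1.0), "v2": (2.0, 1.0), "v3": (1.0, 4.0), "v4": (6.0, 5.0)}
centroids = {"cluster1": (1.0, 1.0), "cluster2": (6.0, 6.0)}

def nearest_centroid(point, centroids):
    """Return (cluster_id, distance) for the centroid closest to point."""
    pairs = ((c_id, math.dist(point, centre)) for c_id, centre in centroids.items())
    return min(pairs, key=lambda pair: pair[1])

# Step 1: aggregate per example (minimum distance over all centroids).
assigned = defaultdict(list)
for ex_id, point in examples.items():
    c_id, dist = nearest_centroid(point, centroids)
    assigned[c_id].append((ex_id, dist))

# Step 2: aggregate per cluster (closest and farthest member).
summary = {}
for c_id, members in assigned.items():
    closest = min(members, key=lambda m: m[1])
    farthest = max(members, key=lambda m: m[1])
    summary[c_id] = {"closest": closest[0], "farthest": farthest[0]}

print(summary)
```

A cluster with a single member reports that member as both closest and farthest.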
Cheers,
Martin
Do you mean all distances or just the lowest?
Storing all distances would increase memory use quite a lot. I can see some reason to get the distance to the assigned cluster as a kind of "confidence". Is that what you are asking for?
BR,
Martin
Good question, especially because k-means, at least, explicitly calculates that number... @sebastian_land wrote it, so maybe he knows?
And maybe @sgenzer can make a ticket out of this.
BR,
Martin
I certainly can. This is a feature request, not a bug - correct?