"Clustering - how to determine

Question

Hello,
could anyone point me to a way of doing unsupervised clustering on data where I am not sure how many clusters are present (i.e. how to determine k for e.g. k-means)?
Or is visual inspection the best way to determine k (I have 13 attributes and the data might be quite noisy)?

Thanks for any suggestions,
radone

Andrew2 · Answer

Hello,

Clustering always requires a human to look at and interpret the results, but the various cluster performance operators can lend a helping hand. Here's an example showing the Cluster Distance Performance operator producing measures for "average within centroid distance" and Davies-Bouldin as k is varied in a k-means clustering experiment. The example data in this case contains 1000 examples that are grouped into 8 neat clusters in a three-dimensional space.

At the end of the experiment, look at the Log tab in the results and plot the two recorded measures as a function of k; you should see that something interesting is happening at k = 8. Fortunately, this corresponds to the "correct" answer, but in real life it won't be as easy.

The characteristics of the input data, such as cluster shape, noise and data size, will determine what clustering approach to use as well as what performance measure could be appropriate. Guidance is hard to give because a) it depends on the data and b) I probably don't know :)

regards,
Andrew
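
For readers without the process at hand, the same idea of sweeping k and recording both measures can be sketched in plain Python. This is only a rough illustration under several assumptions, not the RapidMiner process itself: the toy data (8 Gaussian blobs at the corners of a cube), the farthest-first seeding and all function names are made up for this sketch, and the two measures follow the usual textbook definitions, which may differ in sign or scaling from what the Cluster Distance Performance operator reports.

```python
import math
import random


def make_blobs(rng, n_per_cluster=125, spread=0.5):
    """Toy stand-in data: 8 Gaussian blobs at the corners of a cube (1000 points)."""
    corners = [(x, y, z) for x in (0, 10) for y in (0, 10) for z in (0, 10)]
    return [tuple(rng.gauss(c, spread) for c in corner)
            for corner in corners for _ in range(n_per_cluster)]


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm with farthest-first seeding; returns (centroids, labels)."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its old centroid.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def cluster_measures(points, centroids, labels):
    """Return (average within-centroid distance, Davies-Bouldin index)."""
    k = len(centroids)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    avg_within = sum(dists) / len(points)
    # Scatter of each cluster: mean distance of its members to the centroid.
    scatter = []
    for j in range(k):
        own = [d for d, lab in zip(dists, labels) if lab == j]
        scatter.append(sum(own) / len(own) if own else 0.0)
    # Davies-Bouldin: mean over clusters of the worst (scatter_i + scatter_j) / separation.
    db = sum(
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i)
        for i in range(k)
    ) / k
    return avg_within, db


rng = random.Random(0)
data = make_blobs(rng)
results = {}
for k in range(2, 13):
    centroids, labels = kmeans(data, k, rng)
    results[k] = cluster_measures(data, centroids, labels)
    print(f"k={k:2d}  avg_within={results[k][0]:.3f}  davies_bouldin={results[k][1]:.3f}")

# With the textbook definition, a lower Davies-Bouldin value means tighter, better separated clusters.
best_k = min(results, key=lambda k: results[k][1])
print("lowest Davies-Bouldin at k =", best_k)
```

Plotting the two printed columns against k plays the same role as plotting the logged values from the Log tab: the average within-centroid distance should stop dropping sharply once k reaches the real number of clusters, and the Davies-Bouldin value should be at its lowest there. On noisy real data with 13 attributes the picture will usually be far less clear-cut than on these well-separated blobs.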