input ncluster for k-means
kdamodaran
New Altair Community Member
Hi all,
I am new to RapidMiner. I am interested in applying k-means clustering to a dataset consisting of a few thousand elements, and the attributes are real-valued. So the standard sum of squared distances to the centroid will work as the metric for convergence.
A couple of trials I have run using k-means just partition the data into two clusters, which seems to be the default? How can I specify the number of clusters?
Thanks,
Dam
Answers
Hi,
Click on the k-Means operator box in the process and set k in the Parameters window to the desired value.
BTW, the algorithm has converged when the centroids do not change between two consecutive iterations. The sum of squared distances (i.e. the squared error) plays a different role: it provides a criterion for selecting the best solution when several solutions are generated, for example from multiple runs.
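To make the distinction concrete, here is a minimal sketch in plain Python. It is not RapidMiner's implementation; the names `kmeans`, `best_of_runs`, `max_iter`, `seed` and `runs` are made up for illustration. Each run stops when the centroids no longer change, and the squared error is used only to pick the best of several runs.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """One k-means run. Returns (centroids, labels, sse)."""
    points = [tuple(p) for p in points]
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: centroids unchanged between consecutive iterations
        centroids = new_centroids
    # Squared error of this solution (sum of squared distances to centroids).
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def best_of_runs(points, k, runs=10):
    """Run k-means with different seeds and keep the lowest-SSE solution."""
    return min((kmeans(points, k, seed=s) for s in range(runs)), key=lambda r: r[2])


if __name__ == "__main__":
    data = [
        (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
        (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
        (10.0, 0.0), (10.1, 0.2), (10.2, 0.1),
    ]
    centroids, labels, sse = best_of_runs(data, k=3)
    print(centroids, labels, sse)
```

Here `k` is passed explicitly, which corresponds to setting k in the operator's parameters; the number of restarts (`runs`) is an assumption of this sketch.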
Regards,
Dan
That's what I was expecting too, but I don't get a Parameters window. Am I missing something totally obvious?! The only thing that seems close in the dialog box is "Show Operator Info", which also doesn't have a parameter window.
On a related note, is it possible to retain the nominal ids of the elements being processed? Sure, we can always drop the clustering output into Excel and match it with the original ids, but ............
Thanks for your help!
Dam
Hi,
You may have deactivated the corresponding view. Go to the View menu, select Show View, and then Parameters if it is not already selected.
For more information about RapidMiner's GUI and the concepts in general, I would suggest you take a look at the manual, which is available in English and German.
Greetings,
Sebastian