"Dynamically determine number of clusters k-means"
Farnoush_r
New Altair Community Member
Hi
I want to build a process in RapidMiner that determines the number of clusters automatically and then passes it on to the k-means algorithm. The post below has some great ideas, but it relies on a log table. Is there any way to do this dynamically, creating a macro that holds the calculated number of clusters and handing it to k-means?
http://rapid-i.com/rapidforum/index.php?topic=3447.0
Answers
Hello
It is possible to convert a log to an example set; use the Log to Data operator. For Davies-Bouldin, you could look for a minimum by sorting this example set by the validity measure and then simply using the value of k that is associated with it.
If you are confident that the data is well behaved in all cases then you could try that.
Regards
Andrew
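The sort-and-pick step described above can be sketched outside RapidMiner in a few lines of Python. This is only an illustration: the table contents, the column names `k` and `davies_bouldin`, and the helper `best_k` are made up for the example, and it assumes the logged value is the raw index where lower is better. If your process logs a negated value, the sort direction flips.

```python
# Hypothetical log table: one row per tested k with its Davies-Bouldin value.
log_rows = [
    {"k": 2, "davies_bouldin": 0.91},
    {"k": 3, "davies_bouldin": 0.47},
    {"k": 4, "davies_bouldin": 0.62},
    {"k": 5, "davies_bouldin": 0.78},
]


def best_k(rows, measure="davies_bouldin"):
    """Return the k whose validity measure is smallest (lower is better)."""
    ranked = sorted(rows, key=lambda row: row[measure])
    return ranked[0]["k"]


print(best_k(log_rows))  # 3
```

Sorting and taking the first row mirrors the suggestion in the post; `min(rows, key=...)` would give the same result without a full sort.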
Thank you for your helpful response, but I have two more questions. First, I followed your suggestion and now have an example set that identifies the best number of clusters. Is it possible to feed this into a clustering node so that the node reads the number of clusters from the data? I think k has to be set in the clustering node itself and cannot be read from an outside source.
Second, I did not understand your concern about my data. I determine k each time from the imported data, so the process finds the best k for whatever data it is given. So what is the problem?
For the first question, use the Extract Macro operator to get the data value of a particular attribute and example within an example set. Use that macro later however you want.
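As a rough analogy for what extracting a macro and reusing it looks like, here is a minimal Python sketch. It is not RapidMiner's implementation: the `macros` dictionary, the helpers `extract_macro` and `resolve`, the 1-based `example_index`, and the sample table are all assumptions made for illustration. The only RapidMiner convention it borrows is the `%{name}` placeholder syntax for referencing a macro.

```python
import re

# Hypothetical example set, already sorted so the best k is in the first row.
table = [
    {"k": 3, "davies_bouldin": 0.47},
    {"k": 4, "davies_bouldin": 0.62},
]

# Macros are plain name -> string pairs in this sketch.
macros = {}


def extract_macro(name, rows, attribute, example_index):
    """Store one cell of the table as a string macro (example_index is 1-based here)."""
    macros[name] = str(rows[example_index - 1][attribute])


def resolve(value):
    """Replace every %{name} placeholder with the stored macro value."""
    return re.sub(r"%\{(\w+)\}", lambda m: macros[m.group(1)], value)


extract_macro("best_k", table, "k", 1)

# The clustering step takes its k from the macro instead of a hard-coded number.
kmeans_params = {"k": "%{best_k}"}
k = int(resolve(kmeans_params["k"]))
print(k)  # 3
```

The point of the indirection is that the clustering step never sees a fixed number: whatever value ends up in the first row of the table is what gets used.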
For the second question, the Davies-Bouldin validity measure is a mathematical score that favors clusters that are individually compact and maximally separated from one another. Who is to say whether this score matches what truly is the best clustering?
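To make the caveat concrete, here is a small Python sketch of the Davies-Bouldin index in its usual form: the average, over clusters, of the worst ratio of combined within-cluster scatter to the distance between centroids. The function names and the two toy clusterings are made up for the example, and it assumes Euclidean distance and Python 3.8 or later for `math.dist`.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = [
        sum(dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    total = 0.0
    for i in range(len(clusters)):
        # Worst (largest) scatter-to-separation ratio against any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        )
    return total / len(clusters)


# Compact, far-apart clusters versus spread-out, close clusters.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(5, 0), (5, 4)]]

print(davies_bouldin(tight))  # lower: compact and well separated
print(davies_bouldin(loose))  # higher: more scatter, less separation
```

The score only sees scatter and centroid distances. It knows nothing about what the clusters mean, which is why the k with the lowest value is a reasonable default rather than a guarantee.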