"Meaning of Cluster Centroid Values"
Hello,
I'd like to know more about the typical range and "best values" of the Cluster Centroid Evaluator output when it is applied to KMeans results. I'm trying to cluster texts and don't know which k would be best for KMeans.
Are there any papers available about it?
Is the range between 0 and 1? What does a value of 0,5 mean?
Why are the values in the log operator, for example, 0,3 while the performance vector shows 0,03?
What is the difference between a value of 0,35 and 0,37? Is that a meaningful difference?
............
I would also like to know the same about ExampleDistribution and ClusterSimilarity.
Thanks for any hints about it,
Thieme
Example:
Operator       ItemDistributionEvaluator  ClusterCentroidEvaluator       ClusterDensityEvaluator
Learner   k    Example distribution       Avg. within centroid distance  Avg. within cluster similarity
Kmeans    25   0,60771443854906100000     0,02404988245272690000         9,29991686777007000000
Kmeans    49   0,28560482296982600000     0,02650818439798560000         7,31511727219542000000
Kmeans    97   0,09401610548077870000     0,02706375671595980000         5,42341328360529000000
Kmeans    194  0,03743006260029780000     0,02850372897318460000         4,44378527924390000000
Kmedoid   25   0,06200754673336320000     0,03254450760601510000         6,52771248771305000000
Kmedoid   49   0,04150703349766510000     0,02898622290257270000         5,87483323589922000000
Kmedoid   97   0,02837976550188670000     0,03195484899801600000         5,26930481169764000000
Kmedoid   194  0,02037039714322890000     0,03105148039167270000         4,66888084366150000000
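
To make clear which quantity I have in mind for "Avg. within centroid distance", here is a minimal Python sketch. It assumes the measure is the mean squared Euclidean distance from each example to its assigned centroid. That is only my guess: I don't know whether the evaluator squares, normalizes, or negates the value. The function name and the toy data are made up for illustration.

```python
def avg_within_centroid_distance(points, centroids, assignments):
    """Mean squared Euclidean distance from each point to its assigned centroid.

    Assumption: this is one plausible definition only; the evaluator's
    actual formula may differ (e.g. unsquared, normalized, or negated).
    """
    total = 0.0
    for point, cluster in zip(points, assignments):
        centroid = centroids[cluster]
        # Squared Euclidean distance between the point and its centroid
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total / len(points)


# Hypothetical toy data: four 2-D points in two clusters
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
assignments = [0, 0, 1, 1]

value = avg_within_centroid_distance(points, centroids, assignments)
print(value)
```

Under this definition the value depends on the scale of the feature vectors, so it would not be bounded by 1 in general. Whether the evaluator behaves the same way is exactly what I'm unsure about.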