What is the relation between cluster size and centroid table? Which model makes more sense? Why?
Hey @lionelderkrikor, thanks for your explanation. If you don't mind, what do you mean by the "compacity" of the clusters?
How can I create the performance and the Elbow curve? I'm still new to all of these methods.
Hi @NatalySimth,
1. By "compacity" (compactness) I mean how close the data points are to their centroid.
2. Creating the performance and the Elbow curve:
You can easily create such a curve with an optimization loop, using the Optimize Parameters and Cluster Distance Performance operators.
By running such a process, you obtain a table of the Average within Centroid Distance for each value of k (the number of clusters):

Then you can plot this table with a Series plot, setting:
- Index Dimension = k
- Plot Series = Average within centroid distance
You obtain the following curve:

For this example, the Elbow (the point where the curve bends) is at k = 4 or k = 5, so the optimal number of clusters for this use case is k = 4 or k = 5.
The process used for this post is in the attached file.
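The Optimize Parameters loop described above can be sketched outside RapidMiner in plain Python, as a minimal illustration under stated assumptions: Euclidean distance, a basic Lloyd's k-means with random initial centroids, and synthetic data. The function names are hypothetical, this is not RapidMiner's implementation, RapidMiner's Cluster Distance Performance operator may use a different distance measure, and it reports the value with a negative sign (this sketch returns it positive).

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Basic Lloyd's k-means on a list of equal-length tuples.

    Returns (labels, centroids). Initial centroids are k random points.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def avg_within_centroid_distance(points, labels, centroids):
    """Mean Euclidean distance from each point to its own centroid (positive)."""
    total = sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return total / len(points)


def elbow_table(points, k_values):
    """Run k-means for each k and collect (k, average within centroid distance)."""
    table = []
    for k in k_values:
        labels, centroids = kmeans(points, k)
        table.append((k, avg_within_centroid_distance(points, labels, centroids)))
    return table


if __name__ == "__main__":
    # Synthetic data: four Gaussian blobs, generated only for illustration.
    rng = random.Random(42)
    centers = [(0, 0), (0, 5), (5, 0), (5, 5)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(50)]
    for k, value in elbow_table(data, range(1, 9)):
        print(f"k = {k}: {value:.3f}")
```

Plotting the printed values against k gives the same kind of curve as the Series plot. A single random start can land k-means in a local minimum, so a real sweep would usually try several seeds per k and keep the best run.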
Hope this helps,
Regards,
Lionel
PS: To understand the concept of clustering, you can visit the RapidMiner Academy, which has interesting videos on this topic:
https://academy.rapidminer.com/catalog?query=cluster
@lionelderkrikor Thanks a million!
Very useful information.

Hi @lionelderkrikor,
Thank you for your inspiring answer above! Along the same lines, it should also be possible to generate the Elbow curve using the Davies-Bouldin index, in order to compare it with the main criterion, right?
Thank you in advance for your answer!
Regards!
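For reference, here is a minimal plain-Python sketch of the standard Davies-Bouldin definition: for each cluster, take its average point-to-centroid distance (its scatter); for each pair of clusters, divide the sum of their scatters by the distance between their centroids; keep the largest ratio per cluster and average those. Lower values indicate more compact, better-separated clusters, and the index is not monotonic in k the way the within-centroid distance tends to be, so it is usually read by looking for the k with the lowest value rather than an elbow. The sketch assumes Euclidean distance and distinct centroids; the function name is hypothetical, and RapidMiner's implementation may differ in details such as the distance measure or the sign it reports.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better) with Euclidean distance."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    if len(groups) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster: coordinate-wise mean of its members.
    centroids = {lab: [sum(col) / len(m) for col in zip(*m)]
                 for lab, m in groups.items()}
    # Scatter of each cluster: mean distance of its members to the centroid.
    scatter = {lab: sum(math.dist(p, centroids[lab]) for p in m) / len(m)
               for lab, m in groups.items()}

    worst_ratios = []
    for i in groups:
        # Largest (scatter_i + scatter_j) / centroid distance over other clusters.
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in groups if j != i))
    return sum(worst_ratios) / len(worst_ratios)


if __name__ == "__main__":
    pts = [(0, 0), (2, 0), (10, 0), (12, 0)]
    print(davies_bouldin(pts, [0, 0, 1, 1]))  # well separated, prints 0.2
    print(davies_bouldin(pts, [0, 1, 0, 1]))  # interleaved, prints 5.0
```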
@lionelderkrikor, thanks for the explanation.
But can you please let me know how you got the inertia plot in RapidMiner? The only options available are Average within centroid distance and Davies-Bouldin (DB).
I want to plot it based on the inertia criterion. Please help.
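Inertia, as the term is used for example by scikit-learn's KMeans (its `inertia_` attribute), is the sum of squared Euclidean distances from each point to its assigned centroid. The plain-Python sketch below assumes you already have the points, their cluster labels and the centroids (for instance exported from a clustering run); the function names are hypothetical. Dividing inertia by the number of points gives the mean squared within-centroid distance, so if an average within-centroid distance is computed with squared Euclidean distance, the two curves differ only by a constant factor and show the elbow at the same k. Whether RapidMiner's Average within centroid distance uses squared distances is an assumption to check against its operator documentation.

```python
import math


def inertia(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2
               for p, lab in zip(points, labels))


def mean_squared_within_distance(points, labels, centroids):
    """Inertia divided by the number of points (same elbow as inertia)."""
    return inertia(points, labels, centroids) / len(points)


if __name__ == "__main__":
    pts = [(0, 0), (2, 0), (10, 0), (12, 0)]
    labels = [0, 0, 1, 1]
    centroids = [(1, 0), (11, 0)]
    print(inertia(pts, labels, centroids))  # prints 4.0
    print(mean_squared_within_distance(pts, labels, centroids))  # prints 1.0
```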
Without any additional information, to get a general idea (and to compare the two models), you can calculate the Average within centroid distance, which measures the "compacity" of the clusters.
To do so, add a Performance (Cluster Distance Performance) operator at the end of your process.
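To compare two clusterings of the same data in this way, a plain-Python sketch of the compacity measure could look like the following. It assumes Euclidean distance, recomputes each centroid as the mean of its cluster, and returns a positive value (lower means more compact), whereas RapidMiner reports the quantity as a negative number; the function name and the toy labelings are hypothetical. Because this value tends to shrink as the number of clusters grows, the comparison is most meaningful between models with the same k.

```python
import math
from collections import defaultdict


def avg_within_centroid_distance(points, labels):
    """Mean Euclidean distance from each point to the centroid of its cluster."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    # Centroid = coordinate-wise mean of the cluster members.
    centroids = {lab: [sum(col) / len(m) for col in zip(*m)]
                 for lab, m in groups.items()}
    total = sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return total / len(points)


if __name__ == "__main__":
    pts = [(0, 0), (2, 0), (10, 0), (12, 0)]
    model_a = [0, 0, 1, 1]  # groups nearby points together
    model_b = [0, 1, 0, 1]  # mixes far-apart points
    print(avg_within_centroid_distance(pts, model_a))  # prints 1.0
    print(avg_within_centroid_distance(pts, model_b))  # prints 5.0
```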
Edit:
I want to correct/complete the explanation above:
Assuming that you are using the K-means algorithm, one way to find the best k (number of clusters), and thus the best model, is to plot the "Average within centroid distance" against k. You will obtain a curve that decreases as k grows (or increases, since RapidMiner reports the Average within centroid distance as a negative value).
The best k, and thus the most relevant model, corresponds to the elbow of the curve, the point where it bends.
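Reading the elbow off the plot is a visual judgement. One common heuristic for automating it (not something this thread says RapidMiner does) is to normalise both axes to [0, 1] and pick the k whose point lies farthest from the straight line joining the first and last points of the curve. Because the distance to that line ignores direction, the same sketch works with RapidMiner's negative values. The function name is hypothetical, and the table is assumed to be a list of (k, value) pairs sorted by k with at least three entries.

```python
import math


def elbow_k(table):
    """Return the k farthest from the chord between the curve's end points.

    `table` is a list of (k, value) pairs sorted by k, e.g. the
    (k, average within centroid distance) rows of an optimization log.
    """
    if len(table) < 3:
        raise ValueError("need at least three (k, value) pairs")
    ks = [k for k, _ in table]
    values = [v for _, v in table]

    def normalise(xs):
        # Rescale to [0, 1] so the result does not depend on the units.
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi != lo else 0.0 for x in xs]

    xs, ys = normalise(ks), normalise(values)
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    chord = math.hypot(x1 - x0, y1 - y0)

    def distance_to_chord(x, y):
        # Perpendicular distance from (x, y) to the line through both end points.
        return abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / chord

    best = max(range(len(ks)), key=lambda i: distance_to_chord(xs[i], ys[i]))
    return ks[best]


if __name__ == "__main__":
    # Made-up values, only to show the call; not results from this thread.
    curve = [(1, 10.0), (2, 4.0), (3, 2.0), (4, 1.5), (5, 1.2), (6, 1.0)]
    print(elbow_k(curve))  # prints 3
    print(elbow_k([(k, -v) for k, v in curve]))  # negated values, prints 3
```

The heuristic only ranks the points you give it, so the answer depends on the range of k you sweep; as with the visual reading, it is worth checking neighbouring values of k (here 4 or 5 in the example above) before settling on a model.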
Hope this helps,
Regards,
Lionel