"k-means clustering: which data belongs to which cluster?"

Question

Hi Community,

I would like to cluster countries based on several factors, such as purchasing power, competition, turnover, ease of doing business, tariffs, political stability, etc.
I am creating an input list with the aim of having a numerical value for every factor (that makes it easier to cluster).
As output I would like to have, say, 3 clusters, and I would like to see which country belongs to which cluster.
I am currently working with the k-means operator, which works quite well, but I am not able to see which country belongs to which cluster.
Does anybody have a suggestion?
Thanks in advance.
Best regards, Carlo

YYH · Accepted Answer

Hi @Carlo,

If you have a column for country name or country code, you can set it to a special role (id/name). Also make sure you add a cluster label from k-means. The clustering model will then return a data table with one reference column for the country name and a new column for the cluster label.

I used the ICU patient data as an example.
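For readers working outside RapidMiner, the same idea (keep the country name as an identifier that is not used for clustering, then attach the cluster label as a new column) can be sketched in Python with pandas and scikit-learn. This is only an illustration under assumptions: the country names, the two factor columns, and all values are made up, and it is not the process from the answer above.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Made-up illustrative values, not real country statistics.
data = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country C",
                    "Country D", "Country E", "Country F"],
        "purchasing_power": [9.0, 8.5, 5.0, 5.5, 1.0, 1.5],
        "tariffs": [1.0, 1.5, 5.0, 4.5, 9.0, 8.5],
    }
).set_index("country")  # the index plays the role of the id attribute

# Scale every factor to [0, 1] so that no single factor dominates the distance.
scaled = MinMaxScaler().fit_transform(data)

# Fit k-means with 3 clusters; the country name is not part of the features.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
data["cluster"] = model.fit_predict(scaled)

# One row per country, with its cluster label next to the name.
print(data["cluster"])
```

Because the country name is the index rather than a feature, it stays attached to each row and the result shows directly which country landed in which cluster. The cluster numbers themselves are arbitrary labels.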

YY

YYH · Accepted Answer

Hi @Carlo,
We can convert the region codes from nominal to dummy coding (Nominal to Numerical operator) and then multiply the region dummy codes by 3 or by 5, which changes the range of the numerical region attributes to [0, 3] or [0, 5]. You would also need to normalize the other columns (purchasing power, competition, turnover, ease of doing business, tariffs, political stability) so that these attributes have a smaller range, say [0, 1]. A k-means model with Chebyshev distance will then treat the region factor as the most important one, since distance-based clustering models are sensitive to the scale of the attributes. This kind of manual intervention increases the weight of the region factor. You would need to test different multiplication factors for region. To get guaranteed results, fitting a separate clustering model on the subset for each region would be ideal.
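
The weighting idea can be sketched in Python as follows. Note the assumptions: scikit-learn's `KMeans` only supports Euclidean distance, so this swaps in Euclidean for the Chebyshev distance mentioned above (the scaling effect is similar, though not identical). The region names, factor values, and the `REGION_WEIGHT` constant are hypothetical and chosen only for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

REGION_WEIGHT = 5.0  # hypothetical multiplier; try other values such as 3

# Made-up illustrative values, not real country statistics.
df = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country C",
                    "Country D", "Country E", "Country F"],
        "region": ["EU", "EU", "ASIA", "ASIA", "AFR", "AFR"],
        "purchasing_power": [9.0, 2.0, 8.5, 1.5, 9.5, 1.0],
        "tariffs": [1.0, 8.0, 1.5, 9.0, 0.5, 8.5],
    }
).set_index("country")

numeric_cols = ["purchasing_power", "tariffs"]

# Normalize the ordinary factors to [0, 1].
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(df[numeric_cols]),
    index=df.index,
    columns=numeric_cols,
)

# Dummy-code the region and stretch it to [0, REGION_WEIGHT].
region_dummies = pd.get_dummies(df["region"], dtype=float) * REGION_WEIGHT

features = pd.concat([scaled, region_dummies], axis=1)

# Euclidean k-means (assumption: used here in place of Chebyshev distance).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
result = df.assign(cluster=labels)
print(result[["region", "cluster"]])
```

In this toy data the unweighted factors alone would pair countries across regions, while the stretched region columns pull countries from the same region together. With a smaller weight the other factors regain influence, which is why the multiplier needs testing.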
YY