Clustering k-means

3erthe3er New Altair Community Member
edited November 2024 in Community Q&A
Hello everyone,
I am looking for a way to cluster data. The tools I am using do not determine the right value of k for me, so the data is simply split into however many clusters I set k to.

Is there a method or tool that finds the right number of clusters without my knowing it beforehand?
And which measure should I use to check the result and its robustness?

I have read that the X-Means operator should help to find the right number of clusters.
I see a display on the right-hand side that makes an "assumption", but in my case it is incorrect and does not match the data set.
Surely there must be an iterative or mathematical procedure that solves this problem?

To clarify: after the analysis, my data set is always split into exactly kmin clusters. I am looking for an automatic method to find the right value of k.
Maybe my selection of attributes is wrong?

Thanks to everyone for the help. I appreciate it very much!


P.S. Perhaps k-means is not the right choice either?
Any help is very much appreciated!! 😊

Answers

  • MartinLiebig
    Altair Employee
    Hi there,

    Finding the number of clusters is arguably the toughest part of using a clustering algorithm.
    X-Means is already one way to get a good estimate for k. There are also some heuristics out there, most prominently the elbow method. But there is even a paper arguing that you shouldn't use it: https://arxiv.org/pdf/2212.12189.pdf
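
    As one illustration of such a heuristic, here is a minimal Python sketch that runs k-means for several candidate values of k and keeps the one with the highest mean silhouette score. The silhouette score is a different heuristic from the elbow method, and this is not how X-Means works internally. The function names (`farthest_first_centers`, `kmeans`, `silhouette`, `best_k`) are made up for this example, and the deterministic farthest-first seeding is a simplification chosen so the sketch is reproducible; real tools usually use random or k-means++ seeding with several restarts.

```python
import math


def farthest_first_centers(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then keep adding the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means tighter,
    better-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


def best_k(points, k_values):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

    Calling `best_k(points, range(2, 11))` on a list of numeric tuples returns the preferred k together with the score for every candidate, so you can also see how close the runners-up are. Treat the result as a hint rather than ground truth: the silhouette score favours compact, well-separated clusters, which may not match the structure of your data.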

    Also be careful with the normalization of your data. I see you do not use a Normalize operator, so the results might not be what you want. The same goes for the one-hot encoding you use.
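
    To show why this matters: k-means relies on Euclidean distance, so an attribute with a large numeric range can dominate the result. A minimal sketch of z-score standardization in plain Python follows; the function name `standardize` and the sample values are hypothetical, and this is only one of several common normalization schemes.

```python
import statistics


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # pstdev is the population standard deviation; a constant column gets a
    # divisor of 1.0 so it maps to 0 instead of causing a division by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stdevs)) for row in rows]


# Hypothetical data: yearly income and age. Unscaled, the income column
# would dominate every distance computation.
rows = [(30000.0, 25.0), (32000.0, 60.0), (90000.0, 27.0), (88000.0, 58.0)]
scaled = standardize(rows)
```

    After this step both columns contribute on a comparable scale, so the clustering is no longer driven by the attribute that happens to have the largest numbers.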

    BR,
    Martin
