"Categorical Clustering"

pvelando
pvelando New Altair Community Member
edited November 2024 in Community Q&A
Hi all,

I'm trying to cluster this data, which has both numerical and categorical attributes:

high 177 180 187 180 177 188 177 189 177 166 166 164 170 170 160 164 167 168
weight 86 79 85 83 87 80 78 80 82 72 66 65 79 67 61 61 63 68
Param1 A M V M A M V V A V N M N V A N A M
Param2 H H H H H H H H H M M M M M M M M M

There is no way to convert the categorical attributes into numerical ones.

I would like to know which algorithm would be right for clustering this data while taking the non-numerical attributes into account. They are certainly relevant to the clustering, and k-means definitely does not work.

Well, thank you very much in advance,

Answers

  • pvelando
    pvelando New Altair Community Member
    After some testing, I've seen that agglomerative clustering might work, although the results are not very handy.
  • Andrew2
    Andrew2 New Altair Community Member
    Hello

    K-means will work with this data if you use the 'mixed euclidean distance' measure. You will probably have to normalize the numerical attributes to the range 0 to 1 so that all the attributes have an equal influence.

    Regards

    Andrew
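
To make the suggestion above concrete, here is a minimal Python sketch of a mixed distance on the data from the question. It assumes that a mixed Euclidean measure takes the squared difference for numerical attributes and counts 0 for matching and 1 for differing categorical values; that definition, and the helper names `min_max` and `mixed_euclidean`, are assumptions made for illustration, not taken from the tool's documentation.

```python
import math

# Data copied from the question (18 examples, 4 attributes).
high = [177, 180, 187, 180, 177, 188, 177, 189, 177,
        166, 166, 164, 170, 170, 160, 164, 167, 168]
weight = [86, 79, 85, 83, 87, 80, 78, 80, 82,
          72, 66, 65, 79, 67, 61, 61, 63, 68]
param1 = "A M V M A M V V A V N M N V A N A M".split()
param2 = "H H H H H H H H H M M M M M M M M M".split()


def min_max(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# One tuple per example: two normalized numbers, two categorical values.
rows = list(zip(min_max(high), min_max(weight), param1, param2))


def mixed_euclidean(a, b):
    """Assumed definition: squared difference for numbers,
    0/1 mismatch for categorical values, then a square root."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str):
            total += 0.0 if x == y else 1.0
        else:
            total += (x - y) ** 2
    return math.sqrt(total)


# Examples 1 and 5 share both categories and differ slightly in weight.
print(mixed_euclidean(rows[0], rows[4]))
# Examples 1 and 10 differ in both categories and in both numbers.
print(mixed_euclidean(rows[0], rows[9]))
```

With the numbers scaled to 0..1, a single categorical mismatch contributes as much as the largest possible numerical difference, which is why the normalization matters for giving every attribute a comparable influence.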