Iris vs real data clustering

User: "mariozupan"
New Altair Community Member
Updated by Jocelyn
If you look at the iris data matrix, you will notice, even visually, that the data separate into at least two groups and are well dispersed. I am trying to choose variables for real data, but unfortunately my data matrix doesn't show obvious separation or even dispersion. It looks something like the example below (from the R Cookbook):
https://docs.google.com/presentation/d/1A7BbHjfYGiR13NBZSOqDvQjg0fshzPGajWPjEyQCfU0/edit

Does it make sense to apply k-means to that, or to even more concentrated data?

    User: "MariusHelf"
    New Altair Community Member
    As so often, the answer is: try it out :)
    Just apply the k-Means algorithm, and inspect the results, probably with the help of one of the clustering performance operators.

    Best regards,
    Marius
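One way to "inspect the results" is to run k-means for several values of k and compare the average silhouette width of each result. The sketch below is only an illustration under stated assumptions: it uses plain Python instead of RapidMiner operators or R, the two synthetic blobs are toy data rather than anything from the thread, and the helper names (`kmeans`, `mean_silhouette`) are made up for this example.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (assumption): two well-separated Gaussian blobs in 2D.
rng = random.Random(42)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)]

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: average silhouette {s:.2f}")
print("highest average silhouette at k =", best_k)
```

Values near 1 suggest compact, well-separated clusters, values near 0 suggest overlapping ones, and negative per-point values suggest points that may sit in the wrong cluster. On real data it is worth repeating the run with several seeds, since k-means depends on its starting centers.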
    User: "mariozupan"
    New Altair Community Member
    OP
    After I removed outliers, I looped k-means (R code) and got an average silhouette of 0.61 with very few negative values. I added the matrix on the third slide: https://docs.google.com/presentation/d/1A7BbHjfYGiR13NBZSOqDvQjg0fshzPGajWPjEyQCfU0/edit#slide=id.p
    But the problem is that I need technique(s) and an auditor to solve my nightmare:

    A technique for choosing attributes that will result in quality clusters: ANOVA, regression coefficients, a Pearson correlation matrix, or visual inspection? If you look at slide two (iris data), it is obvious that k-means will produce well-separated clusters. My data worries me. What do you do if you have data like that on the third slide? There are hundreds of financial performance indicators, but whenever I try a new set, it results in a very weak correlation matrix, and I was thinking that the correlation matrix needs to look like the one on the second slide.

    I need proof that my clustering job is done.
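For the Pearson correlation matrix mentioned above, a minimal sketch of how one might compute it for a candidate attribute set and list strongly correlated pairs. Everything concrete here is an assumption for illustration: the indicator names (`roa`, `roe`, `leverage`), the toy numbers, and the 0.9 threshold are hypothetical, and the code is plain Python rather than the R used in the thread.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the Pearson r between columns a and b."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns}
            for a in columns}


# Toy values for three hypothetical indicators (not real data).
columns = {
    "roa": [1, 2, 3, 4, 5],
    "roe": [2, 4, 6, 8, 10],
    "leverage": [5, 1, 4, 2, 3],
}

matrix = correlation_matrix(columns)

# List attribute pairs whose |r| exceeds a hypothetical threshold.
names = list(columns)
strong_pairs = [(a, b)
                for i, a in enumerate(names)
                for b in names[i + 1:]
                if abs(matrix[a][b]) > 0.9]

for a in names:
    print(a, [round(matrix[a][b], 2) for b in names])
print("strongly correlated pairs:", strong_pairs)
```

A weak correlation matrix does not by itself rule out cluster structure: for example, four compact groups at the corners of a square give two nearly uncorrelated variables and still cluster cleanly. The matrix is therefore better read as a screen for redundant attributes than as evidence for or against cluster quality.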