Cluster basketball players based on their performance...

LilC
LilC New Altair Community Member
edited November 5 in Community Q&A
I have a dataset with many players and their performance for the season.
My goal is to cluster them into 3 or more groups based on their performance, such as high, average, and low performance.
The attributes include position, average points, steals, mistakes, blocks, running distance, etc.

I guess it will probably involve some analysis with k-means. But I don't think I need all the attributes for the clustering. The other task is to find out which few attributes can be used to split the players.

I am still very new to RapidMiner, and thanks for all the help from you guys.
If anyone can point me in the right direction, that would be great. I am open to any extensions.
Thanks.

Best Answers

  • jacobcybulski
    jacobcybulski New Altair Community Member
    Answer ✓
    If you use k-means, you need numerical attributes. Make sure that you select attributes that are independent of each other. Although k-means is not a linear model, you could use the Correlation Matrix operator to establish the independence of attributes: ignore the matrix itself and look at the weights. The higher the weight, the more (linearly) independent the attribute is of the other attributes (and vice versa). While there are many other ways of weighting attributes, one great thing about doing it this way is that you do not need to define a label in the process (we are not predicting anything).
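
    Outside RapidMiner, the idea of correlation-based weights can be sketched in a few lines of Python. This is only an illustration, not the Correlation Matrix operator's actual formula: the weight used here (one minus the mean absolute Pearson correlation with the other attributes) is an assumption, and the attribute names and values are made-up toy data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / sqrt(vx * vy)

def independence_weights(columns):
    """Assumed weighting: 1 - mean |correlation| with every other attribute.

    A weight near 1 means the attribute is (linearly) independent of the rest;
    a weight near 0 means it largely duplicates another attribute.
    """
    names = list(columns)
    weights = {}
    for a in names:
        others = [abs(pearson(columns[a], columns[b])) for b in names if b != a]
        weights[a] = 1.0 - sum(others) / len(others)
    return weights

# Hypothetical toy data: one value per player, numerical attributes only.
players = {
    "avg_points": [25.0, 18.0, 12.0, 8.0, 5.0],
    "minutes":    [36.0, 30.0, 24.0, 18.0, 12.0],
    "blocks":     [0.5, 2.1, 0.3, 1.8, 0.9],
}

weights = independence_weights(players)
for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {w:.2f}")
```

    In this toy data, avg_points and minutes move together, so both get a low weight, while blocks is nearly unrelated to either and gets the highest weight. No label is involved at any point.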

Answers

  • LilC
    LilC New Altair Community Member
    edited July 2020
    Thanks again for the explanation and all the help. One more thing: after I used k-means, I saw a video showing that Cluster Distance Performance can be used to evaluate the clustering. Is there a guide to interpreting 'Avg. within centroid distance' or 'Davies Bouldin', like the rule of thumb for correlation coefficients?
    Or does the result need to be below 1 for the clustering to count as 'good'?
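
    For reference, here is a minimal Python sketch of how these two measures are commonly defined. It uses plain Euclidean distance, so the numbers may not match RapidMiner's Cluster Distance Performance operator exactly, and the two clusterings are made-up toy data. For both measures, smaller values indicate tighter clusters (and, for Davies-Bouldin, better separated ones). The average within-centroid distance depends on the scale of the attributes, so the values are mainly useful for comparing clusterings of the same data, for example with different k.

```python
from math import dist  # Euclidean distance, Python 3.8+

def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def avg_within_centroid_distance(clusters):
    """Mean distance from every point to the centroid of its own cluster."""
    total, count = 0.0, 0
    for pts in clusters:
        c = centroid(pts)
        total += sum(dist(p, c) for p in pts)
        count += len(pts)
    return total / count

def davies_bouldin(clusters):
    """Davies-Bouldin index: average over clusters of the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j)."""
    cents = [centroid(pts) for pts in clusters]
    scatter = [sum(dist(p, c) for p in pts) / len(pts)
               for pts, c in zip(clusters, cents)]
    k = len(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

# Hypothetical toy clusterings: same cluster shapes, different separation.
far_apart = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 10), (10, 11), (11, 10), (11, 11)],
]
close_together = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(2, 2), (2, 3), (3, 2), (3, 3)],
]

print(davies_bouldin(far_apart))       # about 0.1
print(davies_bouldin(close_together))  # about 0.5
print(avg_within_centroid_distance(far_apart))  # about 0.707 in both cases
```

    Both toy clusterings have the same within-centroid distance, but the well-separated one has the lower Davies-Bouldin value, which is why the two measures are usually read together.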