Measuring clustering quality for previously clustered data

singing_bird_1
singing_bird_1 New Altair Community Member
edited November 5 in Community Q&A

Hi all, I am new to RapidMiner. I have data that has already been clustered, and I want to load it into RapidMiner together with its cluster labels so that it can be evaluated with one of the clustering evaluation measures.

Note: I don't want to re-cluster my data; I want to evaluate it as it is, with its existing labels.

How can I do this?

Thanks in advance

Answers

  • FBT
    FBT New Altair Community Member

    Edit: I misread your question. Would you be able to post your data, or parts of it? Measuring the performance should be straightforward, as long as labels and relevant attributes are available. 

  • singing_bird_1
    singing_bird_1 New Altair Community Member

    Thanks for your reply.

    Attached is part of the data along with its cluster assignments.

    There are 3 clusters.

    The problem is that the SSE performance measure requires the data (which is not a problem) and the centroids, which are unknown because the data arrives already labeled.

    Silhouette requires the data, the cluster model or the centroids, and a similarity measure.

    How can I arrange the nodes in the process to get the quality of the given clustering? And which nodes should I use?
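
    As a rough illustration of the SSE point above: when every row already carries a cluster label, a centroid for each cluster can be recomputed as the mean of that cluster's rows, and the SSE follows from there. The sketch below is plain Python, not a RapidMiner process. The function names and the toy data are hypothetical, and it assumes numeric attributes and squared Euclidean distance.

```python
from collections import defaultdict


def cluster_centroids(points, labels):
    """Return {label: centroid}, where each centroid is the mean of its cluster's points."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in groups.items()
    }


def sse(points, labels):
    """Sum of squared Euclidean distances from each point to its own cluster centroid."""
    centroids = cluster_centroids(points, labels)
    total = 0.0
    for point, label in zip(points, labels):
        total += sum((x - c) ** 2 for x, c in zip(point, centroids[label]))
    return total


# Hypothetical toy data: five 2-D points that are already labeled with 3 clusters.
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0), (9.0, 1.0)]
labels = ["a", "a", "b", "b", "c"]

print(cluster_centroids(points, labels))
print(sse(points, labels))
```

    Whether mean-based centroids are appropriate depends on how the original clustering was produced; for a method that is not centroid-based, this SSE is only a descriptive figure.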

     

  • FBT
    FBT New Altair Community Member

    Ok, I don't think you can make any meaningful performance evaluations like this, because the data is missing information (e.g. the cluster model). What would you like to achieve? I.e. what is the question about the clusters that you would like to have answered?

  • singing_bird_1
    singing_bird_1 New Altair Community Member

    My question is how to measure the clustering quality despite the missing information, such as the cluster model and the distance measure for silhouette.

    If the answer is that it is impossible to measure the clustering quality in RapidMiner because of the missing information, then please suggest a way to measure it with another program, or give me code for SSE and silhouette.
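
    For the silhouette part of this request, here is a minimal sketch in plain Python (outside RapidMiner). The silhouette coefficient needs only the data, the existing labels, and a distance function; no cluster model is required. Euclidean distance is an assumption here, since the thread does not say which measure the original clustering used, and the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_score(points, labels, dist=euclidean):
    """Mean silhouette coefficient over all points, using the given labels as-is."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]

print(silhouette_score(points, good_labels))  # close to 1: compact, well-separated clusters
print(silhouette_score(points, bad_labels))   # negative: points sit nearer the other cluster
```

    The loop compares every pair of points, so it is quadratic in the number of rows; for large data sets a library implementation would be the more practical choice.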