"Subspace Clustering on Binary Attributes."

adjo81 New Altair Community Member
edited November 5 in Community Q&A
Hello All,

I am a beginner in data mining and new to the topic of subspace clustering. I have a sample dataset in which the observations (rows) are purchase orders and the columns are binary attributes (1/0) describing customization options for a single type of product.

The objective is to find out whether there are any clusters in this data. One approach is to use PCA to convert the binary attributes into numerical scores and use these as input to k-means.

However, I wanted to check whether hierarchical clustering helps on this data. I used the Jaccard dissimilarity metric and then a dendrogram to identify the clusters. There seems to be no clear structure in the data, with the dendrogram showing only a few isolated clusters. This analysis was done in base R.
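For readers who want to reproduce the idea outside R, here is a minimal sketch of Jaccard dissimilarity plus a naive agglomerative clustering, written in plain Python. The toy `orders` matrix, the choice of average linkage, and the convention that two all-zero rows have distance 0 are all assumptions made for illustration; the linkage actually used in the R analysis is not stated.

```python
from itertools import combinations

def jaccard_dissimilarity(a, b):
    """1 - |a AND b| / |a OR b| for two equal-length 0/1 sequences."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two all-zero rows are treated as identical (distance 0).
    return 0.0 if either == 0 else 1.0 - both / either

def average_linkage(rows, n_clusters):
    """Naive agglomerative clustering; returns clusters as lists of row indices."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise dissimilarity.
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            total = sum(jaccard_dissimilarity(rows[p], rows[q])
                        for p in ci for q in cj)
            d = total / (len(ci) * len(cj))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical purchase orders: rows are orders, columns are binary options.
orders = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
clusters = average_linkage(orders, 2)
print(clusters)
```

The pairwise search is cubic in the number of rows, so this is only meant to show the mechanics on a small sample, not to replace a library implementation.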

Later I came to know about subspace clustering. I am currently trying it out in RapidMiner using a subspace clustering plugin, specifically the CLIQUE algorithm. However, it has been running for over an hour and has not produced any results yet. I have set the tau and xi parameters to 0.1 and 2 respectively, which seem appropriate given the nature of the dataset.
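One possible explanation for the long runtime, offered as a sketch rather than a diagnosis: CLIQUE searches bottom-up for dense grid units, and with binary attributes and xi = 2 every attribute splits into the two cells 0 and 1. With a low density threshold, the number of dense units can grow roughly like C(d, k) * 2^k at dimensionality k. The function below counts dense units per level in an Apriori-style search. The name `dense_units_per_level`, the treatment of tau as a fraction of rows, and the toy data are assumptions for illustration, not the plugin's actual code.

```python
from itertools import combinations, product

def dense_units_per_level(rows, tau, max_dim):
    """Count dense units per subspace dimensionality, Apriori-style.

    A unit is a sorted tuple of (attribute, value) pairs on distinct
    attributes. It is dense when more than tau * len(rows) rows match it.
    """
    n = len(rows)

    def support(unit):
        return sum(1 for r in rows if all(r[a] == v for a, v in unit))

    # Level 1: one candidate unit per attribute and value.
    level = [((a, v),) for a in range(len(rows[0])) for v in (0, 1)]
    level = [u for u in level if support(u) > tau * n]
    counts = []
    dim = 1
    while level and dim <= max_dim:
        counts.append(len(level))
        dense = set(level)
        candidates = set()
        for u, w in combinations(level, 2):
            merged = tuple(sorted(set(u) | set(w)))
            attrs = {a for a, _ in merged}
            if len(merged) == dim + 1 and len(attrs) == dim + 1:
                # Apriori pruning: every dim-sized projection must be dense.
                if all(s in dense for s in combinations(merged, dim)):
                    candidates.add(merged)
        level = [u for u in candidates if support(u) > tau * n]
        dim += 1
    return counts

# Toy data: every combination of 4 binary attributes exactly once (16 rows).
rows = [list(bits) for bits in product((0, 1), repeat=4)]
counts = dense_units_per_level(rows, tau=0.1, max_dim=4)
print(counts)
```

If this is what is happening, raising tau or restricting the run to a subset of attributes would be the first things to try.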

I would appreciate any comments or suggestions on improving the above situation. I am also not sure what the output of CLIQUE looks like in RapidMiner, so I would appreciate some pointers on that as well.

Best Wishes,
Aditya.

Answers

  • MartinLiebig
    Altair Employee
    Hi,

    Where did you find CLIQUE? I googled around a bit and found this extension: http://www-ai.cs.uni-dortmund.de/SOFTWARE/SUBSPACE_CLUSTERING/index.html, which is also new to me.

    Kind of interesting stuff going on.

    ~Martin
  • adjo81
    adjo81 New Altair Community Member
    Hello Martin,

    I found this at the following link: http://dme.rwth-aachen.de/en/OpenSubspace

    It is available as a plugin for RapidMiner and Weka as well. Some preprocessing was required to encode the independent variables/attributes of each purchase order as binary. Then I ran the CLIQUE algorithm, which ran for over an hour before I had to stop it.

    Later I also ran PROCLUS on the same data, but I was not able to interpret the results. Records assigned to the same PROCLUS cluster do not all have the same attributes, which was a surprise.

    Do let me know if you or someone else would be able to help me in this case.
  • MartinLiebig
    Altair Employee
    Sorry, I have never heard of this extension before. I will definitely check it out later on.


    ~Martin