Clustering high-dimensional data

mskh New Altair Community Member
edited November 5 in Community Q&A
Hi,
I am trying to use DBSCAN to detect outliers in my data set, but it is difficult to set the parameters (epsilon, min points). Does anyone have an idea how to solve this problem? Is it possible to use two clustering algorithms, each considering only a subset of the attributes of the data set, and then detect the outliers based on the results of both algorithms?
Thanks

Answers

  • Telcontar120
    Telcontar120 New Altair Community Member
    DBSCAN is definitely one of those algorithms where you need domain expertise to set the parameters properly and get good results. You might want to try a simpler clustering algorithm first, such as k-means or hierarchical clustering.
    If you want to use two clustering algorithms based on different attributes, you'll need to multiply (copy) your dataset, feed one set of attributes to the first algorithm and a different set of attributes to the second algorithm, get the assigned clusters from each, and then join the two datasets back together again to compare them.
  • mskh
    mskh New Altair Community Member
    Thank you so much @Telcontar120.
    I am a beginner in RapidMiner. Will the results be different if I use two clustering algorithms based on different attributes instead of one clustering algorithm on all of those attributes?
    Thank you
  • Telcontar120
    Telcontar120 New Altair Community Member
    It is not possible to give a definitive answer without seeing the data, but in general you would not necessarily expect to get the same results if you are using different subsets of attributes or different clustering algorithms. I only described that approach because you mentioned in your earlier comment that you wanted to try it.
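
For readers who want to see the split, cluster and join workflow from the thread outside RapidMiner, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy rows, the attribute names a to d, the id column used for the join, and the values eps=0.5 and min_pts=3. The small dbscan function is a simplified textbook version, not RapidMiner's operator, and the "outlier in either subset" rule is just one possible way to combine the two results.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: one label per point, -1 meaning noise."""
    n = len(points)
    # Each neighbourhood includes the point itself.
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1               # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])
    return labels

# Hypothetical toy data: row 4 is unusual only in (c, d), row 5 only in (a, b).
rows = [
    {"id": 1, "a": 1.0, "b": 1.0, "c": 5.0, "d": 5.0},
    {"id": 2, "a": 1.1, "b": 0.9, "c": 5.1, "d": 5.0},
    {"id": 3, "a": 0.9, "b": 1.1, "c": 4.9, "d": 5.1},
    {"id": 4, "a": 1.0, "b": 1.2, "c": 9.0, "d": 1.0},
    {"id": 5, "a": 8.0, "b": 8.0, "c": 5.0, "d": 4.9},
    {"id": 6, "a": 1.2, "b": 1.0, "c": 5.2, "d": 5.1},
]

def cluster_subset(rows, attrs, eps, min_pts):
    """Cluster the rows on the given attributes only; return {id: label}."""
    points = [tuple(r[a] for a in attrs) for r in rows]
    return dict(zip((r["id"] for r in rows), dbscan(points, eps, min_pts)))

# "Multiply" the data: the same rows go to two clusterings with different attributes.
labels_ab = cluster_subset(rows, ("a", "b"), eps=0.5, min_pts=3)
labels_cd = cluster_subset(rows, ("c", "d"), eps=0.5, min_pts=3)

# Join the two results back together on id so they can be compared row by row.
joined = {r["id"]: (labels_ab[r["id"]], labels_cd[r["id"]]) for r in rows}

outliers_either = sorted(i for i, (x, y) in joined.items() if x == -1 or y == -1)
outliers_both = sorted(i for i, (x, y) in joined.items() if x == -1 and y == -1)
print(outliers_either, outliers_both)   # prints [4, 5] []
```

In this toy data the two clusterings disagree on rows 4 and 5, which matches the point made above that different attribute subsets need not give the same result. Whether to flag a row when either clustering marks it as noise, or only when both do, is a choice that depends on the data.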