Clustering high-dimensional data

mskh · March 2019

Hi,
I am trying to use DBSCAN to detect outliers in my data set, but it is difficult to set the parameters (epsilon, min points). Does anyone have an idea how to solve this problem? Would it be possible to use two clustering algorithms, each considering only a subset of the data set's attributes, and then detect outliers based on the results of both?
Thanks

Telcontar120 · March 2019

DBSCAN is definitely one of those algorithms where you need domain expertise to set the parameters properly and get good results. You might want to try a simpler clustering algorithm first, such as k-means or hierarchical clustering.
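For readers who still want a starting value for epsilon, one widely used heuristic (not mentioned in this thread) is to look at each point's distance to its k-th nearest neighbour, sort those distances, and pick epsilon near the point where they jump sharply. The sketch below is plain Python rather than a RapidMiner process; the function name `k_distances` and the toy data are illustrative assumptions.

```python
import math

def k_distances(points, k):
    """Return the sorted distance from each point to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Toy data (assumed for illustration): a tight group plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
kd = k_distances(points, k=2)

# The four grouped points have a 2nd-neighbour distance of 1.0, while the
# isolated point is above 13, so an epsilon slightly above 1.0 would separate them.
print(kd)
```

On real data the jump is rarely this clean, so the value read off the sorted distances is only a first guess to refine with domain knowledge.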
If you want to use two clustering algorithms based on different attributes, you'll need to multiply/split your dataset, feed one set of attributes to the first algorithm and a different set to the second, get the assigned clusters from each, and then join the two datasets back together on the example ID to compare them.
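The split, cluster, and join workflow described above can be sketched outside RapidMiner as follows. This is a minimal illustration in plain Python, not the RapidMiner process itself: the attribute names `a` to `d`, the `id` column, the toy rows, the parameter values, and the small `dbscan` helper are all assumptions made for the example. DBSCAN is used for both subsets here, but any algorithm that returns one label per row could be swapped in.

```python
import math

NOISE = -1  # label used for points that belong to no cluster

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch; returns one label per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point, keep expanding
        cluster += 1
    return labels

def cluster_subset(rows, attrs, eps, min_pts):
    """Cluster on the chosen attributes only; return {id: label}."""
    pts = [tuple(r[a] for a in attrs) for r in rows]
    return {r["id"]: lab for r, lab in zip(rows, dbscan(pts, eps, min_pts))}

# Toy data (assumed): id 4 is unusual only in c/d, id 5 is unusual everywhere.
rows = [
    {"id": 1, "a": 0.0, "b": 0.0, "c": 5.0, "d": 5.0},
    {"id": 2, "a": 0.1, "b": 0.0, "c": 5.1, "d": 5.0},
    {"id": 3, "a": 0.0, "b": 0.1, "c": 5.0, "d": 5.1},
    {"id": 4, "a": 0.1, "b": 0.1, "c": 9.0, "d": 9.0},
    {"id": 5, "a": 8.0, "b": 8.0, "c": 9.5, "d": 1.0},
]

labels_ab = cluster_subset(rows, ["a", "b"], eps=0.5, min_pts=3)
labels_cd = cluster_subset(rows, ["c", "d"], eps=0.5, min_pts=3)

# Join the two results back together on the id, then compare.
joined = {i: (labels_ab[i], labels_cd[i]) for i in labels_ab}
outlier_both = sorted(i for i, (x, y) in joined.items() if x == NOISE and y == NOISE)
outlier_either = sorted(i for i, (x, y) in joined.items() if x == NOISE or y == NOISE)

print(outlier_both)    # [5]
print(outlier_either)  # [4, 5]
```

Whether a row should count as an outlier when it is noise in both subsets or in just one is a design choice that depends on the data; the sketch computes both so they can be compared.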

mskh · March 2019

Thank you so much @Telcontar120.
I am a beginner in RapidMiner. Will the results be different if I use two clustering algorithms on different attributes instead of one clustering algorithm on all of those attributes?
Thank you

Telcontar120 · March 2019

It is not possible to give a definitive answer without seeing the data, but in general you should not expect the same results if you use different subsets of attributes or different clustering algorithms. I only described that approach because you mentioned in your earlier post that you wanted to try it.
