Clustering high-dimensional data
mskh
Hi,
I am trying to use DBSCAN to detect outliers in my data set, but it is difficult to set the parameters (epsilon, min points). Does anyone have an idea how to solve this problem? Is it possible to use two clustering algorithms, each of which considers only a subset of the data set's attributes, and then detect outliers based on the results of both?
Thanks
Telcontar120
DBSCAN is definitely one of those algorithms where you need domain expertise to set the parameters properly and get good results. You might want to try a simpler clustering algorithm first, such as k-means or hierarchical clustering.
If you want to try to use two clustering algorithms based on different attributes, you'll need to multiply/split your dataset and feed one set of attributes to the first algorithm and a different set of attributes to the second algorithm, get the assigned clusters, and then join the two datasets back together again to compare.
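For readers who think more easily in code, the split / cluster / join steps above can be sketched in plain Python. This is only an illustration of the logic, not a RapidMiner process: the toy rows, the attribute names `a` to `d`, the helper names `dbscan` and `cluster_view`, and the values `eps=0.5` and `min_pts=3` are all made up for the example, and the small DBSCAN here is a brute-force version meant for a handful of points.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Minimal brute-force DBSCAN; returns one label per point."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (the point itself included).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


def cluster_view(rows, attrs, eps, min_pts):
    """Cluster on a subset of attributes; return {row id: cluster label}."""
    points = [[row[a] for a in attrs] for row in rows]
    labels = dbscan(points, eps, min_pts)
    return {row["id"]: label for row, label in zip(rows, labels)}


# Hypothetical toy data: an id plus four numeric attributes.
rows = [
    {"id": 1, "a": 1.0, "b": 1.1, "c": 5.0, "d": 5.1},
    {"id": 2, "a": 1.1, "b": 1.0, "c": 5.1, "d": 5.0},
    {"id": 3, "a": 0.9, "b": 1.0, "c": 4.9, "d": 5.0},
    {"id": 4, "a": 1.0, "b": 0.9, "c": 9.0, "d": 9.0},
    {"id": 5, "a": 8.0, "b": 8.0, "c": 5.0, "d": 4.9},
    {"id": 6, "a": 9.0, "b": 1.0, "c": 1.0, "d": 9.0},
]

# Feed one set of attributes to the first run and a different set to the second.
view_ab = cluster_view(rows, ["a", "b"], eps=0.5, min_pts=3)
view_cd = cluster_view(rows, ["c", "d"], eps=0.5, min_pts=3)

# Join the two results back together on the row id and compare.
joined = {rid: (view_ab[rid], view_cd[rid]) for rid in view_ab}
outlier_either = sorted(r for r, (x, y) in joined.items() if NOISE in (x, y))
outlier_both = sorted(r for r, (x, y) in joined.items() if x == NOISE and y == NOISE)

print("noise in at least one view:", outlier_either)
print("noise in both views:", outlier_both)
```

Whether a row should count as an outlier when it is noise in either view or only when it is noise in both is a modelling decision that depends on the data. If the attributes are on very different scales, normalizing them before clustering is usually needed for a single epsilon to be meaningful.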
mskh
Thank you so much, @Telcontar120.
I am a beginner in RapidMiner. Will the results be different if I use two clustering algorithms, each based on different attributes, instead of one clustering algorithm on all of those attributes?
Thank you
Telcontar120
It is not possible to give a definitive answer without seeing the data, but in general you should not expect the same results when you use different subsets of attributes or different clustering algorithms. I described that approach only because you mentioned in your earlier post that you wanted to try it.