Problems with Auto Model Cluster Analysis
Terpdog
New Altair Community Member
"I am using Auto Model to do a k-means cluster analysis. It works fine for 2
clusters. With 3 or more clusters, one or more clusters has an average
distance of ? and a Davies-Bouldin index of infinity. This appeared
before, and I thought version 9.6 had fixed it, but apparently not. It
also appears in the beta of 9.7. Is there a way around this? Thanks."
Answers
-
Hi @Terpdog,
Can you share your data so that we can reproduce and understand what's going on?
Regards,
Lionel
-
I am not sure what files are needed, but I have attached the only RapidMiner file I could find and also an Excel file of the data. I was using only the first four variables for the cluster analysis.
-
Hi @Terpdog,
Thank you for sharing your data.
I can reproduce what you observe.
But there is something strange in Auto-Model itself: if I use your data (only the first four variables) with a k-Means model (k = 3, 4, etc.) in a classic RapidMiner process,
the results are correct (i.e. I obtain finite values for the DB index and the average distances).
Does anyone have an idea of what's going on in Auto-Model (clustering)?
The attached file contains the classic (working) RapidMiner process.
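For anyone who wants to check these two measures outside RapidMiner, here is a minimal Python sketch. It assumes plain Euclidean distance and a clustering that has already been computed; the function name `cluster_measures` and the toy data are made up for illustration, and RapidMiner's own operators may define the average distance differently (for example with squared distances).

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def cluster_measures(points, labels):
    """Return (average distance to centroid per cluster, Davies-Bouldin index).

    Assumes Euclidean distance and at least two clusters.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    keys = sorted(clusters)
    cents = {k: centroid(clusters[k]) for k in keys}

    # Average distance from each point to the centroid of its own cluster.
    scatter = {
        k: sum(math.dist(p, cents[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }

    # Davies-Bouldin: for each cluster take the worst ratio
    # (s_i + s_j) / d(c_i, c_j) over all other clusters, then average.
    worst = []
    for i in keys:
        ratios = []
        for j in keys:
            if i == j:
                continue
            sep = math.dist(cents[i], cents[j])
            # Coincident centroids make the ratio infinite.
            ratios.append(math.inf if sep == 0 else (scatter[i] + scatter[j]) / sep)
        worst.append(max(ratios))
    return scatter, sum(worst) / len(worst)


# Hypothetical toy data: three well-separated pairs of points.
points = [(0, 0), (0, 2), (10, 0), (10, 2), (0, 20), (2, 20)]
labels = [0, 0, 1, 1, 2, 2]
scatter, db = cluster_measures(points, labels)
print(scatter, db)
```

With well-separated clusters like these, both measures come out finite and positive, which is the behaviour seen in the classic process.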
Regards,
Lionel
-
Thanks, Lionel. I did not think to try the process route. There has to be a bug in the Auto-Model routine. Hopefully that can get fixed. There is still the question of why the distances are negative, which does not make sense.
-
@Terpdog,
The "real" distances are, of course, positive.
It seems to me that RapidMiner multiplies the distances by minus one (-1) so that it works with negative values, because
RapidMiner's algorithms try to MAXIMIZE these values. (Explanation to be confirmed by the RM staff, @sgenzer?)
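The general idea behind such a sign convention can be sketched in a few lines of Python. This only illustrates the explanation suggested above, which is unconfirmed; it is not RapidMiner's code, and the run names and distance values are hypothetical.

```python
def best_by_maximizing(scores):
    """Return the key with the largest score, as a maximizing optimizer would."""
    return max(scores, key=scores.get)


# Hypothetical average within-cluster distances for three candidate runs
# (smaller is better).
avg_distance = {"run_a": 4.1, "run_b": 2.7, "run_c": 3.3}

# Negating the distances turns "smaller is better" into "larger is better",
# so one maximizing routine can handle every performance measure.
negated = {name: -d for name, d in avg_distance.items()}

best = best_by_maximizing(negated)
print(best, negated[best])
```

The reported value is then negative even though the underlying distance is positive; taking the absolute value recovers the distance.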
Regards,
Lionel
-
That makes sense. I am continually frustrated at how hard it is to get routine statistics following an analysis in RapidMiner. I am trying to use it in my book, which talks about measures of fit in techniques such as cluster analysis, discriminant analysis, and logistic regression, and either I can't get RapidMiner to produce them or it is so difficult that it would be of no use to students. I may have to drop the idea of using it. Too bad.