Problems with Auto Model Cluster Analysis
Terpdog
"I am using Auto Model to do a k-means cluster analysis. Works fine for 2 clusters. With 3 or more clusters or or more cluster has an average distance of ? and a Davies-Bouldin index of infinity. This appeared before and I thought Version 9.6 had fixed it but apparently not. It also appears in the beta of 9.7. Is there a way around this? Thanks."
Tags: AI Studio, Clustering, Auto Model
lionelderkrikor
Hi @Terpdog,
Can you share your data so that we can reproduce and understand what's going on?
Regards,
Lionel
Terpdog
I am not sure what files are needed but I have attached the only rapidminer file I could find and also an Excel file of the data. I was using only the first four variables for the cluster analysis.
Regional Valley Mall.ioo
Regional Valley Mall.xlsx
lionelderkrikor
Hi @Terpdog,
Thank you for sharing your data.
I can reproduce what you observe.
However, something strange is going on in Auto Model itself: if I use your data (only the first four variables) with a k-Means model (with k = 3, 4, etc.) in a classic RapidMiner process, the results are correct (i.e., I obtain finite values for the DB index and the average distances).
Does anyone have an idea of what's going on in Auto Model (clustering)?
The classic (working) RapidMiner process is in the attached file.
Regards,
Lionel
Clustering _K-Means.rmp
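For readers who want to check the two statistics discussed above outside of RapidMiner, here is a minimal sketch in plain Python. It is not RapidMiner's implementation, and it does not explain the Auto Model behavior; it only uses the textbook definition of the Davies-Bouldin index (the average, over clusters, of the worst ratio of summed within-cluster distances to centroid separation). The function names and the toy points are made up for illustration. It also shows one way the index can become infinite by definition: two clusters whose centroids coincide. Whether that is what happens inside Auto Model is an assumption that has not been confirmed in this thread.

```python
import math
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def cluster_stats(points, labels):
    """Return {label: (centroid, average distance of members to centroid)}."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    stats = {}
    for lab, members in groups.items():
        c = centroid(members)
        avg = sum(math.dist(p, c) for p in members) / len(members)
        stats[lab] = (c, avg)
    return stats


def davies_bouldin(points, labels):
    """Textbook Davies-Bouldin index (lower is better)."""
    stats = cluster_stats(points, labels)
    total = 0.0
    for i, (ci, si) in stats.items():
        worst = 0.0
        for j, (cj, sj) in stats.items():
            if i == j:
                continue
            separation = math.dist(ci, cj)
            # Coinciding centroids make the ratio unbounded.
            ratio = math.inf if separation == 0 else (si + sj) / separation
            worst = max(worst, ratio)
        total += worst
    return total / len(stats)


# Toy data (made up): three well-separated clusters.
points_ok = [(0, 0), (0, 2), (10, 0), (10, 2), (5, 10), (5, 12)]
labels_ok = [0, 0, 1, 1, 2, 2]
db_ok = davies_bouldin(points_ok, labels_ok)

# Toy data (made up): clusters 0 and 1 share the centroid (1, 0).
points_bad = [(0, 0), (2, 0), (1, -1), (1, 1), (10, 10), (10, 12)]
labels_bad = [0, 0, 1, 1, 2, 2]
db_degenerate = davies_bouldin(points_bad, labels_bad)

print(db_ok, db_degenerate)
```

Running the same kind of check on the exported cluster assignments from a classic process is one way to confirm that the finite values reported there are plausible.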
Terpdog
Thanks, Lionel. I did not think to try the process route. There has to be a bug in the Auto Model routine. Hopefully that can get fixed. There is still the question of why the distances are negative, which does not make sense.
lionelderkrikor
@Terpdog,
The "real" distances are, of course, positive.
It seems to me that RapidMiner multiplies the distances by minus one (-1) in order to work with negative values, because RapidMiner's algorithms seek to MAXIMIZE these values (explanation to be confirmed by the RM staff, @sgenzer?).
Regards,
Lionel
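The sign convention described above can be illustrated with a small sketch. This is only an illustration of the general idea (a selector that always maximizes can be reused for a "smaller is better" measure by negating it); it is not RapidMiner code, and the explanation itself is still unconfirmed. The function name and the distance values are hypothetical.

```python
def pick_best(candidates, score):
    """Generic selector that always MAXIMIZES the given score."""
    return max(candidates, key=score)


# Hypothetical average within-cluster distances (smaller is better).
avg_distance = {"k=2": 3.4, "k=3": 2.1, "k=4": 2.6}

# Negating the distance lets the maximizer prefer the smallest distance.
best = pick_best(avg_distance, lambda k: -avg_distance[k])

# What a tool following this convention would display: negated values.
reported = {k: -d for k, d in avg_distance.items()}

print(best, reported[best])
```

Under this convention the displayed value closest to zero corresponds to the tightest clustering, and the true distance is simply the absolute value of what is shown.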
Terpdog
That makes sense. I am continually frustrated at how hard it is to get routine statistics following an analysis in RapidMiner. I am trying to use this in my book, which talks about measures of fit in techniques such as cluster analysis, discriminant analysis, and logistic regression, and I can't get RapidMiner to produce them, or it is so difficult that it would be of no use to students. I may have to drop the idea of using it. Too bad.