Problems with Auto Model Cluster Analysis

Terpdog
Terpdog New Altair Community Member
edited November 2024 in Community Q&A
"I am using Auto Model to do a k-means cluster analysis. It works fine for 2 clusters. With 3 or more clusters, one or more clusters has an average distance of ? and a Davies-Bouldin index of infinity. This appeared before and I thought Version 9.6 had fixed it, but apparently not. It also appears in the beta of 9.7. Is there a way around this? Thanks."

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @Terpdog,

    Can you share your data so that we can reproduce and understand what's going on?

    Regards,

    Lionel
  • Terpdog
    Terpdog New Altair Community Member
    I am not sure what files are needed, but I have attached the only RapidMiner file I could find and also an Excel file of the data. I was using only the first four variables for the cluster analysis.
  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @Terpdog,

    Thank you for sharing your data.
    I can reproduce what you observe.

    But there is something strange in Auto-Model itself: if I use your data (only the first four variables) with a k-Means model (k = 3, 4, etc.) in a classic RapidMiner process, the results are correct (i.e. I obtain finite values for the DB index and the average distances).

    Does anyone have an idea of what's going on in Auto-Model (clustering)?

    The classic (working) RapidMiner process is in the attached file.

    Regards,

    Lionel
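
For readers who want to check the two statistics outside RapidMiner, here is a minimal pure-Python sketch of the average within-cluster distance and the Davies-Bouldin index. It is not RapidMiner's implementation, and the function names and toy points are hypothetical. It only illustrates how the formula can yield an infinite index (two centroids coincide, so a ratio is divided by zero) or an undefined average distance (a cluster has no points); whether either is what happens inside Auto Model is not confirmed in this thread.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def avg_within_distance(points, center):
    """Mean distance from a cluster's points to its centroid; NaN if the cluster is empty."""
    if not points:
        return float("nan")
    return sum(euclidean(p, center) for p in points) / len(points)


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of non-empty clusters (lower is better)."""
    centers = [centroid(c) for c in clusters]
    scatter = [avg_within_distance(c, m) for c, m in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if i == j:
                continue
            separation = euclidean(centers[i], centers[j])
            # Assumption of this sketch: coincident centroids give an infinite ratio.
            ratio = (scatter[i] + scatter[j]) / separation if separation > 0 else math.inf
            worst = max(worst, ratio)
        total += worst
    return total / k


# Hypothetical toy data: a and b are well separated; c shares its centroid with a.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(10.0, 0.0), (10.0, 2.0)]
c = [(-1.0, 1.0), (1.0, 1.0)]

db_two = davies_bouldin([a, b])       # finite
db_three = davies_bouldin([a, b, c])  # infinite, because a and c share a centroid
```

Running the same check on the cluster assignments exported from a classic process and from Auto Model would show whether the two disagree on the clusters themselves or only on the reported statistics.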



  • Terpdog
    Terpdog New Altair Community Member
    edited May 2020
    Thanks, Lionel. I did not think to try the process route. There has to be a bug in the Auto-Model routine. Hopefully that can get fixed. There is still the question of why the distances are negative, which does not make sense.
  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    @Terpdog,

    The "real" distances are, of course, positive.
    It seems to me that RapidMiner multiplies the distances by minus one (-1) so that it works with negative values, because
    RapidMiner's algorithms seek to MAXIMIZE these values (explanation to be confirmed by the RM staff, @sgenzer?).

    Regards,

    Lionel
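
The sign convention described above can be sketched in a few lines of Python. This assumes the explanation is right, which the thread leaves unconfirmed. The function name and the distance values are hypothetical and are not taken from the attached data.

```python
def as_maximization_score(avg_distance):
    """Negate a 'smaller is better' distance so that a maximizer can rank it."""
    return -avg_distance


# Hypothetical average within-centroid distances for three candidate models.
avg_distances = {"k=2": 3.1, "k=3": 1.9, "k=4": 2.4}

# After negation, the model with the smallest distance has the largest score.
scores = {name: as_maximization_score(d) for name, d in avg_distances.items()}
best = max(scores, key=scores.get)
```

Under this convention a reported value of -1.9 would correspond to a real average distance of 1.9, so multiplying the reported figures by -1 recovers the usual statistic.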
  • Terpdog
    Terpdog New Altair Community Member
    That makes sense. I am continually frustrated at how hard it is to get routine statistics following an analysis in RapidMiner. I am trying to use this in my book which talks about measures of fit in techniques such as cluster analysis, discriminant analysis and logistic regression and I can't get RapidMiner to produce them or it is so difficult it would be of no use to students. I may have to drop the idea of using it. Too bad.
