Normalization (training) with clustering (group model) does not work as expected

Question

Using a Normalization operator alongside a k-Means operator to create a grouped model inside a Cross-Validation or Split-Validation does not work: the Performance (Cluster Distance Performance) operator expects a CentroidClusterModel but receives a GroupedModel instead. It seems that the Performance (Cluster Distance Performance) operator needs to be updated to accept a grouped model.
A simple example process using the Iris dataset from the RapidMiner Samples directory is attached to show the issue.

YYH · Accepted Answer

Dear Prof @amitdeokar

Thanks for sharing the cross-validated k-Means process. In the training phase, the normalization preprocessing model is grouped with the clustering model, but the cluster performance operator can only take a cluster model as input, not a grouped model.

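How about adding an ungroup and select step in the testing phase, as shown here? Ungrouping the model and selecting the cluster model from it gives the performance operator the centroid cluster model it expects.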
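To make the idea concrete outside RapidMiner, here is a rough Python analogue. It is not RapidMiner's API, and every class and function name in it is hypothetical. The grouped model is modeled as an ordered list holding a z-score normalization model and a centroid cluster model. A stand-in performance function, like the operator, accepts only the centroid model, so in the testing phase the normalization is applied to the test rows first and the cluster model is then picked out of the group. The distance measure (mean squared distance to the nearest centroid) and the naive k-means initialization are simplifying assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Row = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[Row], row: Row) -> int:
    """Index of the centroid closest to row."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], row))


@dataclass
class ZScoreModel:
    """Hypothetical preprocessing model: per-attribute mean/std learned on training data."""
    means: Row
    stds: Row

    @classmethod
    def fit(cls, rows: List[Row]) -> "ZScoreModel":
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Population std; fall back to 1.0 for a constant attribute.
        stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
                for c, m in zip(cols, means)]
        return cls(means, stds)

    def apply(self, rows: List[Row]) -> List[Row]:
        return [[(x - m) / s for x, m, s in zip(r, self.means, self.stds)] for r in rows]


@dataclass
class CentroidModel:
    """Hypothetical cluster model: one centroid per cluster (plain Lloyd's k-means)."""
    centroids: List[Row]

    @classmethod
    def fit(cls, rows: List[Row], k: int, iters: int = 10) -> "CentroidModel":
        centroids = [list(r) for r in rows[:k]]  # naive init: first k rows
        for _ in range(iters):
            groups: List[List[Row]] = [[] for _ in range(k)]
            for r in rows:
                groups[nearest(centroids, r)].append(r)
            # Move each centroid to its group mean; keep it if the group is empty.
            centroids = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
                         for i, g in enumerate(groups)]
        return cls(centroids)


@dataclass
class GroupedModel:
    """Hypothetical group: models applied in order, e.g. [normalization, clustering]."""
    models: list


def avg_within_centroid_distance(model, rows: List[Row]) -> float:
    """Stand-in performance measure: accepts only a centroid cluster model."""
    if not isinstance(model, CentroidModel):
        raise TypeError(f"expected CentroidModel, got {type(model).__name__}")
    return sum(min(sq_dist(c, r) for c in model.centroids) for r in rows) / len(rows)


def evaluate_fold(train: List[Row], test: List[Row], k: int) -> float:
    # Training phase: fit normalization on train only, cluster the normalized rows, group both.
    norm = ZScoreModel.fit(train)
    grouped = GroupedModel([norm, CentroidModel.fit(norm.apply(train), k)])
    # Testing phase: "ungroup", apply the preprocessing model(s) to the test rows,
    # then select the cluster model and pass only that to the performance measure.
    test_rows = test
    for m in grouped.models:
        if isinstance(m, ZScoreModel):
            test_rows = m.apply(test_rows)
    cluster_model = next(m for m in grouped.models if isinstance(m, CentroidModel))
    return avg_within_centroid_distance(cluster_model, test_rows)
```

Passing `grouped` itself to `avg_within_centroid_distance` raises a `TypeError`, which mirrors the error in the question; selecting the `CentroidModel` out of the group avoids it while still normalizing the test rows with parameters learned on the training rows only.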

Best,

YY

Telcontar120 · Accepted Answer

Another solution in similar scenarios is to normalize your data outside the cross-validation rather than inside it on the training set. This removes the need to pass the normalization model through to the test set, so you don't need grouped models at all. This is not the preferred setup, because it technically leaks information from the full dataset (including the test folds) into the training data, but the effect is probably very small. You can actually run it both ways to see how large the effect is and whether it is a concern for your particular dataset.
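A minimal pure-Python sketch of that comparison, assuming a single numeric attribute, z-score normalization, and contiguous folds (all names here are hypothetical, not RapidMiner operators). Each test fold is normalized once with statistics fitted on the training folds only (inside the cross-validation) and once with statistics fitted on the full dataset (outside, which leaks test-fold information). It compares only the normalized values; measuring the effect on clustering performance would mean running the clustering evaluation on both versions.

```python
import statistics
from typing import List, Tuple

Params = Tuple[float, float]


def fit_zscore(values: List[float]) -> Params:
    """Learn z-score parameters (mean, population std); use 1.0 for a constant column."""
    return statistics.fmean(values), statistics.pstdev(values) or 1.0


def apply_zscore(values: List[float], params: Params) -> List[float]:
    mean, std = params
    return [(v - mean) / std for v in values]


def fold_indices(n: int, k: int) -> List[List[int]]:
    """Split row indices 0..n-1 into k contiguous folds."""
    return [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]


def normalize_inside_vs_outside(values: List[float], k: int) -> Tuple[List[float], List[float]]:
    outside_params = fit_zscore(values)  # fitted on ALL rows, test folds included (leak)
    inside: List[float] = []
    outside: List[float] = []
    for test_idx in fold_indices(len(values), k):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(values) if i not in held_out]
        test = [values[i] for i in test_idx]
        inside.extend(apply_zscore(test, fit_zscore(train)))  # training folds only
        outside.extend(apply_zscore(test, outside_params))
    return inside, outside


if __name__ == "__main__":
    data = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 7.0, 6.4, 6.9, 5.5, 6.5, 5.7]  # toy values
    inside, outside = normalize_inside_vs_outside(data, k=3)
    print("max |inside - outside|:", max(abs(a - b) for a, b in zip(inside, outside)))
```

A maximum difference close to zero suggests the leak is negligible for that data; folds whose value ranges differ a lot from the rest of the data can make it noticeably larger.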