K-Means and Optimizing K

Question

Dear All, I looked through the example setups but could not find anything similar. I am trying to figure out how to optimize K-Means (find the optimal number of clusters k) through cross-validation. I tried using an XValidation operator, but I cannot get it to work. Here is my setup, which I wish to change: Could someone please help?

land · Answer

Hi, the problem is that unsupervised learning does not really allow any performance estimation. That's why it's called unsupervised: we simply don't know what the true solution is. So we cannot compare one clustering to another and say: Hey, this one is the true one and the other one is rubbish. That's why you are running into problems. There are, however, some measures that act as heuristics for the goodness of a clustering, but keep in mind that heuristics may lead to non-optimal solutions. You can plug in these heuristics the same way you plug in the performance evaluators for regression and classification. Here's a small sample process for RapidMiner 5.0 that will show you how this works and that heuristics may fail: A cross-validation evaluating a decision tree model. Greetings, Sebastian
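For readers without RapidMiner at hand, the idea of scoring each candidate k with a heuristic can be sketched in plain Python. This is only an illustration under assumptions, not the process from the post: the silhouette coefficient is picked here as one common example of such a heuristic (the post does not name a specific measure), the toy data is made up, and the helper names `kmeans` and `silhouette` are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    # Assumption: deterministic farthest-point initialisation, so the
    # sketch gives the same result on every run.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two non-empty clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

# Made-up toy data: three tight, well-separated groups of five points each.
offsets = [(0, 0), (0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]
blob_centers = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Score every candidate k with the heuristic and keep the best one.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest silhouette:", best_k)
```

On clean data like this the heuristic prefers the obvious grouping, but on overlapping or elongated clusters the same score can favor a k that does not match the structure you care about, which is the caveat raised above.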