Where in the process to place the 'Cross validation' operator?

tonyboy9
tonyboy9 New Altair Community Member
edited November 2024 in Community Q&A
In the customer segmentation process below, I believe the k-means cluster model has already answered which cluster of customers (by ID) to use. That would be the answer to my problem statement.

I'm confused about where to place 'Cross validation'. The tutorials seem to indicate placing the operator right after the 'Retrieve' operator that loads the data set. At that point, how does RapidMiner validate a model that k-means clustering has not yet built further down the process?

Any helpful suggestions are greatly appreciated.


Answers

  • Telcontar120
    Telcontar120 New Altair Community Member
    Cross validation is an approach to model validation for supervised machine learning problems, where you have a defined target variable (called the label in RapidMiner). If you look at the tutorial process for that operator, you can see that inside it you put the learner being trained in the left part of the process and the validation steps in the right part.
    But clustering is an unsupervised machine learning problem, where there is no defined label in advance that you are trying to obtain.  So generally speaking Cross Validation is not applicable when you are doing clustering.
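
To make the point above concrete, here is a minimal sketch in plain Python rather than RapidMiner. It assumes a toy data set with known labels and uses a 1-nearest-neighbour rule as a stand-in learner; the function names (`k_fold_indices`, `nearest_neighbour_predict`, `cross_validate`) are hypothetical and only illustrate the idea. The thing to notice is that every fold is scored by comparing predictions with held-out labels, which is exactly what a clustering problem does not have.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and split them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_neighbour_predict(train_rows, train_labels, row):
    """Predict the label of the closest training row (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(t, row)) for t in train_rows]
    return train_labels[distances.index(min(distances))]


def cross_validate(rows, labels, k=5):
    """Return one accuracy per fold. Scoring is only possible because labels exist."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        train_rows = [rows[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        # Compare each prediction with the true label of the held-out row.
        hits = sum(
            nearest_neighbour_predict(train_rows, train_labels, rows[i]) == labels[i]
            for i in test_idx
        )
        accuracies.append(hits / len(test_idx))
    return accuracies


# Hypothetical toy data: two well-separated groups, each with a known label.
rng = random.Random(1)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
rows += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
labels = ["low"] * 50 + ["high"] * 50

fold_accuracy = cross_validate(rows, labels, k=5)
print("accuracy per fold:", fold_accuracy)
```

With k-means, the only "labels" available are the cluster ids the algorithm itself assigns, so there is nothing independent to pass in as `labels` and no accuracy to compute per fold.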

  • tonyboy9
    tonyboy9 New Altair Community Member
    Thanks for that, Brian. You wrote: "But clustering is an unsupervised machine learning problem, where there is no defined label in advance that you are trying to obtain.  So generally speaking Cross Validation is not applicable when you are doing clustering."

    So is there another way to validate a clustering model?