Hi, I'm looking for some data science advice. Say we wanted to create a process in RapidMiner that works like a dating website (let's only take male-to-female hetero for the moment):
Setup: I have two data sets: one of men, with a lot of attributes about them and the women they have been interested in, and another of women, with a lot of attributes about them and the men they have been interested in. Most of these attributes on both sides are binominal / dummy-coded categoricals, but some are numerical (e.g. age).
Goal: build a process where, if a new man logs in and fills out a survey to populate his attributes (minus dating history - he's new), the output is a list of the women most likely to interest him, based on the training set above. Vice versa for women.
My initial thought is that this is a classic segmentation problem, e.g. k-means clustering or something similar. But I want the output to be predictive, with a probability attached to each suggested match.
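To make the "predictive with probabilities" part concrete, here is a rough sketch in plain Python (outside RapidMiner) of one framing I could imagine as an alternative to clustering: treat every (man, woman) pair as one training row, label it 1 if he showed interest, fit a logistic regression, and rank all women for a new man by predicted probability. Everything here is an assumption for illustration: the toy data, the two attributes, the helper names (pair_features, train, rank_women), and in particular the choice to treat "no recorded interest" as a negative example. It only covers the men-to-women direction; the reverse would be a mirror-image model on the women's data.

```python
import math

def pair_features(man, woman):
    """Bias term, both attribute vectors, and every man-x-woman interaction."""
    cross = [m * w for m in man for w in woman]
    return [1.0] + list(man) + list(woman) + cross

def sigmoid(z):
    # Clamp the score so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Predicted probability that this pair is a match."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def train(rows, labels, epochs=5000, lr=1.0):
    """Plain batch gradient descent on the logistic loss."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

def rank_women(weights, new_man, women):
    """Score every woman for one man and sort by probability, highest first."""
    scored = [(name, predict(weights, pair_features(new_man, attrs)))
              for name, attrs in women.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Invented toy data: [likes_hiking (0/1), age scaled to 0..1]
men = {"m1": [1.0, 0.3], "m2": [0.0, 0.5], "m3": [1.0, 0.6], "m4": [0.0, 0.2]}
women = {"w1": [1.0, 0.4], "w2": [0.0, 0.5], "w3": [1.0, 0.2], "w4": [0.0, 0.3]}
# Invented history: (man, woman) pairs where he showed interest.
interested = {("m1", "w1"), ("m1", "w3"), ("m3", "w1"), ("m3", "w3"),
              ("m2", "w2"), ("m2", "w4"), ("m4", "w2"), ("m4", "w4")}

# One training row per (man, woman) pair; missing pairs count as "not interested".
rows, labels = [], []
for m_name, m_attrs in men.items():
    for w_name, w_attrs in women.items():
        rows.append(pair_features(m_attrs, w_attrs))
        labels.append(1.0 if (m_name, w_name) in interested else 0.0)

weights = train(rows, labels)

# A new man with survey answers but no dating history.
new_man = [1.0, 0.4]
ranking = rank_women(weights, new_man, women)
for name, prob in ranking:
    print(f"{name}: {prob:.2f}")
```

The interaction terms are what let the model learn "men with attribute X tend to like women with attribute Y"; without them, every man would get the same ordering of women. Whether this pairwise framing is better than clustering first is exactly the kind of thing I'm unsure about.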
[Note: this is actually not my use case - I'm not building a dating site! But the case I'm working on is very similar in structure.]
Thoughts?
Scott