modeling many-to-many matching
Hi...looking for some data science advice. Say we wanted to create a process in RapidMiner that would be similar to a dating website (let's only take male-to-female hetero for the moment):
setup: I have two data sets: one of men with a lot of attributes about them and the women they have been interested in, and another of women with a lot of attributes about them and the men that they have been interested in. Most of these attributes on both sides are binominal / dummy coded categoricals but some are numerical (e.g. age).
goal: build a process where, if a new man logs in and fills out a survey to populate his attributes (minus dating history - he's new), the output is a list of the women most likely to be interesting to him, based on the training set above. Vice versa for women.
My initial thought is that this is a classic segmentation problem e.g. k-means clustering or something similar. But I want the output to be predictive with probabilities etc...
[Note: this is actually not my use case - I'm not building a dating site! But the case I'm working on is very similar in structure.]
Thoughts?
Scott
@Thomas_Ott hmm. I do not think what I'm doing is going to help anyone's personal life.
@Telcontar120 yes creating two separate models is exactly what I was planning to do. I have never fiddled with the recommender extension before but I think today is the day to do so. Any nice sample processes I can look at to get a feel for it?
Scott
Sadly I do not have any samples to offer for recommendation models (they all stayed at a former employer) but the operators are not hard to use and I am sure you will figure it out quickly. Or @mschmitz might have something to offer?
Hi All,
not really something to share. I think it boils down to Item Recommendation / Cross Distances.
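To make the cross-distance idea concrete, here is a rough sketch in plain Python rather than RapidMiner operators. It is only an illustration under assumptions: the function name `recommend`, the toy IDs (`m1`, `w1`, ...) and the inverse-distance weighting are all made up for this example, and the attributes are assumed to be numeric and already scaled to comparable ranges. It measures the distance from the new man to every known man, keeps the k closest, and scores each woman by the weighted share of those neighbours who were interested in her.

```python
import math
from collections import defaultdict

def recommend(new_user, known_users, liked, k=2, top_n=3):
    """Score candidates by the weighted share of the k nearest known users who liked them."""
    # Distance from the new user to every known user (the "cross distances").
    dist = {uid: math.dist(new_user, vec) for uid, vec in known_users.items()}
    neighbours = sorted(dist, key=lambda uid: (dist[uid], uid))[:k]

    scores = defaultdict(float)
    total = 0.0
    for uid in neighbours:
        weight = 1.0 / (1.0 + dist[uid])  # closer neighbours count more (assumed weighting)
        total += weight
        for item in liked.get(uid, ()):
            scores[item] += weight

    # Normalise so each score lies between 0 and 1.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [(item, scores[item] / total) for item in ranked[:top_n]]

# Hypothetical toy data: two dummy-coded attributes plus a scaled age.
men = {"m1": [1, 0, 0.30], "m2": [1, 0, 0.35], "m3": [0, 1, 0.90]}
liked_by_men = {"m1": {"w1", "w2"}, "m2": {"w2"}, "m3": {"w3"}}

suggestions = recommend([1, 0, 0.32], men, liked_by_men, k=2)
print(suggestions)
```

The scores are neighbour vote shares, not calibrated probabilities, and every query scans all known men, which is where the response-time concern comes from.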
What are your demands on the answer time? One option is to build a shitload of models first (e.g. to predict the correct cluster). In recommender systems you hit a problem with response times here, so maybe this could still be an option.
yes exactly @mschmitz. I could just run NN all the time but it is very slow. I am looking for a low-latency solution. And I am happy to hear that you came up with the same hack that I did (store a ton of models and then choose on the fly). I'm trying to do as much preprocessing as possible but at some point I need a way to create the "match" via applying some model - quickly.
Scott
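The "store a lot up front, choose on the fly" approach discussed above could look roughly like the sketch below. It is plain Python, not a RapidMiner process, and everything in it is an assumption for illustration: the names `build_lookup` and `match`, the toy IDs, and the premise that each known man already has a cluster label from some earlier clustering step (k-means, say). The offline step computes one centroid and one ranked shortlist per cluster; the online step only finds the nearest centroid and returns the stored shortlist.

```python
import math
from collections import Counter, defaultdict

def build_lookup(users, cluster_of, liked, top_n=3):
    """Offline step: one centroid and one ranked shortlist per cluster."""
    members = defaultdict(list)
    for uid in users:
        members[cluster_of[uid]].append(uid)

    centroids, shortlist = {}, {}
    for cid, uids in members.items():
        vectors = [users[u] for u in uids]
        centroids[cid] = [sum(col) / len(vectors) for col in zip(*vectors)]
        # Count how many cluster members were interested in each candidate.
        counts = Counter(item for u in uids for item in liked.get(u, ()))
        ranked = sorted(counts, key=lambda item: (-counts[item], item))
        shortlist[cid] = [(item, counts[item] / len(uids)) for item in ranked[:top_n]]
    return centroids, shortlist

def match(new_user, centroids, shortlist):
    """Online step: nearest centroid, then a dictionary lookup."""
    cid = min(centroids, key=lambda c: math.dist(new_user, centroids[c]))
    return shortlist[cid]

# Hypothetical toy data; the cluster labels are assumed to come from a prior clustering run.
men = {"m1": [1, 0, 0.30], "m2": [1, 0, 0.35], "m3": [0, 1, 0.90], "m4": [0, 1, 0.80]}
cluster_of = {"m1": 0, "m2": 0, "m3": 1, "m4": 1}
liked_by_men = {"m1": {"w1", "w2"}, "m2": {"w2"}, "m3": {"w3"}, "m4": {"w3", "w4"}}

centroids, shortlist = build_lookup(men, cluster_of, liked_by_men)
print(match([1, 0, 0.32], centroids, shortlist))
```

Query cost then depends on the number of clusters rather than the number of users. The trade-off is that everyone assigned to the same cluster gets the same list, and the stored share is a raw frequency within the cluster rather than a calibrated probability.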
Well, which women the men are interested in might not lead to a good match. I can search for specific criteria of women on a dating site but still not get them to respond. Perhaps the better thing is to identify which criteria in the men lead to a successful date from the women.
It's funny that you post this. I just watched this Vice video about Tinder and other dating-related websites. It's potentially NSFW for some but I found it interesting from a data science perspective: https://www.youtube.com/watch?v=J9V3fLUSQFM