Which operator?

Question

Hi,

I have a dataset containing 1m+ rows that I wish to group based on the relationship between several columns.

(customer name / nominal / label), (date of contact / date), (sales rep no. / nominal)

smith    1/1/11    001
smith    2/1/11    002
jones    3/2/11    001
brown    2/2/11    003
brown    3/2/11    001
brown    3/2/11    004
black    6/2/11    001
jones    4/2/11    005
black    5/2/11    002

Now for the tough bit:
We need to classify the customers based on the unique group of sales reps they have dealt with, i.e.
smith and black are in group A as they have both been contacted by 001 and 002, jones is B, brown is C, and so on.
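
To make the intended grouping concrete, here is a rough sketch of it outside RM, in plain Python. The function name `group_by_rep_set` and the tuple layout are made up for illustration only; the rows are the sample data above.

```python
from collections import defaultdict

# Sample rows from the question: (customer, contact date, sales rep).
rows = [
    ("smith", "1/1/11", "001"),
    ("smith", "2/1/11", "002"),
    ("jones", "3/2/11", "001"),
    ("brown", "2/2/11", "003"),
    ("brown", "3/2/11", "001"),
    ("brown", "3/2/11", "004"),
    ("black", "6/2/11", "001"),
    ("jones", "4/2/11", "005"),
    ("black", "5/2/11", "002"),
]


def group_by_rep_set(rows):
    """Map each distinct set of sales reps to the customers who dealt with exactly that set."""
    # Step 1: collect the set of reps per customer (dates are ignored).
    reps_per_customer = defaultdict(set)
    for customer, _date, rep in rows:
        reps_per_customer[customer].add(rep)

    # Step 2: customers sharing an identical rep set fall into the same group.
    groups = defaultdict(list)
    for customer, reps in reps_per_customer.items():
        groups[frozenset(reps)].append(customer)
    return dict(groups)


groups = group_by_rep_set(rows)
for reps, customers in groups.items():
    print(sorted(reps), "->", sorted(customers))
```

Here smith and black end up together because both map to the rep set {001, 002}, while jones and brown each get a group of their own.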

Is this possible in RM? Which operator(s) do you suggest?

Thanks in advance.

MariusHelf · Answer

Hi,

The best solution would probably be to pivot the data and then apply a clustering algorithm. You probably don't want a group for each unique set of sales reps, but rather for similar sets of sales reps, so clustering should work well enough. With one million rows, you may want to train the clustering model on a subset only, for performance reasons, and then apply it to the rest of the data. If the dates are not important, you could replace them in the pivoted data with 1 where present and with 0 otherwise.

Please have a look at the attached process.

Best,
Marius
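
For readers without the attached process, the pivot step can be sketched in plain Python. This is only an illustration under assumptions, not the RM process itself: the helper names `pivot_to_indicators` and `jaccard` are hypothetical, and Jaccard similarity is just one possible way to express "similar sets of sales reps", chosen here for the example. A real clustering algorithm would then be run on the 0/1 vectors.

```python
# Sample rows from the question: (customer, contact date, sales rep).
rows = [
    ("smith", "1/1/11", "001"),
    ("smith", "2/1/11", "002"),
    ("jones", "3/2/11", "001"),
    ("brown", "2/2/11", "003"),
    ("brown", "3/2/11", "001"),
    ("brown", "3/2/11", "004"),
    ("black", "6/2/11", "001"),
    ("jones", "4/2/11", "005"),
    ("black", "5/2/11", "002"),
]


def pivot_to_indicators(rows):
    """Pivot the rows into one 0/1 vector per customer, one column per sales rep."""
    reps = sorted({rep for _customer, _date, rep in rows})
    column = {rep: i for i, rep in enumerate(reps)}
    matrix = {}
    for customer, _date, rep in rows:
        vector = matrix.setdefault(customer, [0] * len(reps))
        vector[column[rep]] = 1  # 1 if the rep contacted the customer; the date is dropped
    return reps, matrix


def jaccard(a, b):
    """Share of reps two customers have in common (assumed similarity measure)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


reps, matrix = pivot_to_indicators(rows)
print(reps)
for customer, vector in matrix.items():
    print(customer, vector)
print(jaccard(matrix["smith"], matrix["black"]))
print(jaccard(matrix["smith"], matrix["brown"]))
```

Customers with identical rep sets, such as smith and black, get identical vectors, so any clustering on this matrix would place them together; customers with overlapping but different rep sets land closer or further apart depending on the overlap.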