All,
I'm new to RapidMiner and to the forum, so apologies if I'm asking the obvious in the wrong place. I've looked around on the forum and on the net and could not find what I'm looking for.
The issue I have is this.
We send out files to one of our suppliers which contain references that are not unique. A reference may appear 1 to 4 times. Say for simplicity we send out a file like this:
ref1;somedata
ref1;somedata
ref2:somedata
ref2:somedata
Our supplier does his thing and sends his reply:
ref1:someresult
ref1:someresult
ref2:someresult
ref2:someresult
Basically, what happens here is that we send a transaction in twice, it gets processed twice by the supplier, and it gets reported twice by the supplier.
I would now like to link each response to its request. I cannot use a plain join on the reference, because it would produce 8 output records: each of the 2 ref1 requests matches both ref1 responses, and the same goes for ref2. I cannot simply remove the duplicates from these 8 either, since some duplication is correct. So basically, I want to link each record in the input file to exactly one record in the output file. As long as both ref1 records coming in are linked to both ref1 records going out, I'm happy; it doesn't matter which links to which.
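To make the requirement concrete, here is a minimal Python sketch of the pairing I have in mind. It is outside RapidMiner, and the function names are made up for illustration. The idea is to number each occurrence of a reference within its own file (1st ref1, 2nd ref1, ...) and then match on reference plus occurrence number, so every request gets exactly one response.

```python
from collections import defaultdict


def number_occurrences(rows):
    """Tag each (ref, payload) row with a running count per ref: 1, 2, ..."""
    seen = defaultdict(int)
    tagged = []
    for ref, payload in rows:
        seen[ref] += 1
        # The key (ref, n) is unique even though ref alone is not.
        tagged.append(((ref, seen[ref]), payload))
    return tagged


def link_one_to_one(requests, responses):
    """Pair the n-th request for a ref with the n-th response for that ref."""
    response_by_key = dict(number_occurrences(responses))
    return [
        # A request without a matching response is paired with None.
        (ref, data, response_by_key.get((ref, n)))
        for (ref, n), data in number_occurrences(requests)
    ]


# Sample data mirroring the files above.
requests = [("ref1", "somedata"), ("ref1", "somedata"),
            ("ref2", "somedata"), ("ref2", "somedata")]
responses = [("ref1", "someresult"), ("ref1", "someresult"),
             ("ref2", "someresult"), ("ref2", "someresult")]

linked = link_one_to_one(requests, responses)
for row in linked:
    print(row)
```

This gives 4 linked records instead of 8. I assume the same thing could be done in RapidMiner by generating such a counter attribute in both files and joining on both keys, but I don't know how to set that up.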
Any idea how I can set this up in RapidMiner?
Regards,
Joe