Joining datasets on a non-distinct attribute

kypexin · October 2017

Hi rapidminers,

Let's say I need to JOIN 2 datasets, A and B, on an attribute 'key'.

It is distinct for dataset A (each distinct value corresponds to only one example), but not for B (there are few examples with the same value of 'key' in .

What logic of joining will RapidMiner follow in this case? For each join 'A <--> key <--> B' what exactly example will be taken from B? First, last, random, or it totally depends on how dataset was stored in repo so it's not possible to know at all?

Thank you.

sgenzer · October 2017

So the short version is that it will REPEAT all of the id examples with the different B values:

Data Set 1 Data Set 2

ID Last ID First

1 JONES 2 Lynn

2 HOLMES 2 Lucile

3 SILVA 2 Leonardo

After inner join:

ID Last First

2 HOLMES Lynn

2 HOLMES Lucile

2 HOLMES Leonardo

After outer join:

ID Last First

2 HOLMES Lynn

2 HOLMES Lucile

2 HOLMES Leonardo

1 JONES ?

3 SILVA ?

left join is same as inner join in this case

After right join:

ID Last First

1 JONES ?

2 HOLMES Lynn

2 HOLMES Lucile

2 HOLMES Leonardo

3 SILVA ?

Clear as mud?

Scott

sgenzer · October 2017

So first it depends on which join you’re doing: inner, outer, left or right. You will most likely get different results depending on the data. Which one are you doing?

kypexin · October 2017

Hi @sgenzer

This one is actually an inner join (the most used one, though others might be occasionally present as well).

sgenzer · October 2017

So the short version is that it will REPEAT all of the id examples with the different B values:

Data Set 1 Data Set 2

ID Last ID First

1 JONES 2 Lynn

2 HOLMES 2 Lucile

3 SILVA 2 Leonardo

After inner join:

ID Last First

2 HOLMES Lynn

2 HOLMES Lucile

2 HOLMES Leonardo

After outer join:

ID Last First

2 HOLMES Lynn

2 HOLMES Lucile

2 HOLMES Leonardo

1 JONES ?

3 SILVA ?

left join is same as inner join in this case

After right join:

ID Last First

1 JONES ?

2 HOLMES Lynn

2 HOLMES Lucile

2 HOLMES Leonardo

3 SILVA ?

Clear as mud?

Scott

kypexin · October 2017

Clear, Captain.

Thanks

Thomas_Ott · October 2017

Captain Morgan!

Joining datasets on a non-distinct attribute

Best Answer

Answers

Categories