keep both join attributes problem
Hello guys, i hope you can help me, i am really struggling with this problem
Follwing two simple csv files
File a:
id;label;cont1
0; ; a
1; 1b; b
2; 2c; c
File b:
id;label;cont2
0; 0x; x
1; 1b; y
2; ; z
the desired result:
id;label;cont1;cont2
0; 0x; a; x
1; 1b; b; y
2; 2c; c; z
Thought way to do: outer join with "keep both join attributes" and no "remove double attributes"
But unfortunately i am getting always the same result, no matter the option "remove double attributes" is on or off:
id;label;cont1;cont2
0; ?; a; x
1; 1b; b; y
2; 2c; c; z
Anyone has an idea what i am doing wrong? Tanks!
Answers
-
There may be a way to do this with a single join operator, but I'm not sure. In the meantime, it can be definitely done by treating the two labels as separate variables (you'll need to name them differently) and then creating an attribute that combines them after the join. See the attached process for an example (the two csv files I created from the simple examples you gave).
1 -
Thanks for the working quick solution and example!
I am still hoping for a better solution since this only a small example. In my case I have 15 database tables, which i want to join with each 5 common attributes - which means a quite big subprocess for only doing an outer join with keeping both attributes - still thinks its an bug, cause in my opinion this is excact the case for which the setting was made...
My current solution is an outer join and creation of a new table in the database - but this can only work temporary.
0 -
Hi Johannes,
are you sure that all of the attributes you would like to "duplicate" have role regular? Since special roles need to be unique join has the feature to remove the 2nd attribute with the same role.
~Martin
0 -
Yes, i double checked it.
0 -
Dear Johannes,
i've built the process with the data you have written down above, and its working for me? Have a look at the attached process. Maybe there is something different in yours?
~Martin
<?xml version="1.0" encoding="UTF-8"?><process version="7.2.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.2.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="7.2.003" expanded="true" height="82" name="Subprocess" width="90" x="45" y="34">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.2.003" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
<list key="attribute_values">
<parameter key="id" value="0"/>
<parameter key="label" value="str(0/0)"/>
<parameter key="cont1" value=""a""/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="7.2.003" expanded="true" height="68" name="Generate Data by User Specification (2)" width="90" x="45" y="136">
<list key="attribute_values">
<parameter key="id" value="1"/>
<parameter key="label" value=""1b""/>
<parameter key="cont1" value=""b""/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="7.2.003" expanded="true" height="68" name="Generate Data by User Specification (3)" width="90" x="45" y="238">
<list key="attribute_values">
<parameter key="id" value="2"/>
<parameter key="label" value=""2c""/>
<parameter key="cont1" value=""c""/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="append" compatibility="7.2.003" expanded="true" height="124" name="Append" width="90" x="179" y="85"/>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
<connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
<connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
<connect from_op="Append" from_port="merged set" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Gen Table1</description>
</operator>
<operator activated="true" class="subprocess" compatibility="7.2.003" expanded="true" height="82" name="Subprocess (2)" width="90" x="45" y="187">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.2.003" expanded="true" height="68" name="Generate Data by User Specification (4)" width="90" x="45" y="34">
<list key="attribute_values">
<parameter key="id" value="0"/>
<parameter key="label" value=""0x""/>
<parameter key="cont2" value=""x""/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="7.2.003" expanded="true" height="68" name="Generate Data by User Specification (5)" width="90" x="45" y="136">
<list key="attribute_values">
<parameter key="id" value="1"/>
<parameter key="label" value=""1b""/>
<parameter key="cont2" value=""y""/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="7.2.003" expanded="true" height="68" name="Generate Data by User Specification (6)" width="90" x="45" y="238">
<list key="attribute_values">
<parameter key="id" value="2"/>
<parameter key="cont2" value=""z""/>
<parameter key="label" value="str(0/0)"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="append" compatibility="7.2.003" expanded="true" height="124" name="Append (2)" width="90" x="179" y="85"/>
<connect from_op="Generate Data by User Specification (4)" from_port="output" to_op="Append (2)" to_port="example set 1"/>
<connect from_op="Generate Data by User Specification (5)" from_port="output" to_op="Append (2)" to_port="example set 2"/>
<connect from_op="Generate Data by User Specification (6)" from_port="output" to_op="Append (2)" to_port="example set 3"/>
<connect from_op="Append (2)" from_port="merged set" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Gen Table2</description>
</operator>
<operator activated="true" class="join" compatibility="7.2.003" expanded="true" height="82" name="Join" width="90" x="246" y="85">
<parameter key="remove_double_attributes" value="false"/>
<parameter key="join_type" value="outer"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="id" value="id"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.2.003" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="85">
<list key="function_descriptions">
<parameter key="RealLabel" value="if(missing(label),label_from_ES2,label)"/>
</list>
<description align="center" color="transparent" colored="false" width="126">Built Real Label</description>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.2.003" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="85">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="cont1_from_ES2|label_from_ES2|label"/>
<parameter key="invert_selection" value="true"/>
</operator>
<connect from_op="Subprocess" from_port="out 1" to_op="Join" to_port="left"/>
<connect from_op="Subprocess (2)" from_port="out 1" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>0 -
Hey @mschmitz this is basically the same thing as the process I posted--it keeps the two "label" variables separate through the join and then created a combined label using "generate attributes" after that. But I think @johannesjungblu wants this to be done natively by the outer join operator without additional manipulation afterwards.
I don't think it is a bug per se, but I tend to agree with him that it is potentially confusing, because the operator help specifies that only the left attribute values are kept if there are double attributes, even if the join type is other than a left join. So perhaps this is an idea for a product enhancement that could be passed along to the development team to have it take the value of a double attribute based on the join type (left=left value, right=right value, outer=union value, etc.).
1 -
Hi Brian,
to be honest, i do not think its a bug. I think its doing what it is supposed to do.
I think the desired behaviour for Johannes can be archived with a two operators. Loop Attributes + Generate Attributes. Then you can check for everything if it is missing and if so, take the right hand side.
~Martin
0 -
You are right!
I don´t get the option "keep both join attributes". The description implies my desired result:
keep both join attributes
Description: If checked, both columns of a join pair will be kept. Usually this is unneccessary since both attributes are identical. It may be useful to keep such a column if there are missing values on one side.
For what else is this option good?
0 -
Johannes,
it does what it supposed to do? If you check it, you get both attributes. ID and ID_from_left_ES. That's what its supposed to do, right?
~Martin
1 -
ahhhh sorry... I obviously missed that.
Is there a chance a "keep and combine attributes option" will be implemented in the near future?
0 -
can you elaborate on what you mean by keep and combine,
Not sure if you have seen the advanced options, click on show advanced options and it will allow you to select additonal column handling conditions
0 -
keep and combine should keep every pair of duplicate attributes in one atribute with a combined policy like: "take the not null entry".
If you mean the advanced options with remove and keep - yes.
0 -
hi...I would just rename the labels as label1 and label2 before the join. You're not joining on label anyway. Then you avoid this whole issue. Occam's Razor?
Scott
0 -
yes, i work with this solution, but it´s annoying rename and combine attriutes if you join 15 tables, if there is a possibility an option can make this...
0