"Error on joining data"
lmsasu
New Altair Community Member
Hello all,
I am trying to join two different excel files, having a common ID column which can be used for joining. However, I keep receiving an error at runtime: "Process failed. duplicate attribute name: 1". The first row of each file contains the column names, the columns have different names (excepting the one used for joining). What's wrong?
Thanks in advance,
Lucian
I am trying to join two different excel files, having a common ID column which can be used for joining. However, I keep receiving an error at runtime: "Process failed. duplicate attribute name: 1". The first row of each file contains the column names, the columns have different names (excepting the one used for joining). What's wrong?
Thanks in advance,
Lucian
0
Answers
-
Hi,
actually I can't reproduce the problem. See here for my simple example process:<?xml version="1.0" encoding="UTF-8" standalone="no"?>
Could you please post your process setup, too?
<process version="5.1.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
<process expanded="true" height="431" width="550">
<operator activated="true" class="generate_data" compatibility="5.1.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="75"/>
<operator activated="true" class="generate_id" compatibility="5.1.003" expanded="true" height="76" name="Generate ID (2)" width="90" x="246" y="165"/>
<operator activated="true" class="select_attributes" compatibility="5.1.003" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="30">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="att2|id"/>
</operator>
<operator activated="true" class="generate_id" compatibility="5.1.003" expanded="true" height="76" name="Generate ID" width="90" x="246" y="300"/>
<operator activated="true" class="select_attributes" compatibility="5.1.003" expanded="true" height="76" name="Select Attributes (2)" width="90" x="380" y="300">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="att3|id|att1"/>
</operator>
<operator activated="true" class="join" compatibility="5.1.003" expanded="true" height="76" name="Join" width="90" x="447" y="120"/>
<connect from_op="Generate Data" from_port="output" to_op="Generate ID (2)" to_port="example set input"/>
<connect from_op="Generate ID (2)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Generate ID (2)" from_port="original" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Greetings,
Sebastian0 -
Thanks for reply.
In my case, the id columns used for joining the two data are present in both files and I don't generate them. The data
My code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
<process expanded="true" height="639" width="805">
<operator activated="true" class="read_excel" compatibility="5.1.003" expanded="true" height="60" name="Read Excel" width="90" x="45" y="75">
<parameter key="excel_file" value="D:\Work\lenses_part1.xls"/>
<parameter key="imported_cell_range" value="a2:d25"/>
<list key="annotations"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="id.true.real.id"/>
<parameter key="1" value="col1.true.real.regular"/>
<parameter key="2" value="col2.true.real.regular"/>
<parameter key="3" value="col3.true.real.regular"/>
</list>
</operator>
<operator activated="true" class="read_excel" compatibility="5.1.003" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="210">
<parameter key="excel_file" value="D:\Work\lenses_part2.xls"/>
<parameter key="imported_cell_range" value="A2:d25"/>
<list key="annotations"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="id.true.numeric.id"/>
<parameter key="1" value="col4.true.numeric.regular"/>
<parameter key="2" value="col5.true.numeric.regular"/>
<parameter key="3" value="col6.true.numeric.regular"/>
</list>
</operator>
<operator activated="true" class="join" compatibility="5.1.003" expanded="true" height="76" name="Join" width="90" x="246" y="120"/>
<connect from_op="Read Excel" from_port="output" to_op="Join" to_port="left"/>
<connect from_op="Read Excel (2)" from_port="output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
The data in lenses_partX.xls are obtained from http://archive.ics.uci.edu/ml/machine-learning-databases/lenses/lenses.data, the first 4 columns are retained in lenses_part1.xls, the others are in lenses_part2.xls and the id is manually inserted into the xls file.
Thanks,
Lucian
0 -
Hello
Change this
<parameter key="1" value="col4.true.numeric.regular"/>
<parameter key="2" value="col5.true.numeric.regular"/>
<parameter key="3" value="col6.true.numeric.regular"/>
to this
<parameter key="a" value="col4.true.numeric.regular"/>
<parameter key="b" value="col5.true.numeric.regular"/>
<parameter key="c" value="col6.true.numeric.regular"/>
I think it's because the attribute names are not set; the import wizard default is to use names lke 1,2,3.
regards
Andrew0 -
Thanks for reply. At design time I got a warning:
"Error transforming meta data transformation: java.lang.NumberFormatException: For input string: "a" ".
Lucian0 -
Hello
Try the attached. I think it's because the range was A2:d25 which meant it was using row 2 as the names of the attributes so it failed on the initial read of the Excel files (the names of the Excel files will need to be changed for the setup you have).
regards
Andrew<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.004">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.004" expanded="true" name="Process">
<process expanded="true" height="639" width="805">
<operator activated="true" class="read_excel" compatibility="5.1.004" expanded="true" height="60" name="Read Excel (4)" width="90" x="45" y="120">
<parameter key="excel_file" value="D:\work\Book2.xls"/>
<parameter key="imported_cell_range" value="A1:C25"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="id.true.integer.id"/>
<parameter key="1" value="4.true.integer.attribute"/>
<parameter key="2" value="5.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="read_excel" compatibility="5.1.004" expanded="true" height="60" name="Read Excel (3)" width="90" x="45" y="30">
<parameter key="excel_file" value="D:\work\Book1.xls"/>
<parameter key="imported_cell_range" value="A1:D25"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="id.true.integer.id"/>
<parameter key="1" value="1.true.integer.attribute"/>
<parameter key="2" value="2.true.integer.attribute"/>
<parameter key="3" value="3.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="join" compatibility="5.1.004" expanded="true" height="76" name="Join" width="90" x="246" y="75"/>
<connect from_op="Read Excel (4)" from_port="output" to_op="Join" to_port="right"/>
<connect from_op="Read Excel (3)" from_port="output" to_op="Join" to_port="left"/>
<connect from_op="Join" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>0 -
It did not work. However, after defining the names of the columns (I expected that checking "first row as names" brings the names of the columns) the join operation worked fine.
Thanks for your replies.
Lucian0