Join on 2 Joins produces Error
Piddie
New Altair Community Member
Hi,
each time I try to join example sets that are themselves the output of joins, I get an "Error transforming meta data transformation: java.lang.NullPointerException". The process seems to run correctly, but the error is irritating while designing the process. Is this behaviour the result of a mistake on my part, or is it a bug?
Forgot to mention: I use version 5!
Greetings from snowy Lippe/Germany
Peter
Answers
-
Hi,
this seems to be a bug in the meta data transformation. Would you be so kind as to post an example process? I will then take a look at it.
Greetings,
Sebastian
-
Hi Sebastian,
what a fast reply!
I do not understand this XML stuff, but I think it is the better way to post the process. It does nothing but join example sets generated by queries on Microsoft Access tables. The last join delivers the data I expect, but the metadata is not generated:
Ouch, I definitely like the Process view more than this.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Root">
<parameter key="parallelize_main_process" value="true"/>
<process expanded="true" height="611" width="748">
<operator activated="true" class="read_database" expanded="true" height="60" name="Input Anst alle t" width="90" x="45" y="30">
<parameter key="define_connection" value="url"/>
<parameter key="database_system" value="ODBC Bridge (e.g. Access)"/>
<parameter key="database_url" value="jdbc:odbc:Driver={Microsoft Access-Treiber (*.mdb)};DBQ=c:\users\Papa\rapid-i\rethw08\re35118a.mdb"/>
<parameter key="username" value="Admin"/>
<parameter key="password" value="HqBOhGwNZxQ="/>
<parameter key="query" value="SELECT lfdnr, t, Ansprache as Anst FROM &quot;ret35a&quot; order by lfdnr,t "/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role lfdnr id" width="90" x="179" y="30">
<parameter key="name" value="lfdnr"/>
<parameter key="target_role" value="id"/>
</operator>
<operator activated="true" class="read_database" expanded="true" height="60" name="Input Ansprache Schluß" width="90" x="45" y="165">
<parameter key="define_connection" value="url"/>
<parameter key="database_system" value="ODBC Bridge (e.g. Access)"/>
<parameter key="database_url" value="jdbc:odbc:Driver={Microsoft Access-Treiber (*.mdb)};DBQ=c:\users\Papa\rapid-i\rethw08\re35118a.mdb"/>
<parameter key="username" value="Admin"/>
<parameter key="password" value="HqBOhGwNZxQ="/>
<parameter key="query" value="SELECT lfdnr, Ansprache FROM &quot;ret35a&quot; Where T=248 "/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role lfdnr id 1" width="90" x="179" y="165">
<parameter key="name" value="lfdnr"/>
<parameter key="target_role" value="id"/>
</operator>
<operator activated="true" class="join" expanded="true" height="76" name="Join" width="90" x="313" y="75"/>
<operator activated="true" class="read_database" expanded="true" height="60" name="Input Promotions" width="90" x="45" y="345">
<parameter key="define_connection" value="url"/>
<parameter key="database_system" value="ODBC Bridge (e.g. Access)"/>
<parameter key="database_url" value="jdbc:odbc:Driver={Microsoft Access-Treiber (*.mdb)};DBQ=c:\users\Papa\rapid-i\art118.mdb"/>
<parameter key="username" value="Admin"/>
<parameter key="password" value="HqBOhGwNZxQ="/>
<parameter key="query" value="SELECT * FROM &quot;Promotions&quot; WHERE ka='35'"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role orig id" width="90" x="246" y="345">
<parameter key="name" value="lfdnr"/>
<parameter key="target_role" value="id"/>
</operator>
<operator activated="true" class="join" expanded="true" height="76" name="Join (2)" width="90" x="514" y="165"/>
<connect from_op="Input Anst alle t" from_port="output" to_op="Set Role lfdnr id" to_port="example set input"/>
<connect from_op="Set Role lfdnr id" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Input Ansprache Schluß" from_port="output" to_op="Set Role lfdnr id 1" to_port="example set input"/>
<connect from_op="Set Role lfdnr id 1" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_op="Join (2)" to_port="left"/>
<connect from_op="Input Promotions" from_port="output" to_op="Set Role orig id" to_port="example set input"/>
<connect from_op="Set Role orig id" from_port="example set output" to_op="Join (2)" to_port="right"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
I hope you can read more out of this code than I can.
Thanks for your patience with noobs like me, for your great software (which I can't handle yet) and for the fantastic webinar you held in December last year.
Best regards
Peter
-
Hi Peter,
if you think I'm answering quickly, try our enterprise support; it will show you what's really quick.
I have loaded your process and replaced your Access readers with data generators (since I don't have your files). It now looks like the process below and works just fine. Please try updating to the newest RapidMiner version, 5.0.003. If the error occurs again, please tell me.
Greetings,
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Root">
<process expanded="true" height="611" width="748">
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
<operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID" width="90" x="179" y="30"/>
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="165"/>
<operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID (2)" width="90" x="168" y="177"/>
<operator activated="true" class="join" expanded="true" height="76" name="Join" width="90" x="313" y="75"/>
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data (3)" width="90" x="45" y="345"/>
<operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID (3)" width="90" x="201" y="343"/>
<operator activated="true" class="join" expanded="true" height="76" name="Join (2)" width="90" x="514" y="165"/>
<connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Generate Data (2)" from_port="output" to_op="Generate ID (2)" to_port="example set input"/>
<connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_op="Join (2)" to_port="left"/>
<connect from_op="Generate Data (3)" from_port="output" to_op="Generate ID (3)" to_port="example set input"/>
<connect from_op="Generate ID (3)" from_port="example set output" to_op="Join (2)" to_port="right"/>
<connect from_op="Join (2)" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Sebastian
-
Hi Sebastian,
I'm just trying out RapidMiner at home, but if I manage to get good results on my data (which is real data from my job), it might end in a business relationship with Rapid-I, and then I will test how quick quick can be.
I updated to the newest version, but nothing changed.
So I shrank my data from >=1000 examples (I fear even more than 10000) in each example set (including the outputs of joins) to somewhere between 24 and 264 examples, depending on the set.
Now my process doesn't throw any error! ;D
It seems that metadata is not generated if there is too much data.
Maybe you can verify this guess?
Thanks in advance
Peter
-
Hi,
the number of examples should be completely irrelevant to the meta data transformation. Where does the error occur? Is it shown in the Problems tab or does it occur during process execution?
Unfortunately, I cannot reproduce the problem here with the generated sets from the process above. Do you have nominal values in your data set?
Greetings,
Sebastian
-
Hi,
The error is shown in the Problems tab - during process execution there is no error (and results seem to be ok)!
My data set consists of nominal and numerical values, some attributes with missing values (i.e. null).
If the number of examples is irrelevant, it must be the values of specific examples. I will try to identify these examples next weekend and will inform you afterwards.
Greetings
Peter
-
I have the same problem. It might be caused by differently named ID columns. The join still works; however, it is quite annoying that it posts an error and, as a result, propagates many error messages down the process path.
Is anyone else joining two data sets with differently named ID columns?
-Gagi
-
Hi again,
this is indeed annoying. I am doing my best to find this bug, but I still cannot reproduce it. You are all using the final version 5.0.003, I guess?
This process works on my side. Does it cause problems when you load it?
Greetings,
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="646" width="714">
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="75">
<parameter key="target_function" value="spiral cluster"/>
<parameter key="number_of_attributes" value="2"/>
</operator>
<operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID" width="90" x="216" y="85"/>
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="210">
<parameter key="target_function" value="spiral cluster"/>
<parameter key="number_of_attributes" value="2"/>
</operator>
<operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID (2)" width="90" x="213" y="270"/>
<operator activated="true" class="rename" expanded="true" height="76" name="Rename" width="90" x="380" y="300">
<parameter key="old_name" value="id"/>
<parameter key="new_name" value="i5"/>
</operator>
<operator activated="true" class="join" expanded="true" height="76" name="Join" width="90" x="514" y="165"/>
<connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Generate Data (2)" from_port="output" to_op="Generate ID (2)" to_port="example set input"/>
<connect from_op="Generate ID (2)" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Join" to_port="right"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
Sebastian
-
Hi,
yes, I now use 5.0.003, but nothing changed. So I tested a bit.
I think I found some reasons for this behaviour in the metadata:
1. As far as the metadata is concerned, the number of examples in the result of a Join is determined by the right input.
2. Reading from ODBC or Excel only shows the exact number of examples while it is less than 1000 (otherwise the metadata says >=1000 examples).
3. Metadata carrying the information >=1000 examples seems to be faulty.
4. Changing the join type from inner to left or right changes nothing in the metadata.
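For comparison with points 1 and 4, here is a toy sketch in plain Python of how many rows a join on an ID actually produces. It is not RapidMiner code: the function name `join_row_count` and the sample ID ranges are made up for illustration, and it assumes that IDs are unique within each set. It only shows that the real row count depends on both inputs and on the join type, which is why a metadata estimate that follows the right input alone and ignores the join type looks suspicious.

```python
def join_row_count(left_ids, right_ids, join_type="inner"):
    """Number of rows a join on unique IDs would produce (toy model)."""
    left, right = set(left_ids), set(right_ids)
    if join_type == "inner":
        return len(left & right)   # only IDs present on both sides
    if join_type == "left":
        return len(left)           # every left row is kept
    if join_type == "right":
        return len(right)          # every right row is kept
    if join_type == "outer":
        return len(left | right)   # every ID from either side
    raise ValueError(f"unknown join type: {join_type}")


# Hypothetical inputs: IDs 1..1000 on the left, IDs 901..1200 on the right.
left_ids = range(1, 1001)
right_ids = range(901, 1201)

counts = {
    join_type: join_row_count(left_ids, right_ids, join_type)
    for join_type in ("inner", "left", "right", "outer")
}
print(counts)
```

With these inputs the four join types give four different row counts, and only the right join matches the size of the right input.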
To prove this, I modified Sebastian's process. My Excel input file just has the numbers 1 to 1000 in the first column (the first row holds the attribute name, Lfdnr):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="514" width="719">
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="75">
<parameter key="target_function" value="spiral cluster"/>
<parameter key="number_of_attributes" value="2"/>
</operator>
<operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID" width="90" x="216" y="85"/>
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="210">
<parameter key="target_function" value="spiral cluster"/>
<parameter key="number_of_attributes" value="2"/>
</operator>
<operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID (2)" width="90" x="179" y="210"/>
<operator activated="true" class="rename" expanded="true" height="76" name="Rename" width="90" x="313" y="210">
<parameter key="old_name" value="id"/>
<parameter key="new_name" value="i5"/>
</operator>
<operator activated="true" class="read_excel" expanded="true" height="60" name="Read Excel" width="90" x="30" y="363">
<parameter key="excel_file" value="C:\Users\Papa\Documents\rm_workspace\Meine Daten\test.xls"/>
</operator>
<operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID (3)" width="90" x="179" y="345"/>
<operator activated="true" class="rename" expanded="true" height="76" name="Rename (2)" width="90" x="380" y="345">
<parameter key="old_name" value="id"/>
<parameter key="new_name" value="i6"/>
</operator>
<operator activated="true" class="join" expanded="true" height="76" name="Join (2)" width="90" x="581" y="255"/>
<operator activated="true" class="join" expanded="true" height="76" name="Join" width="90" x="514" y="120"/>
<connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Generate Data (2)" from_port="output" to_op="Generate ID (2)" to_port="example set input"/>
<connect from_op="Generate ID (2)" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Join (2)" to_port="left"/>
<connect from_op="Read Excel" from_port="output" to_op="Generate ID (3)" to_port="example set input"/>
<connect from_op="Generate ID (3)" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
<connect from_op="Rename (2)" from_port="example set output" to_op="Join (2)" to_port="right"/>
<connect from_op="Join (2)" from_port="join" to_op="Join" to_port="right"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
Btw., is there a chance of getting support for reading ODF files (OpenOffice)?
I hope you can reproduce the bug now.
Greetings
Peter
-
Hi Peter,
I finally found it! Thanks a lot. The bug does not occur anymore, but the meta data is still deleted. I don't know whether this is the best one could do there; we will check that.
Greetings,
Sebastian
-
I'm completely new to RapidMiner, so please bear with my naivety. Is this the same bug that I encounter in the following setup?
Two example data sets (differing attributes, same ID column) are each connected (via Multiply) to i) a Set Minus operator and ii) an inner join, and the results are finally connected to a Union operator. It also throws a null pointer exception when the output of "Set Minus" is empty, which happens when the first example set completely matches the second. However, it does not if the two sets differ by one entry. Is the Union operator intolerant of this case, is it this bug, or did I just not find the right operator?
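The data flow described here can be sketched in plain Python, with each example set modelled as a dict from ID to attribute values. This is only an illustration under assumptions: the helper names (`set_minus`, `inner_join`, `union`, `merge`) and the sample data are hypothetical, and nothing here reflects how RapidMiner implements its operators. It merely shows that an empty Set Minus result is a legitimate input that a union can handle.

```python
def set_minus(a, b):
    """Rows of a whose ID does not occur in b."""
    return {key: row for key, row in a.items() if key not in b}


def inner_join(a, b):
    """Rows whose ID occurs in both sets, carrying the attributes of both."""
    return {key: {**a[key], **b[key]} for key in a if key in b}


def union(a, b):
    """All rows of both sets; attributes a row lacks are filled with None."""
    columns = sorted({col for rows in (a, b) for row in rows.values() for col in row})
    combined = {**a, **b}
    return {key: {col: row.get(col) for col in columns} for key, row in combined.items()}


def merge(a, b):
    """(a minus b) union (a join b) union (b minus a), as in the setup above."""
    return union(union(set_minus(a, b), inner_join(a, b)), set_minus(b, a))


# Hypothetical data: same IDs, differing attributes.
set_a = {1: {"x": 10}, 2: {"x": 20}}
set_b = {1: {"y": "p"}, 2: {"y": "q"}}
# Same as set_a, plus one entry that has no partner in set_b.
set_c = {**set_a, 3: {"x": 30}}

identical = merge(set_a, set_b)   # both Set Minus results are empty here
differing = merge(set_c, set_b)   # one Set Minus result has a single row

print(set_minus(set_a, set_b))
print(identical)
print(differing)
```

In this toy model the identical-ID case simply yields the joined rows, and the extra entry in the second case appears with its missing attribute set to None.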
Cheers - Oliver
-
Hi,
this seems to be a bug. Could you design a small example process using Generate Data operators as data sources and post it here, inside the code area that the # button creates? It would make fixing this issue much easier for me.
Greetings,
Sebastian
-
Hi, sorry, I was too new to figure it out right away. The error does not occur with the data generator; however, I have generated two random data sets which cause the error.
Is this helpful to you?
Greetings,
Oliver
http://www.mediafire.com/file/mwiiknkuv14/DataGenA.xls
http://www.mediafire.com/file/umozx3omjqn/DataGenB.xls
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="767" width="902">
<operator activated="true" class="read_excel" expanded="true" height="60" name="Read Excel" width="90" x="45" y="165">
<parameter key="excel_file" value="C:\Users\XXX\Documents\Daten\DataGenA.xls"/>
<parameter key="row_offset" value="1"/>
</operator>
<operator activated="true" class="read_excel" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="255">
<parameter key="excel_file" value="C:\Users\XXX\Documents\Daten\DataGenB.xls"/>
</operator>
<operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="165">
<list key="function_descriptions">
<parameter key="Origin" value="&quot;A&quot;"/>
</list>
</operator>
<operator activated="true" class="rename" expanded="true" height="76" name="Rename" width="90" x="313" y="165">
<parameter key="old_name" value="IDGen_ID"/>
<parameter key="new_name" value="GenID"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="447" y="165">
<parameter key="name" value="GenID"/>
<parameter key="target_role" value="id"/>
</operator>
<operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="179" y="255">
<list key="function_descriptions">
<parameter key="Origin" value="&quot;B&quot;"/>
</list>
</operator>
<operator activated="true" class="rename" expanded="true" height="76" name="Rename (2)" width="90" x="313" y="255">
<parameter key="old_name" value="IDGen_ID"/>
<parameter key="new_name" value="GenID"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role (2)" width="90" x="447" y="255">
<parameter key="name" value="GenID"/>
<parameter key="target_role" value="id"/>
</operator>
<operator activated="true" class="subprocess" expanded="true" height="112" name="Merge" width="90" x="581" y="165">
<process expanded="true" height="767" width="902">
<operator activated="true" class="multiply" expanded="true" height="112" name="Multiply (3)" width="90" x="45" y="30"/>
<operator activated="true" class="multiply" expanded="true" height="112" name="Multiply (4)" width="90" x="45" y="165"/>
<operator activated="true" class="set_minus" expanded="true" height="76" name="Set Minus (3)" width="90" x="179" y="210"/>
<operator activated="true" class="multiply" expanded="true" height="94" name="Multiply (26)" width="90" x="313" y="255"/>
<operator activated="true" class="join" expanded="true" height="76" name="Join (2)" width="90" x="179" y="120">
<parameter key="remove_double_attributes" value="false"/>
</operator>
<operator activated="true" class="set_minus" expanded="true" height="76" name="Set Minus" width="90" x="179" y="30"/>
<operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
<operator activated="true" class="union" expanded="true" height="76" name="Union" width="90" x="447" y="75"/>
<operator activated="true" class="union" expanded="true" height="76" name="Union (3)" width="90" x="581" y="120"/>
<connect from_port="in 1" to_op="Multiply (3)" to_port="input"/>
<connect from_port="in 2" to_op="Multiply (4)" to_port="input"/>
<connect from_op="Multiply (3)" from_port="output 1" to_op="Set Minus" to_port="example set input"/>
<connect from_op="Multiply (3)" from_port="output 2" to_op="Join (2)" to_port="left"/>
<connect from_op="Multiply (3)" from_port="output 3" to_op="Set Minus (3)" to_port="subtrahend"/>
<connect from_op="Multiply (4)" from_port="output 1" to_op="Set Minus" to_port="subtrahend"/>
<connect from_op="Multiply (4)" from_port="output 2" to_op="Join (2)" to_port="right"/>
<connect from_op="Multiply (4)" from_port="output 3" to_op="Set Minus (3)" to_port="example set input"/>
<connect from_op="Set Minus (3)" from_port="example set output" to_op="Multiply (26)" to_port="input"/>
<connect from_op="Multiply (26)" from_port="output 1" to_port="out 3"/>
<connect from_op="Multiply (26)" from_port="output 2" to_op="Union (3)" to_port="example set 2"/>
<connect from_op="Join (2)" from_port="join" to_op="Union" to_port="example set 2"/>
<connect from_op="Set Minus" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_port="out 2"/>
<connect from_op="Multiply" from_port="output 2" to_op="Union" to_port="example set 1"/>
<connect from_op="Union" from_port="union" to_op="Union (3)" to_port="example set 1"/>
<connect from_op="Union (3)" from_port="union" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="source_in 3" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<operator activated="true" class="collect" expanded="true" height="94" name="MergeDiff" width="90" x="581" y="30"/>
<connect from_op="Read Excel" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Read Excel (2)" from_port="output" to_op="Generate Attributes (2)" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Merge" to_port="in 1"/>
<connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
<connect from_op="Rename (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Merge" to_port="in 2"/>
<connect from_op="Merge" from_port="out 1" to_port="result 2"/>
<connect from_op="Merge" from_port="out 2" to_op="MergeDiff" to_port="input 1"/>
<connect from_op="Merge" from_port="out 3" to_op="MergeDiff" to_port="input 2"/>
<connect from_op="MergeDiff" from_port="collection" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
-
Hi,
the process with your data does not cause any error in my RapidMiner, so I guess the bug has been fixed.
Greetings,
Sebastian
-
Hi Sebastian
I was too fast; I guess you have not distributed the new version yet :-) I still get the null pointer exception. I will test again when you distribute the new version. Keep up the good work!
Greetings, Oliver