"Problem reading a simple dataset from Excel"
darkobodnaruk
New Altair Community Member
Hi,
a very first problem. Might be a silly one, but here goes:
I loaded a trivial XML file (read data from Excel and use ID3 on it). RapidMiner converted and automatically wired it. But ExcelReader gives me "Cannot create example set meta data. Process stopped in ExcelReader."
The Excel (2003) file is the famous lenses dataset: 5 columns, 25 rows, the first row holds the column names, and the 5th column is the label.
In ExcelReader I ticked all the right checkboxes (I think).
What else could be the problem?
regards,
darko
ps. the XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator class="Process" expanded="true" name="Root">
<process expanded="true" height="251" width="614">
<operator class="ExcelReader" expanded="true" height="60" name="ExcelReader" width="90" x="45" y="30">
<parameter key="excel_file" value="D:\dropbox\My Dropbox\magisterij\delo\data\lenses.xls"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="create_label" value="true"/>
<parameter key="label_column" value="4"/>
<parameter key="decimal_point_character" value=","/>
</operator>
<operator class="ID3" expanded="true" height="76" name="ID3" width="90" x="447" y="30"/>
<connect from_op="ExcelReader" from_port="output" to_op="ID3" to_port="training set"/>
<connect from_op="ID3" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
Sorry about that. I figured out it's not a critical problem, I can get a model.
Still interested what it means though...
regards,
darko
Hi, thank you for reporting this.
If I understand you right, this is only a meta data error, but the process itself is running correctly, right?
This is a known issue. The reason is a bit technical. For the "normal" example set readers, we do not have the meta data available in advance. The only way to get it is to try reading the file and to generate the meta data by inspecting it. This can take as long as loading the file itself, so the feature also hurts performance at design time. The Excel reader does not cope well with this misuse of the reading method and interrupts itself, hence the error message.
This "feature" was meant to provide some meta data information at a time when we did not yet have the repository, so there was no other way of getting at the meta data. In one of the next releases we will disable it entirely because it is not reliable. The preferred way is to read the data once and use a RepositoryStorer to place it in your repository. All data read from the repository afterwards will have correct and efficient meta data annotations.
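As a rough sketch of that workaround, a one-off import process might look something like the following. The ExcelReader settings are copied from the process in the question. Everything about the storer is an assumption for illustration: the class name "RepositoryStorer", the parameter key "repository_entry", the port names "input" and "through", and the repository path "//LocalRepository/data/lenses" are not confirmed here and may differ in your version.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <operator class="Process" expanded="true" name="Root">
    <process expanded="true">
      <!-- Read the spreadsheet once, using the same settings as in the question -->
      <operator class="ExcelReader" expanded="true" name="ExcelReader">
        <parameter key="excel_file" value="D:\dropbox\My Dropbox\magisterij\delo\data\lenses.xls"/>
        <parameter key="first_row_as_names" value="true"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_column" value="4"/>
        <parameter key="decimal_point_character" value=","/>
      </operator>
      <!-- Assumed: class name, parameter key, port names and repository path of the storer -->
      <operator class="RepositoryStorer" expanded="true" name="RepositoryStorer">
        <parameter key="repository_entry" value="//LocalRepository/data/lenses"/>
      </operator>
      <connect from_op="ExcelReader" from_port="output" to_op="RepositoryStorer" to_port="input"/>
      <connect from_op="RepositoryStorer" from_port="through" to_port="result 1"/>
    </process>
  </operator>
</process>
```

After running such a process once, the ID3 process would read the stored entry from the repository instead of using ExcelReader directly, so the meta data is available at design time without re-reading the Excel file.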
I filed this as Bug 91.
Best,
Simon