Hi. I assumed this would be a straightforward thing to do. I have a dataset with surprisingly few missing values, in just a few of the cases, and I want to impute them. There is an ID field in the data but no label. I set up the following process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
<process expanded="true" height="550" width="748">
<operator activated="true" breakpoints="after" class="retrieve" compatibility="5.1.014" expanded="true" height="60" name="Retrieve" width="90" x="76" y="158">
<parameter key="repository_entry" value="c14 lcq for imputation short b"/>
</operator>
<operator activated="true" breakpoints="after" class="impute_missing_values" compatibility="5.1.014" expanded="true" height="60" name="Impute Missing Values" width="90" x="313" y="255">
<parameter key="value_type" value="numeric"/>
<process expanded="true" height="617" width="950">
<operator activated="true" breakpoints="after" class="linear_regression" compatibility="5.1.014" expanded="true" height="94" name="Linear Regression" width="90" x="444" y="270">
<parameter key="feature_selection" value="none"/>
</operator>
<connect from_port="example set source" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_port="model sink"/>
<portSpacing port="source_example set source" spacing="0"/>
<portSpacing port="sink_model sink" spacing="0"/>
</process>
</operator>
<operator activated="true" breakpoints="after" class="write_excel" compatibility="5.1.014" expanded="true" height="60" name="Write Excel" width="90" x="514" y="255">
<parameter key="excel_file" value="C:\Documents and Settings\ckolar\My Documents\data model\lcq\c14 missing values mputed.xls"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Impute Missing Values" to_port="example set in"/>
<connect from_op="Impute Missing Values" from_port="example set out" to_op="Write Excel" to_port="input"/>
<connect from_op="Write Excel" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
It appears to run: in debug mode it shows me the regression results for each of the 26 variables. But once it seems to reach the end, it throws this error:
Dec 6, 2011 6:05:32 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...
Dec 6, 2011 6:05:32 PM SEVERE: Here:
Process[1] (Process)
subprocess 'Main Process'
+- Retrieve[1] (Retrieve)
+- Impute Missing Values[1] (Impute Missing Values)
subprocess 'Replacement Learning'
==> | +- Linear Regression[26] (Linear Regression)
+- Write Excel[0] (Write Excel)
Dec 6, 2011 6:05:32 PM FINER: Parameter 'send_mail' is not set. Using default ('never').
Dec 6, 2011 6:05:32 PM SEVERE: java.lang.NullPointerException
That's all I get in verbose mode. Any suggestions would be appreciated. This is my first time trying to impute missing values, so much of this is a learning exercise for me. Thanks, CK