I finally downloaded version 5. What made me do it? PMML.
I have to admit that I'm still in a transition stage. I like many of the new things in 5.0 (reporting, parallel processing, PMML), but I'm still used to the tree paradigm. There are other things I don't understand (for example, why can't I define labels and ids in the Read CSV operator as before?).
But I'm with version 5.0 from now on. I'm having some problems with the PMML operators. If I use examples that generate data inside RM, I get no errors. For instance:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="668" width="770">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="112" y="75">
        <parameter key="target_function" value="polynomial"/>
      </operator>
      <operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="319" y="75"/>
      <operator activated="true" class="pmml:write_pmml" expanded="true" height="60" name="Write PMML" width="90" x="549" y="71">
        <parameter key="file" value="c:\linreg.xml"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Linear Regression" from_port="model" to_op="Write PMML" to_port="model"/>
      <connect from_op="Write PMML" from_port="model output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
But if I try to read a very simple CSV file (you can download it here: http://dl.dropbox.com/u/5477950/cerveza.csv) using the following process, I run into trouble:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="735" width="985">
      <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
        <parameter key="file_name" value="c:\cerveza.csv"/>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="246" y="30">
        <parameter key="name" value="cerveza"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="478" y="28"/>
      <operator activated="true" class="pmml:write_pmml" expanded="true" height="60" name="Write PMML" width="90" x="685" y="28">
        <parameter key="file" value="c:\linreg.xml"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Linear Regression" from_port="model" to_op="Write PMML" to_port="model"/>
      <connect from_op="Write PMML" from_port="model output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
I get the error message: "The setup does not seem to contain any obvious errors, but you should check the logs..."
What am I doing wrong?
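One thing I have not ruled out yet, and this is a guess rather than a diagnosis: the file may use a different column separator than Read CSV expects, in which case Set Role would never find the cerveza attribute. Below is a minimal sketch of the Read CSV operator with an explicit separator. The column_separators parameter name and the comma value are assumptions on my part; I have not confirmed either against the operator documentation or the file itself.
<operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
  <parameter key="file_name" value="c:\cerveza.csv"/>
  <!-- Assumption: parameter name and value are unverified; set the value to whatever separator the file really uses -->
  <parameter key="column_separators" value=","/>
</operator>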
Another problem (even with RM-generated data): when I try to run a logistic regression, I get an error saying that the class MyKLRModel cannot be exported to PMML.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="735" width="985">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="85" y="48">
        <parameter key="target_function" value="sum classification"/>
      </operator>
      <operator activated="true" class="logistic_regression" expanded="true" height="94" name="Logistic Regression" width="90" x="261" y="46"/>
      <operator activated="true" class="pmml:write_pmml" expanded="true" height="60" name="Write PMML" width="90" x="479" y="45">
        <parameter key="file" value="c:\logistic.xml"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Logistic Regression" to_port="training set"/>
      <connect from_op="Logistic Regression" from_port="model" to_op="Write PMML" to_port="model"/>
      <connect from_op="Write PMML" from_port="model output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
If I try the evolutionary version of LR, I get the same error.
Thanks in advance for any help,
\Ernesto