"Problem building a regression model in RapidMiner"
I tried linear regression on the following data set. The column "label" is the response variable, and the other three columns are predictor variables. The RapidMiner process I built is given after the data.
x     y1            z1             label
0     85.2475654    245.1558442    99.69204152
-1    36.00008409   -50.37614679   95.61016949
-2    257.1300917   517.2790698    189
-2    194.4923912   10.50413223    593.6107784
1     602.6111798   410.6153846    345.1538462
1     36.2366869    608.7922078    1.076124567
-5    13.09949256   16.59633028    -4.389830508
-5    660.3381923   468.0886076    353.7486034
3     52.75862603   724.5955056    -20.92633223
-5    37.49788729   64.61607143    -2.71990172
The model produced by the process below is not correct. R, on the other hand, builds the linear regression model for this data set without any problem. I am not sure why RapidMiner has a problem with this data set. Thanks.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="224" width="346">
<operator activated="true" class="read_csv" compatibility="5.2.008" expanded="true" height="60" name="Read CSV" width="90" x="59" y="95">
<parameter key="csv_file" value="C:\Users\Desktop\training.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="x.true.integer.attribute"/>
<parameter key="1" value="y1.true.real.attribute"/>
<parameter key="2" value="z1.true.real.attribute"/>
<parameter key="3" value="label.true.real.label"/>
</list>
</operator>
<operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="246" y="75"/>
<connect from_op="Read CSV" from_port="output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
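You should get exactly the same result as R if you set the feature selection parameter of the Linear Regression operator to "none". By default, RapidMiner applies M5 Prime feature selection, which can drop predictors from the model. From what I understand, this is roughly equivalent to choosing the attribute subset with the best Akaike information criterion (AIC).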
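As a minimal sketch of that change, the Linear Regression operator from the posted process could be written with the parameter set explicitly. This assumes the parameter key is feature_selection and the value is none in this RapidMiner version; if the process XML shows a different key, use the name from the operator's Parameters panel instead.

```xml
<!-- Replaces the self-closing Linear Regression operator in the posted process. -->
<!-- Assumption: key "feature_selection" with value "none" disables M5 Prime selection. -->
<operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="246" y="75">
  <parameter key="feature_selection" value="none"/>
</operator>
```

With feature selection disabled, all three predictors (x, y1, z1) should stay in the model, which is what makes the coefficients comparable to the ones R reports.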