"The problem of building a regression model in RapidMiner"

Unknown
edited November 5 in Community Q&A
I tried linear regression on the following data set:
x y1 z1 label
0 85.2475654 245.1558442 99.69204152
-1 36.00008409 -50.37614679 95.61016949
-2 257.1300917 517.2790698 189
-2 194.4923912 10.50413223 593.6107784
1 602.6111798 410.6153846 345.1538462
1 36.2366869 608.7922078 1.076124567
-5 13.09949256 16.59633028 -4.389830508
-5 660.3381923 468.0886076 353.7486034
3 52.75862603 724.5955056 -20.92633223
-5 37.49788729 64.61607143 -2.71990172
The column "label" is the response variable, and the other three columns are predictor variables. I built the RapidMiner process as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="224" width="346">
      <operator activated="true" class="read_csv" compatibility="5.2.008" expanded="true" height="60" name="Read CSV" width="90" x="59" y="95">
        <parameter key="csv_file" value="C:\Users\Desktop\training.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="x.true.integer.attribute"/>
          <parameter key="1" value="y1.true.real.attribute"/>
          <parameter key="2" value="z1.true.real.attribute"/>
          <parameter key="3" value="label.true.real.label"/>
        </list>
      </operator>
      <operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="246" y="75"/>
      <connect from_op="Read CSV" from_port="output" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Linear Regression" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
The resulting model is not correct. R, on the other hand, builds the linear regression model for this data set without any problem. I am not sure why RapidMiner has a problem with this data set. Thanks.

Answers

  • earmijo
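    You should get exactly the same result as R if you set the "feature selection" parameter of the Linear Regression operator to "none". By default, RapidMiner applies M5 Prime feature selection, which can drop attributes from the model, so the coefficients no longer match an ordinary least-squares fit on all predictors. From what I understand, this is roughly equivalent to minimizing the AIC.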
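    A minimal sketch of the change, applied to the Linear Regression operator from the process in the question. It assumes the parameter key is `feature_selection` and that the value string for "none" is stored literally as `none` in 5.x process XML; if in doubt, set the option in the GUI and copy the operator from the XML view instead. Everything else in the process stays unchanged.

```xml
<!-- Replaces the self-closing Linear Regression operator in the question's process. -->
<!-- Assumption: feature_selection="none" disables the default M5 Prime selection, -->
<!-- so all three predictors (x, y1, z1) are kept, as in R's lm(). -->
<operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="246" y="75">
  <parameter key="feature_selection" value="none"/>
</operator>
```

    The operator keeps its original name, so the existing `<connect>` lines to "Linear Regression" still apply.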