linear regression, confidence limit and forward selection

Hzu
Hzu New Altair Community Member
edited November 5 in Community Q&A
Hello everyone,

I'm just getting started with RapidMiner, using some data that I know well, and a few questions have come up.

1. Assume numerical data with one label, on which a regression is performed. Is there a way to give a confidence or prediction interval for each predicted label? And if so, how can I get it?

2. I am applying forward selection with linear regression as the learner twice to a (purely numerical) example set. The first time, the inner operators of the forward selection are "linear regression -> apply model -> performance". The second time, the forward selection contains an X-Validation, which in turn has linear regression as the learner in its training part and apply model + performance in its testing part. The results in the performance vectors are slightly different, as I expected, and each "method" produces an example set. The example set coming from the 'X-Validation branch' does not contain predicted values, in contrast to the one coming from the forward selection with only linear regression, although I would have said that the output of both constructs in the forward selection is the same.
I can create predicted values with another model application + performance after the X-Validation inside the forward selection, but I fear that this changes the result of the forward selection. What would be a proper way to get predicted values from both methods?

I would greatly appreciate it if someone could give me a hint. Thanks in advance.

P.S. Here is the XML code of the process:
      </operator>
      <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="581" y="30"/>
      <operator activated="true" class="multiply" expanded="true" height="112" name="Multiply" width="90" x="45" y="300"/>
      <operator activated="true" class="optimize_selection_forward" expanded="true" height="94" name="Forward Selection (X-Val)" width="90" x="246" y="390">
        <parameter key="stopping_behavior" value="without significant increase"/>
        <parameter key="alpha" value="0.01"/>
        <process expanded="true" height="542" width="614">
          <operator activated="true" class="x_validation" expanded="true" height="112" name="X-Validation" width="90" x="112" y="30">
            <process expanded="true">
              <operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression (X-Val)" width="90" x="115" y="30"/>
              <connect from_port="training" to_op="Linear Regression (X-Val)" to_port="training set"/>
              <connect from_op="Linear Regression (X-Val)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (X-Val)" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_regression" expanded="true" height="76" name="Performance (X-Val)" width="90" x="182" y="30">
                <parameter key="absolute_error" value="true"/>
                <parameter key="relative_error" value="true"/>
                <parameter key="relative_error_lenient" value="true"/>
                <parameter key="relative_error_strict" value="true"/>
                <parameter key="normalized_absolute_error" value="true"/>
                <parameter key="root_relative_squared_error" value="true"/>
                <parameter key="squared_error" value="true"/>
                <parameter key="correlation" value="true"/>
                <parameter key="squared_correlation" value="true"/>
                <parameter key="prediction_average" value="true"/>
                <parameter key="spearman_rho" value="true"/>
                <parameter key="kendall_tau" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model (X-Val)" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model (X-Val)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (X-Val)" from_port="labelled data" to_op="Performance (X-Val)" to_port="labelled data"/>
              <connect from_op="Performance (X-Val)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="example set" to_op="X-Validation" to_port="training"/>
          <connect from_op="X-Validation" from_port="averagable 1" to_port="performance"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="optimize_selection_forward" expanded="true" height="94" name="Forward Selection (Regression)" width="90" x="246" y="255">
        <parameter key="stopping_behavior" value="without significant increase"/>
        <parameter key="alpha" value="0.01"/>
        <process expanded="true" height="542" width="614">
          <operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="45" y="30"/>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="179" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="30">
            <parameter key="absolute_error" value="true"/>
            <parameter key="relative_error" value="true"/>
            <parameter key="relative_error_lenient" value="true"/>
            <parameter key="relative_error_strict" value="true"/>
            <parameter key="normalized_absolute_error" value="true"/>
            <parameter key="root_relative_squared_error" value="true"/>
            <parameter key="squared_error" value="true"/>
            <parameter key="correlation" value="true"/>
            <parameter key="squared_correlation" value="true"/>
            <parameter key="prediction_average" value="true"/>
            <parameter key="spearman_rho" value="true"/>
            <parameter key="kendall_tau" value="true"/>
          </operator>
          <connect from_port="example set" to_op="Linear Regression" to_port="training set"/>
          <connect from_op="Linear Regression" from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Linear Regression" from_port="exampleSet" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="performance"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
        </process>
      </operator>
      <operator activated="false" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="45" y="120">
        <process expanded="true">
          <operator activated="false" class="linear_regression" expanded="true" height="76" name="Linear Regression (2)" width="90" x="115" y="30"/>
          <connect from_port="training" to_op="Linear Regression (2)" to_port="training set"/>
          <connect from_op="Linear Regression (2)" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="false" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="false" class="performance_regression" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Forward Selection (Regression)" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Forward Selection (X-Val)" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 3" to_port="result 7"/>
      <connect from_op="Forward Selection (X-Val)" from_port="example set" to_port="result 4"/>
      <connect from_op="Forward Selection (X-Val)" from_port="attribute weights" to_port="result 5"/>
      <connect from_op="Forward Selection (X-Val)" from_port="performance" to_port="result 6"/>
      <connect from_op="Forward Selection (Regression)" from_port="example set" to_port="result 1"/>
      <connect from_op="Forward Selection (Regression)" from_port="attribute weights" to_port="result 2"/>
      <connect from_op="Forward Selection (Regression)" from_port="performance" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
      <portSpacing port="sink_result 6" spacing="0"/>
      <portSpacing port="sink_result 7" spacing="0"/>
      <portSpacing port="sink_result 8" spacing="0"/>
    </process>
  </operator>

Answers

  • land
    land New Altair Community Member
    Hi,
    there is currently no method available for modeling confidence or prediction intervals. In fact, I don't know of any method that delivers this information. Does anyone?

    Regarding your second question:
    The proper way is to use an X-Validation to estimate the performance. Otherwise you test on the training example set. If you want predictions after the forward selection, you have to learn a model again and apply it separately to the example set with the reduced attribute subset that the forward selection returns.
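
    A minimal, untested sketch of that last step, assuming the example set output of "Forward Selection (X-Val)" is routed into a fresh learner instead of directly to a result port. The operator names "Linear Regression (final)" and "Apply Model (final)" are hypothetical. The operator classes and port names are the ones already used in the process posted above, and layout attributes such as x, y, width and height are left out on the assumption that they are optional.

    ```xml
    <!-- Sketch only: placed at the top level, next to "Forward Selection (X-Val)". -->
    <!-- The two operator names below are hypothetical. -->
    <operator activated="true" class="linear_regression" name="Linear Regression (final)"/>
    <operator activated="true" class="apply_model" name="Apply Model (final)">
      <list key="application_parameters"/>
    </operator>
    <!-- These connections replace the existing one from the selection's "example set" port to "result 4". -->
    <connect from_op="Forward Selection (X-Val)" from_port="example set" to_op="Linear Regression (final)" to_port="training set"/>
    <connect from_op="Linear Regression (final)" from_port="model" to_op="Apply Model (final)" to_port="model"/>
    <connect from_op="Linear Regression (final)" from_port="exampleSet" to_op="Apply Model (final)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (final)" from_port="labelled data" to_port="result 4"/>
    ```

    Because these operators sit outside the forward selection, they cannot influence which attributes it selects. The predictions are made on the same examples the final model was trained on, so they are suitable for inspection but not for a performance estimate.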

    Greetings,
      Sebastian
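
    On the first question, a prediction interval for an ordinary least-squares linear regression can be computed by hand from the fitted model. The following is the textbook formula, assuming independent, normally distributed errors with constant variance. The symbols are not from the thread: X is the n x p design matrix of the training data (including a column of ones for the intercept), x_0 is a new example in the same layout, and \hat{\beta} are the fitted coefficients.

    ```latex
    \hat{y}_0 = x_0^\top \hat{\beta}, \qquad
    s^2 = \frac{1}{n-p} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad
    \hat{y}_0 \;\pm\; t_{1-\alpha/2,\; n-p} \; s \, \sqrt{1 + x_0^\top \left( X^\top X \right)^{-1} x_0}
    ```

    Dropping the 1 under the square root gives the confidence interval for the mean response instead. If the attributes were chosen by forward selection on the same data, such intervals are likely to be too narrow, because the formula does not account for the selection step.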