cross validation result workspace [RM5]

TheBear
TheBear New Altair Community Member
edited November 5 in Community Q&A
Hi guys,
thanks for the great work. I really like the new user-friendly interface. It's really a great step forward for non-experts in data mining who want to use your tool.

I am trying to train two neural networks: the standard Neural Net operator (which I hope is a backpropagation network) and the RBF network from Weka. I tried to adapt the online tutorial/video on measuring performance with cross-validation.
I have several questions about the output shown in the result workspace.

1. Although I only have two X-Validation processes, I get three PerformanceVector tabs in the results, and I am not sure why it is three and not two. Can you help me with that? I am using only the GUI to build the process, and I believe I created two exact copies of the cross-validation process in which I only substituted the learner.

2. The PerformanceVector tab says
"root_mean_squared_error: 5.810 +/- 1.100 (mikro: 5.905 +/- 0.000)"
I think that is pretty high for a prediction error. Anyway, what does the information in the brackets mean? (Sorry, I am not an expert in machine learning and not familiar with the notation.)



3. A strange warning also appears in the log file: "Feb 28, 2010 7:54:11 PM WARNING: Caught exception in concurrent execution of Perf RBF (inner) (Performance (Regression)): com.rapidminer.operator.UserError: Input example set does not have a predicted label attribute"

The process XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
      <location/>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="836" width="1304">
      <operator activated="true" class="read_excel" expanded="true" height="60" name="Read Excel" width="90" x="45" y="75">
        <parameter key="excel_file" value="C:\Users\Seb\Documents\VersuchsplanVoids2007.xls"/>
        <parameter key="sheet_number" value="3"/>
      </operator>
      <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="112" y="165">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="SolderPasteA|SolderPasteC|SolderPasteD|Wetting paste height|Wetting inner area mean|Wetting outer area|Alloy compound405|BGA Void mean|Soldering_Con|Soldering_O2|Soldering_Vac|Soldering_Vap"/>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="179" y="300">
        <parameter key="name" value="BGA Void mean"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="435">
        <parameter key="condition_class" value="missing_labels"/>
        <parameter key="invert_filter" value="true"/>
      </operator>
      <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="447" y="480"/>
      <operator activated="true" class="x_validation" expanded="true" height="112" name="X-BPNN" width="90" x="715" y="615">
        <parameter key="average_performances_only" value="false"/>
        <parameter key="sampling_type" value="shuffled sampling"/>
        <parameter key="parallelize_training" value="true"/>
        <parameter key="parallelize_testing" value="true"/>
        <process expanded="true" height="559" width="487">
          <operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="76" y="185">
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="default" value="none"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="neural_net" expanded="true" height="76" name="Neural Net" width="90" x="246" y="165">
            <list key="hidden_layers">
              <parameter key="H1" value="15"/>
              <parameter key="H2" value="5"/>
            </list>
            <parameter key="learning_rate" value="0.2"/>
            <parameter key="momentum" value="0.4"/>
          </operator>
          <connect from_port="training" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Neural Net" to_port="training set"/>
          <connect from_op="Neural Net" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="559" width="487">
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="112" y="75">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" expanded="true" height="76" name="Perf BP (inner)" width="90" x="313" y="30">
            <parameter key="main_criterion" value="root_mean_squared_error"/>
            <parameter key="absolute_error" value="true"/>
            <parameter key="correlation" value="true"/>
            <parameter key="use_example_weights" value="false"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Perf BP (inner)" to_port="labelled data"/>
          <connect from_op="Perf BP (inner)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="x_validation" expanded="true" height="112" name="X-RBF" width="90" x="1050" y="300">
        <parameter key="average_performances_only" value="false"/>
        <parameter key="sampling_type" value="shuffled sampling"/>
        <parameter key="parallelize_training" value="true"/>
        <parameter key="parallelize_testing" value="true"/>
        <process expanded="true" height="559" width="487">
          <operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values (2)" width="90" x="45" y="30">
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="default" value="none"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="weka:W-RBFNetwork" expanded="true" height="76" name="W-RBFNetwork" width="90" x="246" y="30">
            <parameter key="B" value="15.0"/>
            <parameter key="W" value="0.3"/>
          </operator>
          <connect from_port="training" to_op="Replace Missing Values (2)" to_port="example set input"/>
          <connect from_op="Replace Missing Values (2)" from_port="example set output" to_op="W-RBFNetwork" to_port="training set"/>
          <connect from_op="W-RBFNetwork" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="559" width="487">
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" expanded="true" height="76" name="Perf RBF (inner)" width="90" x="253" y="30">
            <parameter key="main_criterion" value="root_mean_squared_error"/>
            <parameter key="absolute_error" value="true"/>
            <parameter key="correlation" value="true"/>
            <parameter key="use_example_weights" value="false"/>
          </operator>
          <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Perf RBF (inner)" to_port="labelled data"/>
          <connect from_op="Perf RBF (inner)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="X-RBF" to_port="training"/>
      <connect from_op="Multiply" from_port="output 2" to_op="X-BPNN" to_port="training"/>
      <connect from_op="X-BPNN" from_port="model" to_port="result 3"/>
      <connect from_op="X-BPNN" from_port="averagable 1" to_port="result 4"/>
      <connect from_op="X-RBF" from_port="model" to_port="result 1"/>
      <connect from_op="X-RBF" from_port="averagable 1" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="270"/>
      <portSpacing port="sink_result 2" spacing="18"/>
      <portSpacing port="sink_result 3" spacing="108"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • land
    land New Altair Community Member
    Hi,
    thank you for these kind words.

    The first problem does not occur on my side. I receive two PerformanceVectors, each of them containing three performance criteria.

    To your second question: I can't say whether a value of 5.8 is high or not. That depends very much on the scale of the label attribute, doesn't it?
    We distinguish between the macro and the micro (shown as "mikro" in the output) average and variance. The macro average is what you get if the performance vectors of the individual XValidation folds are averaged with equal weight. The micro average takes into account the number of examples that were used to build each performance vector delivered to the XValidation: folds with a higher number of examples receive more weight. I hope this clarifies it a bit.
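
To make the difference concrete, here is a minimal Python sketch of the two averages for a root mean squared error. It is only an illustration, not RapidMiner code: the function names are made up, the micro value is assumed to be computed by pooling the test examples of all folds into a single error calculation, and the spread is assumed to be a population standard deviation. The thread does not confirm these details.

```python
import math


def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def macro_average(folds):
    """Average the per-fold RMSE values with equal weight per fold.

    Returns (mean, standard deviation) over the folds. The population
    standard deviation is used here, which is an assumption.
    """
    scores = [rmse(fold) for fold in folds]
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return mean, math.sqrt(variance)


def micro_average(folds):
    """Pool the errors of all folds, so larger folds weigh more (assumed)."""
    pooled = [e for fold in folds for e in fold]
    return rmse(pooled)


if __name__ == "__main__":
    # Hypothetical prediction errors: a small fold with large errors
    # and a larger fold with small errors.
    folds = [[3.0, 3.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]
    mean, std = macro_average(folds)
    print(f"macro: {mean:.3f} +/- {std:.3f}")
    print(f"micro: {micro_average(folds):.3f}")
```

With these made-up errors the macro average is 2.000 +/- 1.000, while the micro value is about 1.732, because the larger fold with the small errors dominates the pooled calculation. A single pooled value has no spread across folds, which would be consistent with the "+/- 0.000" shown after the mikro value.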

    The log file entry does not appear on my side either. Sorry about that.

    Greetings,
      Sebastian
  • TheBear
    TheBear New Altair Community Member
    It looks like this:
    [image]
    Maybe it has something to do with the feature for disabling old result tabs, although I think I set the default to always delete old results.
    I'll try the process again the next time I restart RapidMiner ... :)
  • land
    land New Altair Community Member
    Hi,
    could this result be left over from a debug point? It looks like the direct output of the inner performance evaluator.

    Greetings,
      Sebastian