Model from inside the "Optimize Parameters" process

gutompf
gutompf New Altair Community Member
edited November 5 in Community Q&A
Hi.
I built a process in which I:
1. optimize the parameters on the training data,
2. pass the optimal parameters to a second learner (SVM) with "Set Parameters",
3. train this SVM with those parameters on the training data,
4. use "Apply Model" on the test data and get the results.
I think this is the basic way to do it.

My question is: inside "Optimize Parameters" the model was built on only a partition of the training data, because I used cross-validation and some of the (originally training) data was held out for testing. I want to get the model that was built on exactly that data, not a model built with the optimal parameters on all the training data. Is this possible? Is there a video or text about this somewhere? I have only found tutorials that cover parts of this process.

I hope my English is clear enough, and I will be happy for an answer.
Milan

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    I am not sure I understand you correctly, but I think what you are trying to do is either not possible or has a low payoff. Where is your X-Validation? Is it inside the Parameter Optimization? If so, then for each parameter setting 10 models (if you use a 10-fold X-Validation) are created and evaluated, and the parameter optimization uses their average performance to estimate the performance of the current parameter set. Thus, there is no single model ("the" model) for this parameter combination and performance.
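
    The same point written as a formula. This is only an illustrative sketch: the symbols are assumptions introduced here, not anything RapidMiner displays. For a parameter set θ and a k-fold X-Validation that splits the training data D into folds D_1, ..., D_k, the optimization only sees the average

    ```latex
    \hat{p}(\theta) = \frac{1}{k} \sum_{i=1}^{k} p\bigl(M_i^{\theta},\, D_i\bigr),
    \qquad M_i^{\theta} = \text{model trained with } \theta \text{ on } D \setminus D_i
    ```

    where p is the chosen performance criterion. With k = 10 there are ten different models behind each single performance value, and none of them is singled out as the result.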

    I hope this helps. If something is still unclear, please post a minimal example process.

    Cheers, Marius
  • MariusHelf
    MariusHelf New Altair Community Member
    If I understand it correctly, the code you posted is not one process but several, none of which I can load. In any case, to improve readability you should use the "Insert Code" function of this forum.

    As I can't load your processes, I can only guess: did you check that the parameters of the SVM operator are set correctly, i.e. that they match the output of the Optimization operator? If they do not, check that you have written the operator names in the Set Parameters operator correctly. Remember that "set operator name" must be the name of the SVM operator inside the optimization, and "operator name" the name of the one you want to copy the parameters to.
    Additionally, by default the main criterion of the Performance operators is accuracy for classification tasks. If you want to optimize another criterion, you have to set the main criterion of the Performance operator inside the optimization accordingly. If you can't find the criterion you need, remember that RapidMiner offers more than one Performance operator; choose the one that fits your problem.
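
    To illustrate both points, here is a minimal sketch of the two relevant fragments in process XML. It assumes the learner inside the optimization is named SVM, the learner that should receive the parameters is named SVM2, and the task is a regression optimized for correlation. All of these are example choices; use the names and the criterion from your own process.

    ```xml
    <operator activated="true" class="set_parameters" name="Set Parameters">
      <list key="name_map">
        <!-- key:   "set operator name", the SVM inside the optimization
             value: "operator name", the SVM that receives the parameters -->
        <parameter key="SVM" value="SVM2"/>
      </list>
    </operator>

    <!-- Inside the optimization: select the criterion the optimization should use. -->
    <operator activated="true" class="performance_regression" name="Performance">
      <parameter key="main_criterion" value="correlation"/>
      <parameter key="correlation" value="true"/>
    </operator>
    ```

    If a name in the name map does not match an operator in the process exactly, the parameters cannot reach the intended learner, so the spelling is worth checking first.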

    Cheers,
    Marius
  • gutompf
    gutompf New Altair Community Member
    Hi Marius.
    Thank you very much for your patience with me. I think everything is OK in the places where you suspected a mistake. I am sending you my process again. Previously I copied it from the Process window; now I am copying it from the XML window. My problem is that, according to the Log operator, which writes the parameters and the correlation to a file, different parameters appear to be optimal than those that are set on the optimal SVM learner. I am not able to find the mistake.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.008">
      <context>
        <input/>
        <output>
          <location>result</location>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
        <process expanded="true" height="460" width="681">
          <operator activated="true" class="read_csv" compatibility="5.1.008" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
            <parameter key="csv_file" value="C:\RapidM projekty\pedotransferky\data\tren_a_valid.txt"/>
            <parameter key="column_separators" value="&#9;"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1250"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="I.true.real.attribute"/>
              <parameter key="1" value="II.true.real.attribute"/>
              <parameter key="2" value="III.true.real.attribute"/>
              <parameter key="3" value="IV.true.real.attribute"/>
              <parameter key="4" value="ro.true.real.attribute"/>
              <parameter key="5" value="f2\.50.false.real.label"/>
              <parameter key="6" value="f56\.00.false.real.label"/>
              <parameter key="7" value="f209\.00.true.real.label"/>
              <parameter key="8" value="f558\.00.false.real.label"/>
              <parameter key="9" value="f976\.00.false.real.label"/>
              <parameter key="10" value="f3060\.00.false.real.label"/>
              <parameter key="11" value="f15300\.00.false.real.label"/>
            </list>
          </operator>
          <operator activated="true" class="normalize" compatibility="5.1.008" expanded="true" height="94" name="Normalize" width="90" x="179" y="30">
            <parameter key="method" value="range transformation"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.1.008" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
          <operator activated="true" class="optimize_parameters_grid" compatibility="5.1.008" expanded="true" height="94" name="OP" width="90" x="447" y="30">
            <list key="parameters">
              <parameter key="SVM.C" value="[1;601;200;linear]"/>
              <parameter key="SVM.gamma" value="[0.1;1;20;linear]"/>
            </list>
            <process expanded="true" height="421" width="653">
              <operator activated="true" class="x_validation" compatibility="5.1.008" expanded="true" height="112" name="Validation" width="90" x="112" y="165">
                <parameter key="number_of_validations" value="5"/>
                <parameter key="sampling_type" value="linear sampling"/>
                <process expanded="true" height="441" width="278">
                  <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.1.008" expanded="true" height="76" name="SVM" width="90" x="94" y="30">
                    <parameter key="svm_type" value="epsilon-SVR"/>
                    <parameter key="degree" value="2"/>
                    <parameter key="gamma" value="1.0"/>
                    <parameter key="coef0" value="50.0"/>
                    <parameter key="C" value="601.0"/>
                    <list key="class_weights"/>
                  </operator>
                  <connect from_port="training" to_op="SVM" to_port="training set"/>
                  <connect from_op="SVM" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true" height="441" width="278">
                  <operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance_regression" compatibility="5.1.008" expanded="true" height="76" name="Performance (2)" width="90" x="161" y="30">
                    <parameter key="main_criterion" value="correlation"/>
                    <parameter key="root_mean_squared_error" value="false"/>
                    <parameter key="correlation" value="true"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
                  <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
                  <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="log" compatibility="5.1.008" expanded="true" height="76" name="Log" width="90" x="313" y="120">
                <parameter key="filename" value="C:\Documents and Settings\Milan\My Documents\pokus.log"/>
                <list key="log">
                  <parameter key="gamma" value="operator.SVM.parameter.gamma"/>
                  <parameter key="C" value="operator.SVM.parameter.C"/>
                  <parameter key="epsilon - p" value="operator.SVM.parameter.p"/>
                  <parameter key="epsilon-e" value="operator.SVM.parameter.epsilon"/>
                  <parameter key="correl" value="operator.Performance (2).value.correlation"/>
                </list>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="set_parameters" compatibility="5.1.008" expanded="true" height="60" name="Set Parameters" width="90" x="581" y="30">
            <list key="name_map">
              <parameter key="SVM" value="SVM2"/>
            </list>
          </operator>
          <operator activated="true" class="read_csv" compatibility="5.1.008" expanded="true" height="60" name="Read CSV (3)" width="90" x="112" y="345">
            <parameter key="csv_file" value="C:\RapidM projekty\pedotransferky\data\test12345.txt"/>
            <parameter key="column_separators" value="&#9;"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1250"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="I.true.real.attribute"/>
              <parameter key="1" value="II.true.real.attribute"/>
              <parameter key="2" value="III.true.real.attribute"/>
              <parameter key="3" value="IV.true.real.attribute"/>
              <parameter key="4" value="ro.true.real.attribute"/>
              <parameter key="5" value="f2\.50.false.real.label"/>
              <parameter key="6" value="f56\.00.false.real.label"/>
              <parameter key="7" value="f209\.00.true.real.label"/>
              <parameter key="8" value="f558\.00.false.real.label"/>
              <parameter key="9" value="f976\.00.false.real.label"/>
              <parameter key="10" value="f3060\.00.false.real.label"/>
              <parameter key="11" value="f15300\.00.false.real.label"/>
            </list>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model (3)" width="90" x="313" y="300">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.1.008" expanded="true" height="76" name="SVM2" width="90" x="112" y="210">
            <parameter key="svm_type" value="epsilon-SVR"/>
            <parameter key="degree" value="2"/>
            <parameter key="gamma" value="0.5950000000000001"/>
            <parameter key="coef0" value="32.0"/>
            <parameter key="C" value="304.0"/>
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model" width="90" x="313" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="write_excel" compatibility="5.1.008" expanded="true" height="60" name="Write Excel" width="90" x="447" y="165">
            <parameter key="excel_file" value="C:\Documents and Settings\Milan\My Documents\12.xls"/>
          </operator>
          <operator activated="true" class="performance_regression" compatibility="5.1.008" expanded="true" height="76" name="Performance (3)" width="90" x="581" y="165">
            <parameter key="main_criterion" value="correlation"/>
            <parameter key="root_mean_squared_error" value="false"/>
            <parameter key="correlation" value="true"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Normalize" from_port="preprocessing model" to_op="Apply Model (3)" to_port="model"/>
          <connect from_op="Multiply" from_port="output 1" to_op="SVM2" to_port="training set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="OP" to_port="input 1"/>
          <connect from_op="OP" from_port="parameter" to_op="Set Parameters" to_port="parameter set"/>
          <connect from_op="Read CSV (3)" from_port="output" to_op="Apply Model (3)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="SVM2" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Write Excel" to_port="input"/>
          <connect from_op="Write Excel" from_port="through" to_op="Performance (3)" to_port="labelled data"/>
          <connect from_op="Performance (3)" from_port="performance" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="54"/>
        </process>
      </operator>
    </process>
  • MariusHelf
    MariusHelf New Altair Community Member
    Hi,

    - In the Log operator inside the OP you should log the performance of the X-Validation to get the performance averaged over all validation runs.
    - You are using different data for optimization and testing, so the results may differ a bit, even if the data come from the same distribution.
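
    A minimal sketch of the first point, based on the Log operator from the process posted above. The `correl` column reads its value from the Validation operator instead of from Performance (2). The value name `performance` is an assumption about what X-Validation exposes for its averaged main criterion; if it is named differently in your version, pick the matching entry from the value list in the Log operator's parameter dialog.

    ```xml
    <operator activated="true" class="log" compatibility="5.1.008" expanded="true" name="Log">
      <parameter key="filename" value="C:\Documents and Settings\Milan\My Documents\pokus.log"/>
      <list key="log">
        <parameter key="gamma" value="operator.SVM.parameter.gamma"/>
        <parameter key="C" value="operator.SVM.parameter.C"/>
        <!-- Averaged result of all validation runs, read from the X-Validation
             operator. The inner "Performance (2)" operator presumably reflects
             only the fold it evaluated most recently. -->
        <parameter key="correl" value="operator.Validation.value.performance"/>
      </list>
    </operator>
    ```

    With this change each log row should describe the same quantity that the optimization compares between parameter sets.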
  • gutompf
    gutompf New Altair Community Member
    Thank you, my first RapidMiner project is now finished. Thanks for the excellent support.
    Milan