SVM parameter optimization

Rapido
Rapido New Altair Community Member
edited November 5 in Community Q&A
I'm using the parameter optimization for SVM.
If I'm correct, the optimization steps are as follows:
1. Search parameters
2. Apply optimal parameters on the model
3. Apply optimal model to new data

Does it make sense to use XValidation when applying the optimal parameters?
I mean after the operator "ParameterSetter" (see attached code).

Why doesn't "Final Performance" appear at the end? Only "BinominalClassificationPerformance (2)" appears.

Another question:
Is it possible to optimize the parameters with respect to AUC (area under the curve)?
I have read papers in which people did such an analysis. Is this possible with RapidMiner?


My idea is to compare different models (SVM, neural networks, logistic regression) on the same data.
<operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#p#ygt# Often the different operators have many parameters and it is not clear which parameter values are best for the learning task at hand. The parameter optimization operator helps to find an optimal parameter set for the used operators. #ylt#/p#ygt#  #ylt#p#ygt# The inner crossvalidation estimates the performance for each parameter set. In this process two parameters of the SVM are tuned. The result can be plotted in 3D (using gnuplot) or in color mode. #ylt#/p#ygt#  #ylt#p#ygt# Try the following: #ylt#ul#ygt# #ylt#li#ygt#Start the process. The result is the best parameter set and the performance which was achieved with this parameter set.#ylt#/li#ygt# #ylt#li#ygt#Edit the parameter list of the ParameterOptimization operator to find another parameter set.#ylt#/li#ygt# #ylt#/ul#ygt# #ylt#/p#ygt# "/>
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename" value="C:\Users\jo\Documents\rm_workspace\BF_Analyse_daten_221209\Trainset221209_500mPosNeg.aml"/>
    </operator>
    <operator name="Ztransformation" class="Normalization">
        <parameter key="return_preprocessing_model" value="true"/>
    </operator>
    <operator name="IOStorer" class="IOStorer">
        <parameter key="name" value="data"/>
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="remove_from_process" value="false"/>
    </operator>
    <operator name="ParameterOptimization" class="GridParameterOptimization" expanded="yes">
        <list key="parameters">
          <parameter key="Training.C" value="0.03125,0.044194173824159220275052772631553,0.0625,0.088388347648318440550105545263106,0.125,0.17677669529663688110021109052621,0.25,0.35355339059327376220042218105242,0.5,0.70710678118654752440084436210485,1,1.4142135623730950488016887242097,2,103.18914671611545020044386249798,4,5.6568542494923801952067548968388,8,11.313708498984760390413509793678,16,22.627416997969520780827019587355,32,45.25483399593904156165403917471,64,90.509667991878083123308078349421,128,181.01933598375616624661615669884,256,362.03867196751233249323231339768,512,724.07734393502466498646462679537,1024,1448.1546878700493299729292535907,2048,2896.3093757400986599458585071815,4096,5792.6187514801973198917170143629,8192,11585.237502960394639783434028726,16384,379625062.49700621155642356625329,32768"/>
          <parameter key="Training.gamma" value="0.00000095367431640625,0.0000019073486328125,0.000003814697265625,0.00000762939453125,0.0000152587890625,0.000030517578125,0.00006103515625,0.0001220703125,0.000244140625,0.00048828125,0.0009765625,0.001953125,0.00390625,0.0078125,0.015625,0.03125,0.0625,0.125,0.25,0.5,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384"/>
        </list>
        <operator name="Validation" class="XValidation" expanded="yes">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="Training" class="LibSVMLearner">
                <parameter key="degree" value="5"/>
                <parameter key="gamma" value="16384"/>
                <parameter key="C" value="32768"/>
                <parameter key="epsilon" value="0.01"/>
                <list key="class_weights">
                </list>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <parameter key="keep_model" value="true"/>
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="BinominalClassificationPerformance" class="BinominalClassificationPerformance">
                    <parameter key="precision" value="true"/>
                    <parameter key="skip_undefined_labels" value="false"/>
                </operator>
            </operator>
        </operator>
        <operator name="Log" class="ProcessLog">
            <parameter key="filename" value="paraopt.log"/>
            <list key="log">
              <parameter key="C" value="operator.Training.parameter.C"/>
              <parameter key="degree" value="operator.Training.parameter.degree"/>
              <parameter key="absolute" value="operator.BinominalClassificationPerformance.value.sensitivity"/>
            </list>
        </operator>
    </operator>
    <operator name="Apply optimal parameters" class="OperatorChain" expanded="yes">
        <operator name="IORetriever" class="IORetriever">
            <parameter key="name" value="data"/>
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="ParameterSetter" class="ParameterSetter">
            <list key="name_map">
              <parameter key="Training" value="Apply"/>
            </list>
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="create_complete_model" value="true"/>
            <operator name="Apply" class="LibSVMLearner">
                <parameter key="degree" value="1"/>
                <parameter key="gamma" value="0.000244140625"/>
                <parameter key="C" value="379625062.49700621155642356625329"/>
                <list key="class_weights">
                </list>
            </operator>
            <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                <operator name="ApplyModel" class="ModelApplier">
                    <parameter key="keep_model" value="true"/>
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="BinominalClassificationPerformance (2)" class="BinominalClassificationPerformance">
                    <parameter key="precision" value="true"/>
                    <parameter key="skip_undefined_labels" value="false"/>
                </operator>
            </operator>
        </operator>
    </operator>
    <operator name="Test new Data" class="OperatorChain" expanded="yes">
        <operator name="Test Data" class="ExampleSetGenerator">
            <parameter key="target_function" value="random classification"/>
            <parameter key="number_examples" value="10"/>
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="Model"/>
        </operator>
        <operator name="ModelApplier" class="ModelApplier">
            <list key="application_parameters">
            </list>
        </operator>
        <operator name="TestModel" class="ModelApplier">
            <list key="application_parameters">
            </list>
            <parameter key="create_view" value="true"/>
        </operator>
        <operator name="Final Performance" class="BinominalClassificationPerformance">
            <parameter key="main_criterion" value="AUC"/>
            <parameter key="AUC" value="true"/>
            <parameter key="precision" value="true"/>
            <parameter key="recall" value="true"/>
            <parameter key="lift" value="true"/>
            <parameter key="skip_undefined_labels" value="false"/>
        </operator>
    </operator>
</operator>

Answers

  • land
    land New Altair Community Member
    Hi,
    this works a little differently from how you understood it. You have to distinguish between learning a model and the model itself. A learned model cannot be altered through parameters: it is just the product of the learning process, which is controlled by a number of parameters. So when you optimize parameters, you are optimizing the parameters of the learner.
    Once you have found the optimal parameter set for a learner on your data using any of the optimization operators, you can apply these parameters to another learner. That learner then learns the model on the complete training data, which should make it perform even better on new data than any of the models learned during cross-validation, because each of those uses only a subset of the data for learning.
    The resulting model can then be applied to new data in the third step.

    It is possible to optimize the AUC by using it as the first performance measure (the main criterion) inside the optimization.

    Greetings,
      Sebastian
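
    A minimal sketch of that suggestion, assuming the optimization operator uses the main criterion of the performance vector produced by the inner cross-validation: the inner "BinominalClassificationPerformance" operator gets the same main_criterion and AUC settings that the "Final Performance" operator in the posted process already uses. This is an untested fragment, not a confirmed fix.

    ```xml
    <!-- Inner performance operator of the "Validation" XValidation. -->
    <!-- Assumption: the parameter optimization picks the parameter set with the best main criterion. -->
    <operator name="BinominalClassificationPerformance" class="BinominalClassificationPerformance">
        <!-- Same keys as in the "Final Performance" operator of the posted process -->
        <parameter key="main_criterion" value="AUC"/>
        <parameter key="AUC" value="true"/>
        <parameter key="precision" value="true"/>
        <parameter key="skip_undefined_labels" value="false"/>
    </operator>
    ```

    With this in place, the "AUC" parameter is only the TRUE/FALSE switch that enables the criterion; the numeric AUC is part of the performance result.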
  • Rapido
    Rapido New Altair Community Member
    Hello Sebastian,

    Thank you for your answer. I just mixed up the terms learner and model. Now I got it.

    1. Find the best parameters for a learner with "Grid parameter optimization".
    2. Apply the best parameters to the learner with "ParameterSetter".
    3. Apply the model learned with the best parameters to unseen data.

    So it doesn't make sense to use XValidation in step 2, when I apply the best parameters to the learner.

    Regarding AUC: I only get TRUE or FALSE when I select AUC in the operator "BinominalClassificationPerformance" inside the "Grid parameter optimization". Is there another way to optimize C and gamma with respect to AUC and get a plot like this one?
    [Image: example plot attached to the original post]
    <operator name="Root" class="Process" expanded="yes">
       <description text="
    "/>
       <operator name="TrainData" class="ExampleSetGenerator">
           <parameter key="target_function" value="random classification"/>
       </operator>
       <operator name="Ztransformation" class="Normalization">
           <parameter key="return_preprocessing_model" value="true"/>
       </operator>
       <operator name="IOStorer" class="IOStorer">
           <parameter key="name" value="data"/>
           <parameter key="io_object" value="ExampleSet"/>
           <parameter key="remove_from_process" value="false"/>
       </operator>
       <operator name="ParameterOptimization" class="GridParameterOptimization" expanded="yes">
           <list key="parameters">
             <parameter key="Training.C" value="0.03125,0.044194173824159220275052772631553,0.0625,0.088388347648318440550105545263106,0.125,0.17677669529663688110021109052621,0.25,0.35355339059327376220042218105242,0.5,0.70710678118654752440084436210485,1,1.4142135623730950488016887242097,2,103.18914671611545020044386249798,4,5.6568542494923801952067548968388,8,11.313708498984760390413509793678,16,22.627416997969520780827019587355,32,45.25483399593904156165403917471,64,90.509667991878083123308078349421,128,181.01933598375616624661615669884,256,362.03867196751233249323231339768,512,724.07734393502466498646462679537,1024,1448.1546878700493299729292535907,2048,2896.3093757400986599458585071815,4096,5792.6187514801973198917170143629,8192,11585.237502960394639783434028726,16384,379625062.49700621155642356625329,32768"/>
             <parameter key="Training.gamma" value="0.00000095367431640625,0.0000019073486328125,0.000003814697265625,0.00000762939453125,0.0000152587890625,0.000030517578125,0.00006103515625,0.0001220703125,0.000244140625,0.00048828125,0.0009765625,0.001953125,0.00390625,0.0078125,0.015625,0.03125,0.0625,0.125,0.25,0.5,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384"/>
           </list>
           <operator name="Validation" class="XValidation" expanded="yes">
               <parameter key="keep_example_set" value="true"/>
               <parameter key="sampling_type" value="shuffled sampling"/>
               <operator name="Training" class="LibSVMLearner">
                   <parameter key="degree" value="5"/>
                   <parameter key="gamma" value="16384"/>
                   <parameter key="C" value="32768"/>
                   <parameter key="epsilon" value="0.01"/>
                   <list key="class_weights">
                   </list>
               </operator>
               <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                   <operator name="Test" class="ModelApplier">
                       <parameter key="keep_model" value="true"/>
                       <list key="application_parameters">
                       </list>
                   </operator>
                   <operator name="BinominalClassificationPerformance" class="BinominalClassificationPerformance">
                       <parameter key="precision" value="true"/>
                       <parameter key="skip_undefined_labels" value="false"/>
                   </operator>
               </operator>
           </operator>
           <operator name="Log" class="ProcessLog">
               <parameter key="filename" value="paraopt.log"/>
               <list key="log">
                 <parameter key="C" value="operator.Training.parameter.C"/>
                 <parameter key="Gamma" value="operator.Training.parameter.gamma"/>
                 <parameter key="absolute" value="operator.BinominalClassificationPerformance.value.performance"/>
               </list>
           </operator>
       </operator>
       <operator name="Apply optimal parameters" class="OperatorChain" expanded="yes">
           <operator name="IORetriever" class="IORetriever">
               <parameter key="name" value="data"/>
               <parameter key="io_object" value="ExampleSet"/>
           </operator>
           <operator name="ParameterSetter" class="ParameterSetter">
               <list key="name_map">
                 <parameter key="Training" value="Apply"/>
               </list>
           </operator>
           <operator name="Apply" class="LibSVMLearner">
               <parameter key="degree" value="1"/>
               <parameter key="gamma" value="0.00000095367431640625"/>
               <parameter key="C" value="0.03125"/>
               <list key="class_weights">
               </list>
           </operator>
           <operator name="ApplyModel" class="ModelApplier">
               <parameter key="keep_model" value="true"/>
               <list key="application_parameters">
               </list>
           </operator>
           <operator name="BinominalClassificationPerformance (2)" class="BinominalClassificationPerformance">
               <parameter key="precision" value="true"/>
               <parameter key="skip_undefined_labels" value="false"/>
           </operator>
       </operator>
       <operator name="Test new Data" class="OperatorChain" expanded="yes">
           <operator name="Test Data" class="ExampleSetGenerator">
               <parameter key="target_function" value="random classification"/>
               <parameter key="number_examples" value="10"/>
               <parameter key="local_random_seed" value="3454"/>
           </operator>
           <operator name="IOSelector" class="IOSelector">
               <parameter key="io_object" value="Model"/>
           </operator>
           <operator name="Z-Transformation" class="ModelApplier">
               <list key="application_parameters">
               </list>
           </operator>
           <operator name="TestModel" class="ModelApplier">
               <list key="application_parameters">
               </list>
               <parameter key="create_view" value="true"/>
           </operator>
           <operator name="Final Performance" class="BinominalClassificationPerformance">
               <parameter key="main_criterion" value="AUC"/>
               <parameter key="AUC" value="true"/>
               <parameter key="precision" value="true"/>
               <parameter key="recall" value="true"/>
               <parameter key="lift" value="true"/>
               <parameter key="skip_undefined_labels" value="false"/>
           </operator>
       </operator>
    </operator>
    Thank you for help!

    Greetings

    Rapido
  • fischer
    fischer New Altair Community Member
    Hi,

    Well, that looks exactly like what you get from the grid parameter optimization. Just use a ProcessLog operator inside the optimization operator to log C, gamma, and AUC, and then plot the logged values.

    Cheers,
    Simon
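
    As a rough sketch, the "Log" operator from the posted process could be adapted as follows. The value name "AUC" is an assumption, modeled on the "sensitivity" value logged in the first process; if that name is not available, logging "performance" as in the second process should record the main criterion, provided main_criterion of the inner performance operator is set to AUC.

    ```xml
    <!-- Placed inside "ParameterOptimization", after the "Validation" operator. -->
    <operator name="Log" class="ProcessLog">
        <parameter key="filename" value="paraopt.log"/>
        <list key="log">
          <parameter key="C" value="operator.Training.parameter.C"/>
          <parameter key="Gamma" value="operator.Training.parameter.gamma"/>
          <!-- Assumption: the performance operator exposes a value named "AUC" -->
          <parameter key="AUC" value="operator.BinominalClassificationPerformance.value.AUC"/>
        </list>
    </operator>
    ```

    Each grid point then yields one log row with C, gamma, and AUC, which is the data needed for a C/gamma plot.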