Hello,
I have a problem with nested cross-validation. I have already read and implemented the code from the earlier discussion at the following link:
http://rapid-i.com/rapidforum/index.php?topic=615.0
But this does not really help me.
I have a binary classification problem with real-valued input attributes (attribute type "real").
First I split the data into a training set and a test set. The test set remains completely untouched until the end. On the training set I use a cross-validation. To optimize the parameters of the SVM (C, gamma), I use the optimizeParameters operator. To make the resulting parameter setting available for training a new, optimized SVM, I use the ParameterSettings operator, which should deliver the optimal C and gamma to the new SVM. This SVM is then trained again on the whole training set with the optimal C and gamma. After that I want the performance of the new SVM on both the training set and the test set, so I apply the trained model once to the test set and once to the training set.
The problem is that the ParameterSettings operator does not seem to deliver the optimal parameter setting to the new SVM. I can see this because the C and gamma of the new SVM have not changed after the process has run. Another indicator is that the kernel model of the test model varies in the number of support vectors it uses. Moreover, if I put a random name into the name map of the ParameterSettings operator in the “set operator name” field (instead of SVM_train), there is no error message and the result is the same.
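To make clear what I expect the name map to do, here is a minimal sketch of the mapping as I understand it. It assumes that the key is the operator name recorded in the parameter set (SVM_train, the learner inside the optimization) and that the value is the operator that should receive C and gamma (SVM_test). This is my reading of the operator, not a confirmed fix, and it contains only this one entry rather than the three in my process.

```xml
<operator activated="true" class="set_parameters" compatibility="5.3.000" expanded="true" name="ParameterSetter">
  <list key="name_map">
    <!-- Assumption: key = operator name stored in the parameter set,
         value = operator in the process that should receive the values. -->
    <parameter key="SVM_train" value="SVM_test"/>
  </list>
</operator>
```

With this mapping I would expect SVM_test to show the optimized C and gamma after the run, provided that ParameterSetter is executed before SVM_test.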
Can you please help me? My XML code is below.
Many thanks in advance,
Daniel
Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Root">
<description><p> Often the different operators have many parameters and it is not clear which parameter values are best for the learning task at hand. The parameter optimization operator helps to find an optimal parameter set for the used operators. </p> <p> The inner crossvalidation estimates the performance for each parameter set. In this process two parameters of the SVM are tuned. The result can be plotted in 3D (using gnuplot) or in color mode. </p> <p> Try the following: <ul> <li>Start the process. The result is the best parameter set and the performance which was achieved with this parameter set.</li> <li>Edit the parameter list of the ParameterOptimization operator to find another parameter set.</li> </ul> </p> </description>
<parameter key="parallelize_main_process" value="true"/>
<process expanded="true" height="449" width="815">
<operator activated="true" class="read_excel" compatibility="5.3.000" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
<parameter key="excel_file" value="D:\Promotion\Matlab\Ich\Workspaces\Tag\Feature_Matrix_nonlin_test.xls"/>
<parameter key="imported_cell_range" value="A1:IV262"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="label.true.binominal.label"/>
<parameter key="1" value="a1.true.real.attribute"/>
<parameter key="253" value="a253.true.real.attribute"/>
<parameter key="254" value="a254.true.real.attribute"/>
<parameter key="255" value="a255.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="split_data" compatibility="5.3.000" expanded="true" height="94" name="Split Data" width="90" x="45" y="120">
<enumeration key="partitions">
<parameter key="ratio" value="0.9"/>
<parameter key="ratio" value="0.1"/>
</enumeration>
<parameter key="sampling_type" value="linear sampling"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.3.000" expanded="true" height="94" name="Multiply" width="90" x="179" y="30"/>
<operator activated="true" class="optimize_parameters_grid" compatibility="5.3.000" expanded="true" height="130" name="loopThroughLocalParams" width="90" x="313" y="30">
<list key="parameters">
<parameter key="SVM_train.C" value="[1;100;10;quadratic]"/>
<parameter key="SVM_train.gamma" value="[0.0;100;10;quadratic]"/>
</list>
<parameter key="parallelize_optimization_process" value="true"/>
<process expanded="true" height="316" width="699">
<operator activated="true" class="parallel:x_validation_parallel" compatibility="5.1.002" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
<parameter key="use_local_random_seed" value="true"/>
<process expanded="true" height="334" width="360">
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.000" expanded="true" height="76" name="SVM_train" width="90" x="112" y="30">
<parameter key="gamma" value="100.0"/>
<parameter key="C" value="100.0"/>
<list key="class_weights"/>
</operator>
<connect from_port="training" to_op="SVM_train" to_port="training set"/>
<connect from_op="SVM_train" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="334" width="360">
<operator activated="true" class="apply_model" compatibility="5.3.000" expanded="true" height="76" name="Test_train" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.3.000" expanded="true" height="76" name="ClassificationPerformance_train_train" width="90" x="179" y="30">
<parameter key="weighted_mean_recall" value="true"/>
<parameter key="weighted_mean_precision" value="true"/>
<parameter key="absolute_error" value="true"/>
<parameter key="root_mean_squared_error" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Test_train" to_port="model"/>
<connect from_port="test set" to_op="Test_train" to_port="unlabelled data"/>
<connect from_op="Test_train" from_port="labelled data" to_op="ClassificationPerformance_train_train" to_port="labelled data"/>
<connect from_op="ClassificationPerformance_train_train" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_port="result 2"/>
<connect from_op="Validation" from_port="training" to_port="result 1"/>
<connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_parameters" compatibility="5.3.000" expanded="true" height="94" name="ParameterSetter" width="90" x="514" y="30">
<list key="name_map">
<parameter key="SVM_train" value="SVM_test"/>
<parameter key="SVM_train" value="applyModel"/>
<parameter key="SVM_train" value="applyModel (2)"/>
</list>
</operator>
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.000" expanded="true" height="76" name="SVM_test" width="90" x="179" y="165">
<list key="class_weights"/>
<parameter key="calculate_confidences" value="true"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.3.000" expanded="true" height="94" name="Multiply (2)" width="90" x="313" y="165"/>
<operator activated="true" class="apply_model" compatibility="5.3.000" expanded="true" height="76" name="applyModel" width="90" x="447" y="165">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.3.000" expanded="true" height="76" name="Performance_testset" width="90" x="581" y="165">
<parameter key="weighted_mean_recall" value="true"/>
<parameter key="weighted_mean_precision" value="true"/>
<parameter key="absolute_error" value="true"/>
<parameter key="root_mean_squared_error" value="true"/>
<list key="class_weights"/>
</operator>
<operator activated="true" class="apply_model" compatibility="5.3.000" expanded="true" height="76" name="applyModel (2)" width="90" x="447" y="255">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.3.000" expanded="true" height="76" name="Performance_train_new" width="90" x="581" y="255">
<parameter key="weighted_mean_recall" value="true"/>
<parameter key="weighted_mean_precision" value="true"/>
<parameter key="absolute_error" value="true"/>
<parameter key="root_mean_squared_error" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Multiply" to_port="input"/>
<connect from_op="Split Data" from_port="partition 2" to_op="applyModel" to_port="unlabelled data"/>
<connect from_op="Multiply" from_port="output 1" to_op="loopThroughLocalParams" to_port="input 1"/>
<connect from_op="Multiply" from_port="output 2" to_op="applyModel (2)" to_port="unlabelled data"/>
<connect from_op="loopThroughLocalParams" from_port="performance" to_op="ParameterSetter" to_port="through 1"/>
<connect from_op="loopThroughLocalParams" from_port="parameter" to_op="ParameterSetter" to_port="parameter set"/>
<connect from_op="loopThroughLocalParams" from_port="result 1" to_op="SVM_test" to_port="training set"/>
<connect from_op="loopThroughLocalParams" from_port="result 2" to_port="result 6"/>
<connect from_op="ParameterSetter" from_port="parameter set" to_port="result 1"/>
<connect from_op="ParameterSetter" from_port="through 1" to_port="result 3"/>
<connect from_op="SVM_test" from_port="model" to_op="Multiply (2)" to_port="input"/>
<connect from_op="Multiply (2)" from_port="output 1" to_op="applyModel" to_port="model"/>
<connect from_op="Multiply (2)" from_port="output 2" to_op="applyModel (2)" to_port="model"/>
<connect from_op="applyModel" from_port="labelled data" to_op="Performance_testset" to_port="labelled data"/>
<connect from_op="applyModel" from_port="model" to_port="result 5"/>
<connect from_op="Performance_testset" from_port="performance" to_port="result 2"/>
<connect from_op="applyModel (2)" from_port="labelled data" to_op="Performance_train_new" to_port="labelled data"/>
<connect from_op="Performance_train_new" from_port="performance" to_port="result 4"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<portSpacing port="sink_result 6" spacing="0"/>
<portSpacing port="sink_result 7" spacing="0"/>
</process>
</operator>
</process>