"10x10 Cross-Validation"

moni_faria New Altair Community Member
edited November 5 in Community Q&A
Hello,

I am trying to perform a 10x10 cross-validation (an IteratingPerformanceAverage with 10 iterations around a 10-fold XValidation) in order to apply a t-test and ANOVA to the results. The process below once produced results, but now it fails with the error message: Process failed, IndexOutOfBoundsException caught, Index: 1, Size: 1

Can you help me?

The project is:

  <?xml version="1.0" encoding="windows-1252" ?>
  <process version="4.4">
  <operator name="Root" class="Process" expanded="yes">
  <description text="#ylt#p#ygt#Many RapidMiner operators can be used to estimate the performance of a learner, a preprocessing step, or a feature space on one or several data sets. The result of these validation operators is a performance vector collecting the values of a set of performance criteria. For each criterion, the mean value and standard deviation are given. #ylt#/p#ygt# #ylt#p#ygt#The question is how these performance vectors can be compared? Statistical significance tests like ANOVA or pairwise t-tests can be used to calculate the probability that the actual mean values are different. #ylt#/p#ygt# #ylt#p#ygt# We assume that you have achieved several performance vectors and want to compare them. In this experiment we use the same data set for both cross validations (hence the IOMultiplier) and estimate the performance of a linear learning scheme and a RBF based SVM. #ylt#/p#ygt# #ylt#p#ygt# Run the experiment and compare the results: the probabilities for a significant difference are equal since only two performance vectors were created. In this case the SVM is probably better suited for the data set at hand since the actual mean values are probably different.#ylt#/p#ygt##ylt#p#ygt#Please note that performance vectors like all other objects which can be passed between RapidMiner operators can be written into and loaded from a file.#ylt#/p#ygt#" />
  <parameter key="logverbosity" value="init" />
  <parameter key="random_seed" value="2001" />
  <parameter key="encoding" value="SYSTEM" />
  <operator name="ExcelExampleSource" class="ExcelExampleSource">
  <parameter key="excel_file" value="D:\Monica_Faria\Doutoramento\2_Semestre\Testes\Tentativa_Final_Monica_Bola_e_Jogadores_C_Massa.xls" />
  <parameter key="sheet_number" value="1" />
  <parameter key="row_offset" value="0" />
  <parameter key="column_offset" value="0" />
  <parameter key="first_row_as_names" value="true" />
  <parameter key="create_label" value="true" />
  <parameter key="label_column" value="27" />
  <parameter key="create_id" value="false" />
  <parameter key="id_column" value="1" />
  <parameter key="decimal_point_character" value="." />
  <parameter key="datamanagement" value="double_array" />
  </operator>
  <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">
  <parameter key="iterations" value="10" />
  <parameter key="average_performances_only" value="true" />
  <operator name="NB-XValidation" class="XValidation" expanded="no">
  <parameter key="keep_example_set" value="true" />
  <parameter key="create_complete_model" value="false" />
  <parameter key="average_performances_only" value="true" />
  <parameter key="leave_one_out" value="false" />
  <parameter key="number_of_validations" value="10" />
  <parameter key="sampling_type" value="shuffled sampling" />
  <parameter key="local_random_seed" value="-1" />
  <operator name="NaiveBayes" class="NaiveBayes">
  <parameter key="keep_example_set" value="false" />
  <parameter key="laplace_correction" value="true" />
  </operator>
  <operator name="OperatorChain-NB" class="OperatorChain" expanded="yes">
  <operator name="ModelApplier-NB" class="ModelApplier">
  <parameter key="keep_model" value="false" />
  <list key="application_parameters" />
  <parameter key="create_view" value="false" />
  </operator>
  <operator name="Performance-NB" class="ClassificationPerformance">
  <parameter key="keep_example_set" value="false" />
  <parameter key="main_criterion" value="first" />
  <parameter key="accuracy" value="true" />
  <parameter key="classification_error" value="false" />
  <parameter key="kappa" value="false" />
  <parameter key="weighted_mean_recall" value="false" />
  <parameter key="weighted_mean_precision" value="false" />
  <parameter key="spearman_rho" value="false" />
  <parameter key="kendall_tau" value="false" />
  <parameter key="absolute_error" value="false" />
  <parameter key="relative_error" value="false" />
  <parameter key="relative_error_lenient" value="false" />
  <parameter key="relative_error_strict" value="false" />
  <parameter key="normalized_absolute_error" value="false" />
  <parameter key="root_mean_squared_error" value="false" />
  <parameter key="root_relative_squared_error" value="false" />
  <parameter key="squared_error" value="false" />
  <parameter key="correlation" value="false" />
  <parameter key="squared_correlation" value="false" />
  <parameter key="cross-entropy" value="false" />
  <parameter key="margin" value="false" />
  <parameter key="soft_margin_loss" value="false" />
  <parameter key="logistic_loss" value="false" />
  <parameter key="skip_undefined_labels" value="true" />
  <parameter key="use_example_weights" value="true" />
  <list key="class_weights" />
  </operator>
  </operator>
  </operator>
  <operator name="kNN-XValidation" class="XValidation" expanded="no">
  <parameter key="keep_example_set" value="true" />
  <parameter key="create_complete_model" value="false" />
  <parameter key="average_performances_only" value="true" />
  <parameter key="leave_one_out" value="false" />
  <parameter key="number_of_validations" value="10" />
  <parameter key="sampling_type" value="shuffled sampling" />
  <parameter key="local_random_seed" value="-1" />
  <operator name="kNN" class="NearestNeighbors">
  <parameter key="keep_example_set" value="false" />
  <parameter key="k" value="3" />
  <parameter key="weighted_vote" value="false" />
  <parameter key="measure_types" value="MixedMeasures" />
  <parameter key="mixed_measure" value="MixedEuclideanDistance" />
  <parameter key="nominal_measure" value="NominalDistance" />
  <parameter key="numerical_measure" value="EuclideanDistance" />
  <parameter key="divergence" value="GeneralizedIDivergence" />
  <parameter key="kernel_type" value="radial" />
  <parameter key="kernel_gamma" value="1.0" />
  <parameter key="kernel_sigma1" value="1.0" />
  <parameter key="kernel_sigma2" value="0.0" />
  <parameter key="kernel_sigma3" value="2.0" />
  <parameter key="kernel_degree" value="3.0" />
  <parameter key="kernel_shift" value="1.0" />
  <parameter key="kernel_a" value="1.0" />
  <parameter key="kernel_b" value="0.0" />
  </operator>
  <operator name="OperatorChain-kNN" class="OperatorChain" expanded="yes">
  <operator name="ModelApplier-kNN" class="ModelApplier">
  <parameter key="keep_model" value="false" />
  <list key="application_parameters" />
  <parameter key="create_view" value="false" />
  </operator>
  <operator name="Performance-kNN" class="ClassificationPerformance">
  <parameter key="keep_example_set" value="false" />
  <parameter key="main_criterion" value="first" />
  <parameter key="accuracy" value="true" />
  <parameter key="classification_error" value="false" />
  <parameter key="kappa" value="false" />
  <parameter key="weighted_mean_recall" value="false" />
  <parameter key="weighted_mean_precision" value="false" />
  <parameter key="spearman_rho" value="false" />
  <parameter key="kendall_tau" value="false" />
  <parameter key="absolute_error" value="false" />
  <parameter key="relative_error" value="false" />
  <parameter key="relative_error_lenient" value="false" />
  <parameter key="relative_error_strict" value="false" />
  <parameter key="normalized_absolute_error" value="false" />
  <parameter key="root_mean_squared_error" value="false" />
  <parameter key="root_relative_squared_error" value="false" />
  <parameter key="squared_error" value="false" />
  <parameter key="correlation" value="false" />
  <parameter key="squared_correlation" value="false" />
  <parameter key="cross-entropy" value="false" />
  <parameter key="margin" value="false" />
  <parameter key="soft_margin_loss" value="false" />
  <parameter key="logistic_loss" value="false" />
  <parameter key="skip_undefined_labels" value="true" />
  <parameter key="use_example_weights" value="true" />
  <list key="class_weights" />
  </operator>
  </operator>
  </operator>
  <operator name="SVM-XValidation" class="XValidation" expanded="no">
  <parameter key="keep_example_set" value="true" />
  <parameter key="create_complete_model" value="false" />
  <parameter key="average_performances_only" value="true" />
  <parameter key="leave_one_out" value="false" />
  <parameter key="number_of_validations" value="10" />
  <parameter key="sampling_type" value="shuffled sampling" />
  <parameter key="local_random_seed" value="-1" />
  <operator name="LibSVMLearner" class="LibSVMLearner">
  <parameter key="keep_example_set" value="false" />
  <parameter key="svm_type" value="C-SVC" />
  <parameter key="kernel_type" value="rbf" />
  <parameter key="degree" value="3" />
  <parameter key="gamma" value="0.0" />
  <parameter key="coef0" value="0.0" />
  <parameter key="C" value="0.0" />
  <parameter key="nu" value="0.5" />
  <parameter key="cache_size" value="80" />
  <parameter key="epsilon" value="0.0010" />
  <parameter key="p" value="0.1" />
  <list key="class_weights" />
  <parameter key="shrinking" value="true" />
  <parameter key="calculate_confidences" value="false" />
  <parameter key="confidence_for_multiclass" value="true" />
  </operator>
  <operator name="OperatorChain-SVM" class="OperatorChain" expanded="yes">
  <operator name="ModelApplier-SVM" class="ModelApplier">
  <parameter key="keep_model" value="false" />
  <list key="application_parameters" />
  <parameter key="create_view" value="false" />
  </operator>
  <operator name="Performance-SVM" class="ClassificationPerformance">
  <parameter key="keep_example_set" value="false" />
  <parameter key="main_criterion" value="first" />
  <parameter key="accuracy" value="true" />
  <parameter key="classification_error" value="false" />
  <parameter key="kappa" value="false" />
  <parameter key="weighted_mean_recall" value="false" />
  <parameter key="weighted_mean_precision" value="false" />
  <parameter key="spearman_rho" value="false" />
  <parameter key="kendall_tau" value="false" />
  <parameter key="absolute_error" value="false" />
  <parameter key="relative_error" value="false" />
  <parameter key="relative_error_lenient" value="false" />
  <parameter key="relative_error_strict" value="false" />
  <parameter key="normalized_absolute_error" value="false" />
  <parameter key="root_mean_squared_error" value="false" />
  <parameter key="root_relative_squared_error" value="false" />
  <parameter key="squared_error" value="false" />
  <parameter key="correlation" value="false" />
  <parameter key="squared_correlation" value="false" />
  <parameter key="cross-entropy" value="false" />
  <parameter key="margin" value="false" />
  <parameter key="soft_margin_loss" value="false" />
  <parameter key="logistic_loss" value="false" />
  <parameter key="skip_undefined_labels" value="true" />
  <parameter key="use_example_weights" value="true" />
  <list key="class_weights" />
  </operator>
  </operator>
  </operator>
  </operator>
  <operator name="T-Test" class="T-Test">
  <parameter key="alpha" value="0.05" />
  </operator>
  <operator name="Anova" class="Anova">
  <parameter key="alpha" value="0.05" />
  </operator>
  </operator>
  </process>
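For readers unfamiliar with the setup: the process above nests a 10-fold XValidation (number_of_validations = 10, shuffled sampling) inside an IteratingPerformanceAverage with iterations = 10, which is what "10x10 cross-validation" means here. The Python sketch below illustrates that loop structure in a language-neutral way and is not RapidMiner code. The function names, the `evaluate` callback, the toy threshold learner and the synthetic data are all hypothetical. The default seed 2001 only mirrors the process's random_seed parameter.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical callback: evaluate(train_idx, test_idx) -> performance on test_idx.
Evaluate = Callable[[Sequence[int], Sequence[int]], float]


def shuffled_folds(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validation(n: int, k: int, evaluate: Evaluate, rng: random.Random) -> float:
    """One k-fold cross-validation; returns the mean performance over the k folds."""
    scores = []
    for fold in shuffled_folds(n, k, rng):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        scores.append(evaluate(train, fold))
    return statistics.mean(scores)


def repeated_cross_validation(n: int, evaluate: Evaluate, iterations: int = 10,
                              k: int = 10, seed: int = 2001) -> List[float]:
    """iterations x k-fold CV: one averaged performance value per repetition."""
    rng = random.Random(seed)  # shared generator, so every repetition gets a new shuffle
    return [cross_validation(n, k, evaluate, rng) for _ in range(iterations)]


if __name__ == "__main__":
    # Synthetic stand-in data (not the thread's Excel file): label is 1 when x > 0.5.
    data_rng = random.Random(0)
    xs = [data_rng.random() for _ in range(200)]
    ys = [int(x > 0.5) for x in xs]

    def threshold_learner(train, test):
        # Hypothetical learner: predict 1 when x exceeds the mean x of the training folds.
        t = sum(xs[i] for i in train) / len(train)
        correct = sum(1 for i in test if int(xs[i] > t) == ys[i])
        return correct / len(test)

    per_iteration = repeated_cross_validation(len(xs), threshold_learner)
    print("mean accuracy:", statistics.mean(per_iteration))
    print("std dev:", statistics.stdev(per_iteration))
```

Because each repetition reshuffles the data, the ten averaged values differ slightly. Their mean and spread are what a t-test or ANOVA can later compare across learners.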

Answers

  • fischer
    New Altair Community Member
    Hi,

    For future posts, please copy the XML code from the RapidMiner XML tab rather than from Internet Explorer, which adds small "-" handles for expanding and collapsing XML element nodes. Also, please post processes that do not depend on input files; replace the file input with an ExampleSetGenerator or a similar operator.

    In fact, you have spotted a bug in the current release that prevents IteratingPerformanceAverage from working with more than one averaged performance vector. This will be fixed in the next release.
    You can work around this by wrapping each cross-validation in its own IteratingPerformanceAverage. Place an IOStorer after each of them so that its PerformanceVector is not fed into the subsequent IteratingPerformanceAverages again, and finally retrieve all of them using IORetriever.
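The data flow of this workaround can be sketched in Python, under stated assumptions. Each learner's repeated cross-validation runs in its own loop, like one IteratingPerformanceAverage per learner. Its per-iteration results are stored under a name, in the role of IOStorer. At the end, everything is read back, in the role of IORetriever, and compared. The names `store`, `store_result`, `welch_t` and `one_way_anova_f` are hypothetical, and the accuracies are synthetic rather than results from this thread. Welch's t statistic is an assumption: the thread does not say which t-test variant RapidMiner's T-Test operator computes.

```python
import math
import random
import statistics
from itertools import combinations
from typing import Dict, List

# Hypothetical stand-in for IOStorer/IORetriever: a named store that each
# learner's loop writes to and that the comparison step reads from at the end.
store: Dict[str, List[float]] = {}


def store_result(name: str, per_iteration: List[float]) -> None:
    """'IOStorer' role: keep one learner's per-iteration results under a name."""
    store[name] = list(per_iteration)


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for the difference between the means of a and b."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def one_way_anova_f(groups: List[List[float]]) -> float:
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    rng = random.Random(42)
    # One separate loop per learner, as in the workaround; the accuracies are
    # synthetic placeholders drawn at random, NOT results from this thread.
    for name, centre in [("NB", 0.80), ("kNN", 0.78), ("SVM", 0.85)]:
        store_result(name, [rng.gauss(centre, 0.01) for _ in range(10)])

    # 'IORetriever' role: read everything back, then compare.
    for a, b in combinations(sorted(store), 2):
        print(f"{a} vs {b}: t = {welch_t(store[a], store[b]):.2f}")
    print(f"ANOVA over all learners: F = {one_way_anova_f(list(store.values())):.2f}")
```

Only the test statistics are computed here. Turning them into p-values requires the t and F distributions, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.f_oneway(*groups)`.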

    Best,
    Simon