<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="sum classification"/>
    </operator>
    <operator name="IdTagging" class="IdTagging">
    </operator>
    <operator name="XValidation" class="XValidation" expanded="yes">
        <operator name="NaiveBayes" class="NaiveBayes">
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ExampleSetWriter" class="ExampleSetWriter">
                <parameter key="append" value="true"/>
                <parameter key="example_set_file" value="single_pred.dat"/>
                <parameter key="format" value="special_format"/>
                <parameter key="special_format" value="$i $p $d"/>
            </operator>
            <operator name="Performance" class="Performance">
            </operator>
        </operator>
    </operator>
</operator>
- The ExampleSetWriter outputs the predicted class (e.g. $p) as well as the actual class (e.g. $l) for each item of the data set. Since the algorithm produces probabilities, what threshold is used to turn them into these binary classifications, and how can this threshold be set? For further analysis I need to pick a specific threshold and obtain the corresponding predicted class, together with the actual class, for each item.
- What exactly does the term "confidence" stand for in RapidMiner? Is a confidence the same thing as the threshold?
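
To make the first question concrete, here is a minimal Python sketch of what I mean by picking a threshold. It runs outside RapidMiner and says nothing about how RapidMiner itself chooses the predicted class. It assumes that each item has a single confidence value for the positive class; the function name `apply_threshold`, the class names and the example values are all hypothetical.

```python
from typing import List


def apply_threshold(
    confidences: List[float],
    threshold: float = 0.5,          # assumed default, only for illustration
    positive: str = "positive",      # hypothetical class names
    negative: str = "negative",
) -> List[str]:
    """Turn positive-class confidences into binary class predictions.

    An item is labelled `positive` when its confidence is at least
    `threshold`, and `negative` otherwise.
    """
    return [positive if c >= threshold else negative for c in confidences]


if __name__ == "__main__":
    # Made-up confidences for three items, not output from the process above.
    example_confidences = [0.91, 0.48, 0.65]

    for t in (0.5, 0.7):
        print(t, apply_threshold(example_confidences, threshold=t))
```

With a threshold of 0.5 the third item is labelled positive, while with 0.7 it becomes negative. This is the kind of control over the predicted class that I am looking for inside the process.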