Cost-sensitive Learning

jlo New Altair Community Member
edited November 5 in Community Q&A
Hi there:
I've been using RM for about 2 weeks and I love it. Thanks to the people who created it.

My question is about asymmetric misclassification costs. It may be a silly question (I've already searched the forum but couldn't find anything).

I'm using the demo process CostSensitiveLearningandROCplot.xml.
Suppose I have two cost matrices, as follows:

A = | 0 1 |
    | 1 0 |

B = | 0 1 |
    | 5 0 |


I represent matrix A as:

misclassification_costs_first=1.0
misclassification_costs_second=1.0

I represent matrix B as:

misclassification_costs_first=5.0
misclassification_costs_second=1.0


Moving from matrix A to matrix B, I would expect the number of false positives to drop. But the opposite happens(?). Am I doing something silly?
My understanding is that the first class is the "negative" one.

(BTW, I've tried MetaCost with the matrix entered exactly as above, and I get the expected results.)

Obviously, if I swap the 5 and the 1, I get the results I expect. But the definition of "misclassification_costs_first" is "cost assigned when an example of the first class is classified as one of the second". I interpret this as the C(2,1) entry in my matrices above.
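
To make my expectation concrete, here is a small Python sketch of the standard expected-cost threshold rule, which is what I assume ThresholdFinder optimizes (the function and the names are mine, purely illustrative, not RapidMiner's implementation):

    # Standard expected-cost rule for two classes (illustrative only; my
    # assumption about what ThresholdFinder optimizes, not RapidMiner source).
    # c_first:  cost when a first-class ("negative") example is predicted positive
    # c_second: cost when a second-class ("positive") example is predicted negative

    def optimal_threshold(c_first: float, c_second: float) -> float:
        # Predict "positive" when p * c_second >= (1 - p) * c_first,
        # i.e. when the confidence p is at least:
        return c_first / (c_first + c_second)

    print(optimal_threshold(1.0, 1.0))  # matrix A: 0.5, the usual default
    print(optimal_threshold(5.0, 1.0))  # 0.833... -> fewer positive predictions
    print(optimal_threshold(1.0, 5.0))  # 0.166... -> more positive predictions

Under this rule, raising the cost for the first (negative) class should raise the threshold and reduce false positives; since I observe the opposite, it looks as if the costs are being attached to the wrong classes.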

Here's the code:
 <operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#p#ygt# We use the confidence values delivered by the learner used in this process (soft predictions instead of crisp classifications). All RapidMiner learners deliver these confidence values in addition to the predicted values. They can be read as sort of a guarantee of the learner that the corresponding crisp prediction is actually the true label. Thus it is called confidence. #ylt#/p#ygt# #ylt#p#ygt# In many binary classification scenarios an error for a wrong prediction does not cause the same costs for both classes. A learning scheme should take these asymmetric costs into account. By using the prediction confidences we can turn all classification learners in cost sensitive learners. Therefore, we adjust the confidence threshold for doing some predictions (usually 0.5). #ylt#/p#ygt# #ylt#p#ygt# A ThresholdFinder can be used to determine the best threshold with respect to class weights. The following ThresholdApplier maps the soft predictions (confidences) to crisp classifications with respect to the determined threshold value. The ThresholdFinder can also produce a ROC curve for several thresholds. This is a nice visualization for the performance of a learning scheme. The process stops every time the ROC curve is plotted until you press the Ok button (5 times). The parameter #yquot#show_ROC_plot#yquot# determines if the ROC plot should be displayed at all. #ylt#/p#ygt# #ylt#p#ygt# Further information about the validation operators used in this process can be found in the corresponding sample directory and, of course, in the operator reference of the RapidMiner tutorial. #ylt#/p#ygt#"/>
    <parameter key="logverbosity" value="warning"/>
    <parameter key="random_seed" value="2000"/>
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random dots classification"/>
        <parameter key="number_examples" value="500"/>
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="attributes_lower_bound" value="0.0"/>
        <parameter key="attributes_upper_bound" value="25.0"/>
    </operator>
    <operator name="XVal" class="XValidation" expanded="yes">
        <parameter key="number_of_validations" value="5"/>
        <operator name="LibSVMLearner" class="LibSVMLearner">
            <parameter key="gamma" value="1.0"/>
            <list key="class_weights">
            </list>
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ThresholdFinder" class="ThresholdFinder">
                <parameter key="misclassification_costs_first" value="5.0"/>
            </operator>
            <operator name="ThresholdApplier" class="ThresholdApplier">
            </operator>
            <operator name="Performance" class="Performance">
            </operator>
        </operator>
    </operator>
</operator>
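
For completeness, my mental model of what the ThresholdFinder/ThresholdApplier pair does with the soft predictions is roughly the following (plain Python, purely illustrative; the real operators may of course work differently):

    # Mental model of ThresholdFinder + ThresholdApplier (illustrative only).
    # confidences: predicted confidence that an example is the second ("positive") class
    # labels:      true labels, 0 = first/negative, 1 = second/positive

    def find_threshold(confidences, labels, cost_fp=1.0, cost_fn=1.0):
        """Pick the candidate threshold with the lowest total misclassification cost."""
        best_t, best_cost = 0.5, float("inf")
        for t in sorted(set(confidences)):
            cost = sum(
                cost_fp if (c >= t and y == 0)      # negative predicted positive
                else cost_fn if (c < t and y == 1)  # positive predicted negative
                else 0.0
                for c, y in zip(confidences, labels)
            )
            if cost < best_cost:
                best_t, best_cost = t, cost
        return best_t

    def apply_threshold(confidences, t):
        """Map soft confidences to crisp 0/1 predictions, like ThresholdApplier."""
        return [1 if c >= t else 0 for c in confidences]

    confs = [0.1, 0.4, 0.45, 0.6, 0.9]
    ys    = [0, 0, 1, 1, 1]
    t = find_threshold(confs, ys, cost_fp=1.0, cost_fn=5.0)
    print(t, apply_threshold(confs, t))  # a high false-negative cost pushes t down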

Answers

  • land New Altair Community Member
    Hi,
    the problem is that it is not determined which class ends up first in the internal binominal mapping. There's an operator called Remap Binominals in RapidMiner 5 that can be used to define the order properly.
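    To illustrate what I mean by the order being undetermined, here is a little sketch (plain Python, not RapidMiner code): the very same cost settings end up attached to opposite classes, depending on which class the internal mapping happens to put first.

        # Illustrative only: identical cost settings under the two possible
        # internal class orders.
        costs = (5.0, 1.0)  # (misclassification_costs_first, misclassification_costs_second)

        for order in (("negative", "positive"), ("positive", "negative")):
            first, second = order
            print(f"internal order {order}:")
            print(f"  misclassifying a {first} example costs {costs[0]}")
            print(f"  misclassifying a {second} example costs {costs[1]}")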
    If this does not help, please report back to me, so that I can take a look if there's a bug in the program.

    Greetings,
      Sebastian
  • Stefan_E New Altair Community Member
    Sebastian,

    ... are you saying that any explicit reference to classes (to binominal classes?) needs to be preceded by a Remap operator? Wouldn't it be easier to establish a predefined ordering?

    Stefan
  • land New Altair Community Member
    Hi,
    you can bet we have spent ages thinking about this problem, and there's simply no good solution. You cannot know what someone calls his positive or negative class. You cannot even assume that "positive" and "negative" are appropriate terms: for example, what is the positive and what is the negative when distinguishing red cars from blue ones?
    You might come up with the heuristic of sorting the labels alphabetically, but even that would not guarantee that the indices stay constant, for example if values occur in a test set that haven't been seen during training. That's why you have to specify the order explicitly...
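    A quick toy example of the alphabetical problem (again plain Python, just for illustration):

        # Why alphabetical indices are not stable: a value unseen during
        # training shifts every index in the test set.
        train_values = ["blue", "red"]
        test_values = ["azure", "blue", "red"]  # "azure" was never seen in training

        print({v: i for i, v in enumerate(sorted(train_values))})
        # {'blue': 0, 'red': 1}
        print({v: i for i, v in enumerate(sorted(test_values))})
        # {'azure': 0, 'blue': 1, 'red': 2}  -> "blue" is no longer the first class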

    Greetings,
      Sebastian
  • Stefan_E New Altair Community Member
    Hi Sebastian,

    thanks for your explanation. I see your point. Still, if I take you at your word, that means that
      - I can run the Numerical_to_Binominal converter on a dataset,
      - believe that I have configured my cost-sensitive learner correctly, because on the training data I see false before true,
      - get cheated on the next dataset because, for some reason (maybe memory allocation, who knows...), true and false are swapped,
      - and find that there is not a single word about this risk in the documentation of the operators that take class names (e.g. cost-based learners).

    Hence, I would feel much safer if
      - classes generated by RM were in a predefined, fixed order, and
      - operators acting on class names had documentation referencing this risk.

    Stefan