Cost-sensitive Learning

User: "jlo"
New Altair Community Member
Updated by Jocelyn
Hi there:
I've been using RM for about 2 weeks and I love it. Thanks to the people who created it.

My question is about asymmetric misclassification costs. It may be a silly question (I've already searched the forum but couldn't find anything).

I'm using the demo process CostSensitiveLearningandROCplot.xml.
Suppose I have two cost matrices as follows:

A = {|0 1|, |1 0|}
B = {|0 1|, |5 0|}


I represent matrix A as:

misclassification_costs_first=1.0
misclassification_costs_second=1.0

I represent matrix B as:

misclassification_costs_first=5.0
misclassification_costs_second=1.0


Moving from matrix A to matrix B, I would expect the number of false positives to drop, but the opposite happens. Am I doing something silly?
As I understand it, the first class is "negative".

(BTW, I've tried MetaCost with the matrix entered exactly as above and I get the expected results.)

Obviously, if I switch the 5 and the 1, I get the results I expect. But the definition of "misclassification_costs_first" is "cost assigned when an example of the first class is classified as one of the second". I interpret this as the C(2,1) entry in my matrix above.

Here's the code:
 <operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#p#ygt# We use the confidence values delivered by the learner used in this process (soft predictions instead of crisp classifications). All RapidMiner learners deliver these confidence values in addition to the predicted values. They can be read as sort of a guarantee of the learner that the corresponding crisp prediction is actually the true label. Thus it is called confidence. #ylt#/p#ygt# #ylt#p#ygt# In many binary classification scenarios an error for a wrong prediction does not cause the same costs for both classes. A learning scheme should take these asymmetric costs into account. By using the prediction confidences we can turn all classification learners in cost sensitive learners. Therefore, we adjust the confidence threshold for doing some predictions (usually 0.5). #ylt#/p#ygt# #ylt#p#ygt# A ThresholdFinder can be used to determine the best threshold with respect to class weights. The following ThresholdApplier maps the soft predictions (confidences) to crisp classifications with respect to the determined threshold value. The ThresholdFinder can also produce a ROC curve for several thresholds. This is a nice visualization for the performance of a learning scheme. The process stops every time the ROC curve is plotted until you press the Ok button (5 times). The parameter #yquot#show_ROC_plot#yquot# determines if the ROC plot should be displayed at all. #ylt#/p#ygt# #ylt#p#ygt# Further information about the validation operators used in this process can be found in the corresponding sample directory and, of course, in the operator reference of the RapidMiner tutorial. #ylt#/p#ygt#"/>
    <parameter key="logverbosity" value="warning"/>
    <parameter key="random_seed" value="2000"/>
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random dots classification"/>
        <parameter key="number_examples" value="500"/>
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="attributes_lower_bound" value="0.0"/>
        <parameter key="attributes_upper_bound" value="25.0"/>
    </operator>
    <operator name="XVal" class="XValidation" expanded="yes">
        <parameter key="number_of_validations" value="5"/>
        <operator name="LibSVMLearner" class="LibSVMLearner">
            <parameter key="gamma" value="1.0"/>
            <list key="class_weights">
            </list>
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ThresholdFinder" class="ThresholdFinder">
                <parameter key="misclassification_costs_first" value="5.0"/>
            </operator>
            <operator name="ThresholdApplier" class="ThresholdApplier">
            </operator>
            <operator name="Performance" class="Performance">
            </operator>
        </operator>
    </operator>
</operator>
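
To make the expectation above concrete, here is a minimal sketch in Python (not RapidMiner code). It assumes the textbook expected-cost rule for a binary problem: predict the second ("positive") class when its confidence p satisfies p >= cost_fp / (cost_fp + cost_fn). Whether ThresholdFinder applies the same rule internally is an assumption, not something confirmed here. The function names and the confidence/label values are hypothetical and only illustrate why matrix B should give fewer false positives than matrix A.

```python
def cost_threshold(cost_fp, cost_fn):
    """Confidence threshold for predicting the positive (second) class.

    Predicting positive costs (1 - p) * cost_fp in expectation and
    predicting negative costs p * cost_fn, so positive is the cheaper
    choice when p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def count_false_positives(confidences, labels, threshold):
    """Count negative examples (label 0) that are predicted positive."""
    return sum(
        1 for p, y in zip(confidences, labels) if p >= threshold and y == 0
    )


# Hypothetical confidences for the positive class and true labels
# (0 = first/negative class, 1 = second/positive class).
confidences = [0.10, 0.35, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

# Matrix A: both kinds of error cost 1.
threshold_a = cost_threshold(cost_fp=1.0, cost_fn=1.0)
# Matrix B: classifying a first-class example as second-class costs 5.
threshold_b = cost_threshold(cost_fp=5.0, cost_fn=1.0)

fp_a = count_false_positives(confidences, labels, threshold_a)
fp_b = count_false_positives(confidences, labels, threshold_b)

print(f"A: threshold={threshold_a:.3f}, false positives={fp_a}")
print(f"B: threshold={threshold_b:.3f}, false positives={fp_b}")
```

Under this rule the threshold rises from 0.5 to about 0.833 when going from A to B, so fewer examples are predicted positive and the false-positive count cannot increase. If the process shows the opposite, one possible explanation is that the operator maps "first" and "second" to the classes differently from what is assumed here.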
