split results into bins for evaluation

bobdobbs New Altair Community Member
edited November 5 in Community Q&A
Hi,

Another fun puzzle for the RM team.  :)

I've run a model that outputs confidence values for an SVM class. The values range from 0 to 1 (as expected for this type of model).

One very common method of evaluation I've seen in papers is to break the results into "bins" or "groups" by confidence range and then report the accuracy of each range.

Something like:

    Range     # predicted    # correct    % correct
    60-65          35            21          60
    55-60         130            75          57.6
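
To make the computation explicit, here's the same report sketched in plain Python (outside RM; `confidences` and `correct` are stand-ins for my model's outputs):

    def bin_report(confidences, correct, width=0.05):
        # Group predictions into confidence ranges of the given width
        # and count how many fall in each range and how many were right.
        bins = {}
        for conf, ok in zip(confidences, correct):
            lo = int(conf / width) * width           # lower edge of this bin
            predicted, hits = bins.get(lo, (0, 0))
            bins[lo] = (predicted + 1, hits + int(ok))
        for lo in sorted(bins):
            predicted, hits = bins[lo]
            print(f"{lo:.2f}-{lo + width:.2f}  {predicted:6d}  {hits:6d}  "
                  f"{100 * hits / predicted:5.1f}%")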
Any way to do this sort of analysis in RM?

Thanks!

Answers

  • steffen
    Hello bobdobbs

    I do not have the time to create a perfect, fully tested process, but this non-executable sketch should give you an idea:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="yourdata" class="ExampleSource">
        </operator>
        <operator name="select_confidence_attribute" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="attribute_name_regex" value="your-confidence-column"/>
            <operator name="define_confidence_discretization_here" class="UserBasedDiscretization">
                <list key="classes">
                  <parameter key="last" value="Infinity"/>
                </list>
            </operator>
        </operator>
        <operator name="iterate_over_discretized_confidence_attribute" class="ValueSubgroupIterator" expanded="yes">
            <list key="attributes">
              <parameter key="discretized_confidence" value="all"/>
            </list>
            <operator name="your-measure" class="BinominalClassificationPerformance">
                <parameter key="false_positive" value="true"/>
                <parameter key="true_positive" value="true"/>
                <parameter key="positive_predictive_value" value="true"/>
            </operator>
            <operator name="Macro2Log" class="Macro2Log">
                <parameter key="macro_name" value="loop_value"/>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <list key="log">
                  <parameter key="value" value="operator.Macro2Log.value.macro_value"/>
                  <parameter key="performance" value="operator.your-measure.value.positive_predictive_value"/>
                </list>
            </operator>
        </operator>
    </operator>
    I hope you can take it from here.

    regards,

    Steffen
  • IngoRM
    Just a small note: besides calculating this yourself as Steffen indicated, the lift chart operators might also be interesting to you.
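
    For intuition: a lift chart orders the examples by confidence and shows how much each slice beats the overall positive rate. A tiny sketch in Python, with `y_true` (binary 0/1 labels) and `scores` as placeholders:

    import numpy as np

    def lift_by_decile(y_true, scores, n_bins=10):
        # Sort by descending confidence, split into equal slices, and
        # compare each slice's positive rate to the overall base rate.
        order = np.argsort(-np.asarray(scores))
        y = np.asarray(y_true)[order]
        base_rate = y.mean()
        return [chunk.mean() / base_rate for chunk in np.array_split(y, n_bins)]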

    Cheers,
    Ingo
  • bobdobbs
    Thanks Stefan!!!

    That's a great way to solve this problem.

    I'm getting good predictions, but the confidence "score" seems out of alignment. For example, my 90-100% confidence bin is correct only about 70% of the time, my 80-90% bin about 60%, etc. So I guess the model is good, but the scores are just "scores" and not reliable probabilities.
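
    (A quick way to see this misalignment outside RM, assuming scikit-learn is available; `y_true` and `y_score` stand in for my binary labels and confidences:)

    from sklearn.calibration import calibration_curve

    # Fraction of actual positives vs. mean predicted confidence per bin;
    # a well-calibrated model keeps these two values close together.
    prob_true, prob_pred = calibration_curve(y_true, y_score, n_bins=10)
    for p_pred, p_true in zip(prob_pred, prob_true):
        print(f"predicted {p_pred:.2f} -> observed {p_true:.2f}")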

    Thanks!!!!!

    -B
  • steffen
    Hello bobdobbs

    Nice to hear. As a quick idea, you could use the Platt Scaling operator to calibrate the scores, i.e. to make them a better approximation of the true probabilities.
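
    Conceptually, Platt Scaling just fits a sigmoid that maps the raw scores to probabilities. A minimal sketch outside RapidMiner, assuming scikit-learn and binary 0/1 labels:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def platt_fit(scores, y):
        # Fit a logistic (sigmoid) model on the raw scores; the result
        # maps a new score to a calibrated probability estimate.
        lr = LogisticRegression()
        lr.fit(np.asarray(scores).reshape(-1, 1), y)
        return lambda s: lr.predict_proba(np.asarray(s).reshape(-1, 1))[:, 1]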

    kind regards,

    Steffen

  • bobdobbs
    Stefan,

    A great idea as usual!

    Question: where in the process would I put the Platt operator? Do I apply it after training, after testing, or both?

    Thanks!
  • steffen
    Hello again ... weekend break is over ...

    I suggest something like the process below. Additional remark: the XVPrediction is used to produce a separate set of confidences for training the calibration, which prevents overfitting, as suggested in Platt's original paper.

    <operator name="Root" class="Process" expanded="yes">
        <operator name="generate_set" class="OperatorChain" expanded="no">
            <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
                <parameter key="target_function" value="random"/>
            </operator>
            <operator name="IdTagging" class="IdTagging">
            </operator>
            <operator name="FeatureNameFilter" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="label"/>
            </operator>
            <operator name="NominalExampleSetGenerator" class="NominalExampleSetGenerator">
                <parameter key="number_of_attributes" value="1"/>
                <parameter key="number_of_values" value="2"/>
            </operator>
            <operator name="FeatureNameFilter (2)" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="att1"/>
            </operator>
            <operator name="IdTagging (2)" class="IdTagging">
            </operator>
            <operator name="ExampleSetJoin" class="ExampleSetJoin">
            </operator>
        </operator>
        <operator name="XVal" class="XValidation" expanded="no">
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="training" class="OperatorChain" expanded="yes">
                <operator name="Training" class="LibSVMLearner">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="kernel_type" value="poly"/>
                    <parameter key="C" value="1000.0"/>
                    <list key="class_weights">
                    </list>
                </operator>
                <operator name="train_platt_model" class="XVPrediction" expanded="no">
                    <parameter key="number_of_validations" value="3"/>
                    <operator name="Training (2)" class="LibSVMLearner">
                        <parameter key="keep_example_set" value="true"/>
                        <parameter key="kernel_type" value="poly"/>
                        <parameter key="C" value="1000.0"/>
                        <list key="class_weights">
                        </list>
                    </operator>
                    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                        <operator name="ModelApplier" class="ModelApplier">
                            <list key="application_parameters">
                            </list>
                        </operator>
                        <operator name="dummy" class="ClassificationPerformance">
                            <parameter key="keep_example_set" value="true"/>
                            <parameter key="accuracy" value="true"/>
                            <list key="class_weights">
                            </list>
                        </operator>
                    </operator>
                </operator>
                <operator name="PlattScaling" class="PlattScaling">
                </operator>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance" class="ClassificationPerformance">
                    <parameter key="accuracy" value="true"/>
                    <list key="class_weights">
                    </list>
                </operator>
            </operator>
        </operator>
    </operator>
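
    In other words: the XVPrediction produces out-of-fold confidences, so the calibration is not fit on scores the SVM has already memorized. A rough scikit-learn equivalent of that idea, with `X_train`/`y_train` as placeholders and `platt_fit` from my earlier sketch:

    from sklearn.model_selection import cross_val_predict
    from sklearn.svm import SVC

    svm = SVC(kernel='poly', C=1000.0)
    # Each example is scored by a model that never saw it in training,
    # so the sigmoid is fit on honest, uncontaminated scores.
    oof_scores = cross_val_predict(svm, X_train, y_train, cv=3,
                                   method='decision_function')
    calibrate = platt_fit(oof_scores, y_train)
    svm.fit(X_train, y_train)   # final model trained on all the data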
    regards,

    Steffen
  • bobdobbs
    Stefan,

    VERY clever application.  :D

    I don't entirely understand why you train an SVM and then train three more SVMs with Platt inside the XVPrediction. Does the XVPrediction somehow deliver the "best" Platt model from the XV tests?

    Also, does the Platt model from within the XVPrediction pass through to the model that we eventually apply? If so, would it be safe to assume that we could write out that model and it would include the Platt scaling as well?
  • steffen
    OK, I see that we need a few more remarks:

    Stefan
    I don't entirely understand why you train an SVM and then train three more SVMs with Platt inside the XVPrediction. Does the XVPrediction somehow deliver the "best" Platt model from the XV tests?
    Aaarrgh. My fault. Just remove the operator chain named "train_platt_model" (and hence all its child operators). What I told you above regarding the prevention of overfitting is correct, but RapidMiner's particular implementation of Platt Scaling does not allow this strategy (I noted this issue before, see here: http://rapid-i.com/rapidforum/index.php/topic,447.0.html , but maybe I was too picky).

    Also, does the Platt model from within the XVPrediction pass through to the model that we eventually apply? If so, would it be safe to assume that we could write out that model and it would include the Platt scaling as well?
    Platt Scaling combines the classification model and the calibration model into one. Hence, writing and reading the combined model should cause no problems.

    hope this was helpful,

    regards,

    Steffen

    PS: Sorry, the name is "Steffen", not "Stefan". This is an important difference here in Germany ;)