questions on "Apply Model" operator and predicted label

User: "winecoding"
New Altair Community Member
Updated by Jocelyn
I use the "Apply Model" operator to predict labels for a test data set. The generated results normally include three pieces of information: confidence (positive class), confidence (negative class), and the predicted label.

Naturally, when confidence (positive class) is larger than confidence (negative class), the predicted label is positive.

But I found many cases (using LibSVM for text classification) where the predicted label is still positive even though confidence (positive class) is smaller than confidence (negative class). Why does this happen?

    User: "MariusHelf"
    New Altair Community Member
    Actually, I have never seen such a case with a plain create model/apply model cycle. Anyway, you can define manual thresholds e.g. with Create Threshold and Apply Threshold, or shift the thresholds in a more sophisticated way with e.g. Choose Recall or other cost-sensitive learning schemes.

    Best regards,
    Marius
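The idea of a manual threshold can be illustrated outside RapidMiner with a minimal Python sketch. The function name `apply_threshold`, the default of 0.5, and the rule "predict the positive class when its confidence is at least the threshold" are assumptions made for illustration; they are not taken from the Create Threshold or Apply Threshold operators.

```python
def apply_threshold(conf_positive, threshold=0.5, positive="R", negative="NR"):
    """Return the positive label when its confidence reaches the threshold.

    Hypothetical helper: the comparison rule and the default threshold are
    assumptions, not RapidMiner's documented behavior.
    """
    return positive if conf_positive >= threshold else negative


# The same confidence value gets a different label depending on the threshold.
print(apply_threshold(0.493653526))                  # default threshold of 0.5
print(apply_threshold(0.493653526, threshold=0.45))  # lowered threshold
```

Lowering the threshold makes the positive class easier to predict, which is the kind of shift a cost-sensitive scheme would apply deliberately.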
    User: "winecoding"
    New Altair Community Member
    OP
    Hi, thanks for the reply.

    The following is the result of running the "Apply Model" operator. The model was trained using the LibSVM operator. I have posted only the part of the result that shows the behavior described in my original post, i.e., even when confidence(R) is smaller than confidence(NR), the prediction is still R.


    confidence(R)  confidence(NR) Prediction(Label)
    0.528462399 0.471537601 R
    0.524106922 0.475893078 R
    0.516740761 0.483259239 R
    0.509868083 0.490131917 R
    0.505252829 0.494747171 R
    0.493653526 0.506346474 R
    0.485416242 0.514583758 R
    0.475031465 0.524968535 R
    0.466340913 0.533659087 R
    0.459370807 0.540629193 R
    0.458747466 0.541252534 R
    0.4577908 0.5422092 R
    0.435570459 0.564429541 R
    0.432716957 0.567283043 R
    0.42963305 0.57036695 R
    0.422826691 0.577173309 R
    0.412345117 0.587654883 R
    0.404687872 0.595312128 R
    0.40221958 0.59778042 R
    0.39865042 0.60134958 R
    0.398228918 0.601771082 R
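A quick way to find such rows in an exported result is sketched below in Python. The four sample rows are copied from the table above; the helper names `label_from_confidence` and `find_mismatches` are hypothetical, and the rule "the label with the larger confidence wins, ties go to R" is the expectation stated in the original post, not a statement about how LibSVM decides.

```python
# Rows copied from the table above: (confidence(R), confidence(NR), predicted label).
rows = [
    (0.528462399, 0.471537601, "R"),
    (0.505252829, 0.494747171, "R"),
    (0.493653526, 0.506346474, "R"),
    (0.398228918, 0.601771082, "R"),
]


def label_from_confidence(conf_r, conf_nr):
    """Return the label with the larger confidence (ties go to "R")."""
    return "R" if conf_r >= conf_nr else "NR"


def find_mismatches(rows):
    """Return rows whose predicted label is not the higher-confidence label."""
    return [row for row in rows if label_from_confidence(row[0], row[1]) != row[2]]


mismatches = find_mismatches(rows)
for conf_r, conf_nr, predicted in mismatches:
    print(f"confidence(R)={conf_r:.4f} confidence(NR)={conf_nr:.4f} predicted={predicted}")
```

Running the same check on the full CSV written by the process would show how many test examples are affected.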
    User: "MariusHelf"
    New Altair Community Member
    Hm, interesting. Can you please post your process xml as described in my signature?

    Best regards,
    Marius
    User: "winecoding"
    New Altair Community Member
    OP
    The following is the process that I have been using for scoring.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.011">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
        <parameter key="parallelize_main_process" value="true"/>
        <process expanded="true" height="386" width="711">
          <operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="SVM_Train_F_words_unigram_tf"/>
          </operator>
          <operator activated="true" class="text:process_document_from_file" compatibility="5.1.002" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="179" y="75">
            <list key="text_directories">
              <parameter key="R" value="E:\R_Validation"/>
              <parameter key="NR" value="E:\NR_Validation"/>
            </list>
            <parameter key="extract_text_only" value="false"/>
            <parameter key="vector_creation" value="Term Frequency"/>
            <parameter key="prune_below_absolute" value="5"/>
            <parameter key="prune_above_absolute" value="5000000"/>
            <parameter key="parallelize_vector_creation" value="true"/>
            <process expanded="true" height="362" width="674">
              <operator activated="true" class="text:tokenize" compatibility="5.1.002" expanded="true" height="60" name="Tokenize (2)" width="90" x="45" y="30"/>
              <operator activated="true" class="text:transform_cases" compatibility="5.1.002" expanded="true" height="60" name="Transform Cases (2)" width="90" x="180" y="30"/>
              <operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.002" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="315" y="73"/>
              <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
              <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
              <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve (2)" width="90" x="179" y="300">
            <parameter key="repository_entry" value="SVM_Train_F_model_unigram_tf"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.011" expanded="true" height="76" name="Apply Model" width="90" x="313" y="300">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="5.1.011" expanded="true" height="76" name="Performance" width="90" x="447" y="75">
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="210">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|confidence(non_res)|confidence(res)|label|prediction(label)"/>
          </operator>
          <operator activated="true" class="write_csv" compatibility="5.1.011" expanded="true" height="60" name="Write CSV" width="90" x="581" y="165">
            <parameter key="csv_file" value="E:\Project\svmscore.csv"/>
            <parameter key="column_separator" value=","/>
            <parameter key="quote_nominal_values" value="false"/>
            <parameter key="format_date_attributes" value="false"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Process Documents from Files (2)" to_port="word list"/>
          <connect from_op="Process Documents from Files (2)" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="result 2"/>
          <connect from_op="Performance" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Write CSV" to_port="input"/>
          <connect from_op="Write CSV" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    User: "MariusHelf"
    New Altair Community Member
    Whoo, you are using RapidMiner 5.1. In a few days RapidMiner 5.3 will be released - I strongly encourage you to update to the latest version (5.2.8) and try again. Please leave a note in this thread if your problem persists or if everything is working fine now.

    Best regards,
    Marius

    User: "winecoding"
    New Altair Community Member
    OP
    Thanks, Marius. I will give it another try after updating RapidMiner.

    By the way, do you know how to output the distance between a given test data point and the hyperplane constructed from the training data set? I am again referring to the LibSVM operator in RapidMiner.
    User: "MariusHelf"
    New Altair Community Member
    huaiyanggongzi wrote:
    By the way, do you know how to output the distance between a given test data point and the hyperplane constructed by training data set? I am also referring to the LiBSVM operator in Rapidminer.
    Unfortunately, that's not possible. The confidence is an indicator for that, but the exact distance cannot be output.

    Best regards,
    Marius
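For reference, the quantity being asked about can be written down for the linear case. The sketch below is plain Python, independent of RapidMiner, and assumes a linear kernel whose weight vector `w` and bias `b` are already known; the function name `signed_distance` and the example values are hypothetical. It uses the standard geometric formula (w·x + b) / ||w||.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0.

    Assumes a linear decision function with known weights w and bias b;
    both are illustrative inputs here, not values exported by any operator.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return score / norm


# Hypothetical weights, bias, and test point.
print(signed_distance([3.0, 4.0], -5.0, [3.0, 4.0]))
```

The sign indicates the side of the hyperplane and the magnitude the distance to it; for non-linear kernels this simple form does not apply directly.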