questions on "Apply Model" operator and predicted label

Unknown
edited November 5 in Community Q&A
I use the "Apply Model" operator to score the test data set. The generated results normally include three pieces of information: confidence (positive class), confidence (negative class), and the predicted label.

Naturally, when confidence (positive class) is larger than confidence (negative class), I would expect the predicted label to be positive.

But I found a lot of cases (using LibSVM for text classification) where confidence (positive class) is smaller than confidence (negative class) and the predicted label is still positive. I would like to know why.

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    Actually, I have never seen such a case with a plain create model/apply model cycle. Anyway, you can define manual thresholds e.g. with Create Threshold and Apply Threshold, or shift the thresholds in a more sophisticated way with e.g. Choose Recall or other cost-sensitive learning schemes.

    Best regards,
    Marius
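For readers who want to see what a manual threshold amounts to, here is a minimal sketch in plain Python. It is not RapidMiner's implementation of Create Threshold / Apply Threshold; the function name `apply_threshold`, the class names, and the confidence values are hypothetical and only illustrate the idea of turning a positive-class confidence into a label.

```python
def apply_threshold(conf_positive, threshold=0.5,
                    positive="positive", negative="negative"):
    """Label an example positive when its positive-class confidence
    reaches the threshold, otherwise negative (illustrative only)."""
    return positive if conf_positive >= threshold else negative


# Hypothetical positive-class confidences for three examples.
confidences = [0.70, 0.45, 0.20]

# Default cut-off of 0.5: only the first example is labelled positive.
default_labels = [apply_threshold(c) for c in confidences]

# Lowering the cut-off to 0.4 also turns the second example positive.
shifted_labels = [apply_threshold(c, threshold=0.4) for c in confidences]

print(default_labels)
print(shifted_labels)
```

Shifting the threshold this way trades precision for recall, which is the effect the cost-sensitive schemes mentioned above aim for.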
  • Hi, thanks for the reply.

    The following is the result of running the "Apply Model" operator. The model was trained using the LibSVM operator. I have posted only the part of the result that shows the behaviour I mentioned in the original post, i.e., even when confidence(R) is smaller than confidence(NR), the prediction is still R.


    confidence(R)  confidence(NR) Prediction(Label)
    0.528462399 0.471537601 R
    0.524106922 0.475893078 R
    0.516740761 0.483259239 R
    0.509868083 0.490131917 R
    0.505252829 0.494747171 R
    0.493653526 0.506346474 R
    0.485416242 0.514583758 R
    0.475031465 0.524968535 R
    0.466340913 0.533659087 R
    0.459370807 0.540629193 R
    0.458747466 0.541252534 R
    0.4577908 0.5422092 R
    0.435570459 0.564429541 R
    0.432716957 0.567283043 R
    0.42963305 0.57036695 R
    0.422826691 0.577173309 R
    0.412345117 0.587654883 R
    0.404687872 0.595312128 R
    0.40221958 0.59778042 R
    0.39865042 0.60134958 R
    0.398228918 0.601771082 R
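A quick way to list the inconsistent rows is sketched below in plain Python. It assumes the label is expected to follow the larger of the two confidences; the helper name `expected_label` is hypothetical, and the four rows are copied from the table above as a sample.

```python
# (confidence(R), confidence(NR), prediction(label)) copied from the table.
rows = [
    (0.528462399, 0.471537601, "R"),
    (0.505252829, 0.494747171, "R"),
    (0.493653526, 0.506346474, "R"),
    (0.398228918, 0.601771082, "R"),
]


def expected_label(conf_r, conf_nr):
    """Label one would expect if the larger confidence decided the class."""
    return "R" if conf_r >= conf_nr else "NR"


# Keep only the rows whose reported prediction disagrees with the confidences.
mismatches = [row for row in rows
              if expected_label(row[0], row[1]) != row[2]]

for conf_r, conf_nr, predicted in mismatches:
    print(f"confidence(R)={conf_r} < confidence(NR)={conf_nr}, "
          f"but prediction is {predicted}")
```

Running the same check over the full CSV written by the process would show how many scored examples are affected.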
  • MariusHelf
    Hm, interesting. Can you please post your process xml as described in my signature?

    Best regards,
    Marius
  • The following is the process that I have been using for scoring.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.011">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
        <parameter key="parallelize_main_process" value="true"/>
        <process expanded="true" height="386" width="711">
          <operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="SVM_Train_F_words_unigram_tf"/>
          </operator>
          <operator activated="true" class="text:process_document_from_file" compatibility="5.1.002" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="179" y="75">
            <list key="text_directories">
              <parameter key="R" value="E:\R_Validation"/>
              <parameter key="NR" value="E:\NR_Validation"/>
            </list>
            <parameter key="extract_text_only" value="false"/>
            <parameter key="vector_creation" value="Term Frequency"/>
            <parameter key="prune_below_absolute" value="5"/>
            <parameter key="prune_above_absolute" value="5000000"/>
            <parameter key="parallelize_vector_creation" value="true"/>
            <process expanded="true" height="362" width="674">
              <operator activated="true" class="text:tokenize" compatibility="5.1.002" expanded="true" height="60" name="Tokenize (2)" width="90" x="45" y="30"/>
              <operator activated="true" class="text:transform_cases" compatibility="5.1.002" expanded="true" height="60" name="Transform Cases (2)" width="90" x="180" y="30"/>
              <operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.002" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="315" y="73"/>
              <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
              <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
              <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve (2)" width="90" x="179" y="300">
            <parameter key="repository_entry" value="SVM_Train_F_model_unigram_tf"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.011" expanded="true" height="76" name="Apply Model" width="90" x="313" y="300">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="5.1.011" expanded="true" height="76" name="Performance" width="90" x="447" y="75">
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="210">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|confidence(non_res)|confidence(res)|label|prediction(label)"/>
          </operator>
          <operator activated="true" class="write_csv" compatibility="5.1.011" expanded="true" height="60" name="Write CSV" width="90" x="581" y="165">
            <parameter key="csv_file" value="E:\Project\svmscore.csv"/>
            <parameter key="column_separator" value=","/>
            <parameter key="quote_nominal_values" value="false"/>
            <parameter key="format_date_attributes" value="false"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Process Documents from Files (2)" to_port="word list"/>
          <connect from_op="Process Documents from Files (2)" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="result 2"/>
          <connect from_op="Performance" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Write CSV" to_port="input"/>
          <connect from_op="Write CSV" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf
    Whoo, you are using RapidMiner 5.1. In a few days RapidMiner 5.3 will be released - I strongly encourage you to update to the latest version (5.2.8) and try again. Please leave a note in this thread if your problem persists or if everything is working fine now.

    Best regards,
    Marius

  • Thanks, Marius. I will give it another try after updating RapidMiner.

    By the way, do you know how to output the distance between a given test data point and the hyperplane constructed from the training data set? I am again referring to the LibSVM operator in RapidMiner.
  • MariusHelf
    huaiyanggongzi wrote:
    By the way, do you know how to output the distance between a given test data point and the hyperplane constructed from the training data set? I am again referring to the LibSVM operator in RapidMiner.
    Unfortunately, that's not possible. The confidence is an indicator for that, but the exact distance cannot be output.

    Best regards,
    Marius
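For context, the quantity being asked about can be sketched outside RapidMiner. The snippet below is only an illustration for a linear SVM, assuming the weight vector and bias were somehow available; the names `signed_distance`, `weights`, and `bias` and all numbers are hypothetical, and nothing here is an operator or API of RapidMiner or LibSVM.

```python
import math


def signed_distance(weights, bias, point):
    """Signed distance of a point to the hyperplane w.x + b = 0.

    The sign tells which side of the hyperplane the point lies on,
    the magnitude how far away it is.
    """
    decision_value = sum(w * x for w, x in zip(weights, point)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return decision_value / norm


# Hypothetical linear model: w = (3, 4), b = -5, so ||w|| = 5.
weights = [3.0, 4.0]
bias = -5.0

print(signed_distance(weights, bias, [3.0, 4.0]))  # point on the positive side
print(signed_distance(weights, bias, [0.0, 0.0]))  # point on the negative side
```

With a non-linear kernel the weight vector is not available explicitly, so this simple form would not apply as written.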