Questions on the "Apply Model" operator and predicted label
I use the "Apply Model" operator to predict the labels of a test data set. The generated results normally include three pieces of information: confidence (positive class), confidence (negative class), and the predicted label.
Naturally, when confidence (positive class) is larger than confidence (negative class), I would expect the predicted label to be positive.
But I found a lot of cases (using LibSVM for text classification) where confidence (positive class) is smaller than confidence (negative class) and the predicted label is still positive. I would like to know why.
Hi, thanks for the reply.
The following is the result of running the "Apply Model" operator. The model was trained using the LibSVM operator. I have posted only part of the result, which shows the observation I mentioned in the original post: even when confidence(R) is smaller than confidence(NR), the prediction is still R.
confidence(R)   confidence(NR)   prediction(label)
0.528462399     0.471537601      R
0.524106922     0.475893078      R
0.516740761     0.483259239      R
0.509868083     0.490131917      R
0.505252829     0.494747171      R
0.493653526     0.506346474      R
0.485416242     0.514583758      R
0.475031465     0.524968535      R
0.466340913     0.533659087      R
0.459370807     0.540629193      R
0.458747466     0.541252534      R
0.4577908       0.5422092        R
0.435570459     0.564429541      R
0.432716957     0.567283043      R
0.42963305      0.57036695       R
0.422826691     0.577173309      R
0.412345117     0.587654883      R
0.404687872     0.595312128      R
0.40221958      0.59778042       R
0.39865042      0.60134958       R
0.398228918     0.601771082      R
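To make the mismatch easier to see, here is a small Python sketch (not part of the RapidMiner process) that counts the rows where the predicted label differs from the label with the higher confidence. The values are copied from the output posted above; the function names `parse_rows`, `higher_confidence_label` and `find_mismatches` are made up for this illustration, and the tie rule (a tie counts as R) is an arbitrary assumption.

```python
# Rows copied from the posted Apply Model output:
# confidence(R), confidence(NR), prediction(label)
POSTED_ROWS = """
0.528462399 0.471537601 R
0.524106922 0.475893078 R
0.516740761 0.483259239 R
0.509868083 0.490131917 R
0.505252829 0.494747171 R
0.493653526 0.506346474 R
0.485416242 0.514583758 R
0.475031465 0.524968535 R
0.466340913 0.533659087 R
0.459370807 0.540629193 R
0.458747466 0.541252534 R
0.4577908 0.5422092 R
0.435570459 0.564429541 R
0.432716957 0.567283043 R
0.42963305 0.57036695 R
0.422826691 0.577173309 R
0.412345117 0.587654883 R
0.404687872 0.595312128 R
0.40221958 0.59778042 R
0.39865042 0.60134958 R
0.398228918 0.601771082 R
"""


def parse_rows(text):
    """Turn whitespace-separated lines into (conf_r, conf_nr, predicted) tuples."""
    rows = []
    for line in text.strip().splitlines():
        conf_r, conf_nr, predicted = line.split()
        rows.append((float(conf_r), float(conf_nr), predicted))
    return rows


def higher_confidence_label(conf_r, conf_nr):
    """Label with the larger confidence (assumption: a tie counts as R)."""
    return "R" if conf_r >= conf_nr else "NR"


def find_mismatches(rows):
    """Rows whose predicted label is not the higher-confidence label."""
    return [row for row in rows
            if higher_confidence_label(row[0], row[1]) != row[2]]


rows = parse_rows(POSTED_ROWS)
mismatches = find_mismatches(rows)
print(f"{len(mismatches)} of {len(rows)} rows are predicted against the higher confidence")
for conf_r, conf_nr, predicted in mismatches:
    print(f"confidence(R)={conf_r}  confidence(NR)={conf_nr}  prediction={predicted}")
```

The check only locates the rows in question; it does not explain why the operator labels them R.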
The following is the process that I have been using for scoring.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.011">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
<parameter key="parallelize_main_process" value="true"/>
<process expanded="true" height="386" width="711">
<operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
<parameter key="repository_entry" value="SVM_Train_F_words_unigram_tf"/>
</operator>
<operator activated="true" class="text:process_document_from_file" compatibility="5.1.002" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="179" y="75">
<list key="text_directories">
<parameter key="R" value="E:\R_Validation"/>
<parameter key="NR" value="E:\NR_Validation"/>
</list>
<parameter key="extract_text_only" value="false"/>
<parameter key="vector_creation" value="Term Frequency"/>
<parameter key="prune_below_absolute" value="5"/>
<parameter key="prune_above_absolute" value="5000000"/>
<parameter key="parallelize_vector_creation" value="true"/>
<process expanded="true" height="362" width="674">
<operator activated="true" class="text:tokenize" compatibility="5.1.002" expanded="true" height="60" name="Tokenize (2)" width="90" x="45" y="30"/>
<operator activated="true" class="text:transform_cases" compatibility="5.1.002" expanded="true" height="60" name="Transform Cases (2)" width="90" x="180" y="30"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.002" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="315" y="73"/>
<connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
<connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
<connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve (2)" width="90" x="179" y="300">
<parameter key="repository_entry" value="SVM_Train_F_model_unigram_tf"/>
</operator>
<operator activated="true" class="apply_model" compatibility="5.1.011" expanded="true" height="76" name="Apply Model" width="90" x="313" y="300">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.1.011" expanded="true" height="76" name="Performance" width="90" x="447" y="75">
<list key="class_weights"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="210">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="|confidence(non_res)|confidence(res)|label|prediction(label)"/>
</operator>
<operator activated="true" class="write_csv" compatibility="5.1.011" expanded="true" height="60" name="Write CSV" width="90" x="581" y="165">
<parameter key="csv_file" value="E:\Project\svmscore.csv"/>
<parameter key="column_separator" value=","/>
<parameter key="quote_nominal_values" value="false"/>
<parameter key="format_date_attributes" value="false"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Process Documents from Files (2)" to_port="word list"/>
<connect from_op="Process Documents from Files (2)" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="result 2"/>
<connect from_op="Performance" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Write CSV" to_port="input"/>
<connect from_op="Write CSV" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
huaiyanggongzi wrote: By the way, do you know how to output the distance between a given test data point and the hyperplane constructed from the training data set? I am again referring to the LibSVM operator in RapidMiner.
Unfortunately, that's not possible. The confidence is an indicator of that distance, but the exact distance cannot be output.
Best regards,
Marius