"Not getting confidence values from LibSVM [SOLVED]"
javart
New Altair Community Member
I'm running a multi-class classification experiment using LibSVM.
When I check the classification output from the trained model, I see predicted labels, but all the confidence values are equal to zero.
I have checked the parameter "calculate confidences" in the LibSVM operator. Am I missing something?
Below is the XML for my process, along with a few lines from my input data.
The process XML comes first, followed by a few lines of the input data:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.012">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.012" expanded="true" name="Process">
<parameter key="logverbosity" value="status"/>
<parameter key="logfile" value="log"/>
<parameter key="resultfile" value="result"/>
<process expanded="true">
<operator activated="true" class="read_sparse" compatibility="5.3.012" expanded="true" height="60" name="Read Sparse" width="90" x="112" y="120">
<parameter key="format" value="yx"/>
<parameter key="data_file" value="/home/javier/workspace/Taxonomy Integration/data/machineLearning/100012.dat.3"/>
<parameter key="dimension" value="70000"/>
<parameter key="datamanagement" value="int_sparse_array"/>
<list key="prefix_map"/>
</operator>
<operator activated="true" class="split_validation" compatibility="5.3.012" expanded="true" height="112" name="Validation" width="90" x="313" y="120">
<parameter key="split_ratio" value="0.8"/>
<parameter key="training_set_size" value="1000"/>
<parameter key="test_set_size" value="1000"/>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="use_local_random_seed" value="true"/>
<process expanded="true">
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.012" expanded="true" height="76" name="SVM" width="90" x="112" y="30">
<parameter key="kernel_type" value="linear"/>
<list key="class_weights"/>
<parameter key="calculate_confidences" value="true"/>
</operator>
<connect from_port="training" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.012" expanded="true" height="76" name="Apply Model" width="90" x="112" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="write_model" compatibility="5.3.012" expanded="true" height="60" name="Write Model" width="90" x="246" y="165">
<parameter key="model_file" value="model.mod"/>
<parameter key="output_type" value="Binary"/>
</operator>
<operator activated="true" breakpoints="after" class="performance_classification" compatibility="5.3.012" expanded="true" height="76" name="Performance" width="90" x="246" y="30">
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_op="Write Model" to_port="input"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="write_performance" compatibility="5.3.012" expanded="true" height="60" name="Write Performance" width="90" x="514" y="120">
<parameter key="performance_file" value="performance.per"/>
</operator>
<connect from_op="Read Sparse" from_port="output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_op="Write Performance" to_port="input"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
553124 2266:-1 8045:-1 9392:-1 10397:-1 13481:1 14509:-1 17368:1 18888:1 26913:1 27083:1 27107:1 27122:-1 27859:-1 37441:1 37993:1 40703:1 48407:-1 61367:-1
553124 8549:-1 13902:-1 21611:-1 23697:-1 36878:1 40703:1 42809:-1 55147:1 55972:-1 56351:1 62848:-1
553124 2092:1 2536:-1 10411:3 12125:-1 27555:1 32520:-1 36916:1 40080:-1 40703:1 41936:1 42809:-1 43505:-1 44430:-1 46301:-1 49588:-1 54999:1 56521:1 61488:-1 61793:-1
553124 7788:1 14296:-1 22385:1 26071:-1 32520:-1 32816:-1 35700:1 39122:1 53325:-1 54817:-1
553124 1658:-1 1867:-1 2092:1 2213:-1 4929:1 5356:1 8549:-1 9381:1 11392:-1 12125:-1 13234:-1 17874:-1 20346:-1 29660:-1 31941:-1 35387:1 36916:1 40703:2 41936:1 42809:-2 43985:-1 45613:-1 49588:-1 50956:1 52474:-2 54438:-1 56521:1 63618:-1
202540 286:1 3953:1 5356:1 9072:1 13795:-1 23821:-1 41755:1 43214:-1 45612:-1 46172:1 55598:-1
202540 3407:1 37238:-1 39212:1 39218:-1 44578:1 51070:1
202540 7504:-1 11594:1 36560:-1 43513:1
202540 5356:1 6204:-1 10012:1 10168:-1 11090:1 14114:-1 14437:-1 18720:1 22369:-1 33038:1 36283:-1 38182:1 40847:1 48736:-2 49346:-1 51470:-1 62562:-1
202540 8661:-1 9381:1 19454:1 27163:1 55619:1 62149:-1 65440:1
202540 9381:1 19974:1 24768:1 25063:1 31787:-1 40703:1 43214:-1 44319:1 63377:1
Answers
Hi,
Internally everything is correct: the real confidences are used to predict the label. However, in 'Read Sparse' you are using an int_sparse_array to store your data.
When the confidences are stored in this int_sparse_array, the values are rounded to integers (and are therefore shown as 0.0 all the time). If you change the datamanagement parameter to double_sparse_array, the correct values should be shown.
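For reference, here is a sketch of how the 'Read Sparse' operator from the posted process might look with that change. Every other parameter is copied unchanged from the original process; only the datamanagement value differs, and the XML comment is added here for illustration.

```xml
<operator activated="true" class="read_sparse" compatibility="5.3.012" expanded="true" height="60" name="Read Sparse" width="90" x="112" y="120">
  <parameter key="format" value="yx"/>
  <parameter key="data_file" value="/home/javier/workspace/Taxonomy Integration/data/machineLearning/100012.dat.3"/>
  <parameter key="dimension" value="70000"/>
  <!-- Changed from int_sparse_array so that fractional confidence values are not rounded to 0 -->
  <parameter key="datamanagement" value="double_sparse_array"/>
  <list key="prefix_map"/>
</operator>
```

The same change can be made in the GUI by selecting the 'Read Sparse' operator and switching its datamanagement parameter.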
Best,
Nils
Thank you so much!
I didn't realize that data structure would also hold the ML output.
It works correctly after changing 'Read Sparse' to double_sparse_array.
Javier.