Optimize Parameters fails on F-measure

HeikoPaulheim
New Altair Community Member
Hi,
I try to optimize parameters towards F-measure. There may be cases where the F-measure is undefined (if there are no true positives), but I know that some configurations exist where F-measure is at least defined (i.e., at least one true positive).
The optimize (grid) operator, however, always returns a configuration where F-measure is undefined.
Is there any way to circumvent that behavior?
Best,
Heiko
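For illustration, the undefined case follows directly from the F-measure definition: with zero true positives, both precision and recall are 0, so F1 = 2PR/(P+R) becomes 0/0. A minimal plain-Java sketch of that arithmetic (this is just the textbook formula, not RapidMiner internals):

```java
public class FMeasureDemo {

    // F1 = 2 * precision * recall / (precision + recall)
    static double f1(int tp, int fp, int fn) {
        double precision = (double) tp / (tp + fp);
        double recall    = (double) tp / (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        System.out.println(f1(10, 5, 5)); // defined: prints 0.6666666666666666
        System.out.println(f1(0, 5, 5));  // tp = 0 -> 0.0 / 0.0 -> prints NaN
    }
}
```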
Answers
Are you really optimizing your model with respect to F-measure? Please post your process here so we can check:
There it is. It yields an F-measure of 0. If I change the main criterion to AUC, it yields an F-measure of ~37%, so it is technically possible to get a higher value here.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="5.3.015" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
<parameter key="csv_file" value="C:\Users\Heiko\Documents\Forschung\DBpediaDebugging\redirects\training_features.csv"/>
<parameter key="column_separators" value="	"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Original.true.polynominal.id"/>
<parameter key="1" value="Replaced.true.polynominal.batch"/>
<parameter key="2" value="Correct.true.binominal.label"/>
<parameter key="3" value="Plausible.true.integer.attribute"/>
<parameter key="4" value="Distribution.true.real.attribute"/>
<parameter key="5" value="Levenstein.true.integer.attribute"/>
<parameter key="6" value="Levenstein (relative).true.real.attribute"/>
<parameter key="7" value="Jaccard.true.real.attribute"/>
<parameter key="8" value="Jaro.true.real.attribute"/>
<parameter key="9" value="JaroWinkler.true.real.attribute"/>
<parameter key="10" value="Prefix.true.real.attribute"/>
<parameter key="11" value="Prefix2.true.real.attribute"/>
<parameter key="12" value="Substring1.true.real.attribute"/>
<parameter key="13" value="Substring2.true.real.attribute"/>
<parameter key="14" value="Redirects.true.integer.attribute"/>
<parameter key="15" value="Disambiguations.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="optimize_parameters_grid" compatibility="5.3.015" expanded="true" height="94" name="Optimize Parameters (2)" width="90" x="246" y="30">
<list key="parameters">
<parameter key="SVM (4).gamma" value="[0.0000001;1000000;13;logarithmic]"/>
<parameter key="SVM (4).C" value="[0.0000001;1000000;13;logarithmic]"/>
</list>
<process expanded="true">
<operator activated="true" class="x_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation (3)" width="90" x="246" y="30">
<description>A cross-validation evaluating a decision tree model.</description>
<process expanded="true">
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.015" expanded="true" height="76" name="SVM (4)" width="90" x="90" y="30">
<parameter key="gamma" value="1.0000000000000003E-4"/>
<parameter key="C" value="1000000.0"/>
<list key="class_weights">
<parameter key="0" value="20.0"/>
<parameter key="1" value="1.0"/>
</list>
</operator>
<connect from_port="training" to_op="SVM (4)" to_port="training set"/>
<connect from_op="SVM (4)" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model (5)" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_binominal_classification" compatibility="5.3.015" expanded="true" height="76" name="Performance (5)" width="90" x="179" y="30">
<parameter key="f_measure" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model (5)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (5)" to_port="unlabelled data"/>
<connect from_op="Apply Model (5)" from_port="labelled data" to_op="Performance (5)" to_port="labelled data"/>
<connect from_op="Performance (5)" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Validation (3)" to_port="training"/>
<connect from_op="Validation (3)" from_port="averagable 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Optimize Parameters (2)" to_port="input 1"/>
<connect from_op="Optimize Parameters (2)" from_port="parameter" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
If I may make a guess at the cause here: I think RapidMiner internally computes the F-measure without checking for tp = 0. In Java, dividing a double by 0.0 does not throw an exception; the result is larger than any other double:

double d1 = 1.0;
double d2 = 1.0 / 0.0;       // Infinity, no exception
System.out.println(d1 > d2); // false
System.out.println(d2 > d1); // true

Thus, if not handled separately, a configuration that produces zero true positives (i.e., both precision and recall are 0) will always be favored over any other configuration, since the F-measure is then a term with 0 in its denominator. Usually, F1 is defined as 0 if tp = 0, even though the formula itself is undefined in that case.
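A small self-contained sketch of both the comparison behaviour and the conventional guard (plain Java; the tp == 0 check is the standard convention, not necessarily what RapidMiner does internally):

```java
public class DivideByZeroDemo {
    public static void main(String[] args) {
        double inf = 1.0 / 0.0;  // positive infinity, not an exception
        double nan = 0.0 / 0.0;  // 0/0 is indeterminate -> NaN

        System.out.println(inf > Double.MAX_VALUE); // true: Infinity beats any finite double
        System.out.println(nan > 1.0);              // false: every ordered comparison with NaN is false
        System.out.println(Double.isNaN(nan));      // true

        // Conventional guard: define F1 as 0 when there are no true positives.
        // Uses the equivalent formulation F1 = 2*tp / (2*tp + fp + fn).
        int tp = 0, fp = 5, fn = 5;
        double f1 = (tp == 0) ? 0.0
                              : 2.0 * tp / (2.0 * tp + fp + fn);
        System.out.println(f1); // 0.0
    }
}
```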