Hi all,
It has been a long time since I last posted here. In my last comment I said that Polynomial by Binomial Classification would solve this problem, but in fact it did not.
Even now, with RM 7, it's not possible to obtain a simple AUC or F-measure for a multiclass problem such as the Iris dataset, one of the most fundamental machine learning problems.
As you know, there are mathematical formulations for calculating other evaluation measures on multiclass problems (RM refers to the two cases as binomial and polynomial problems). RM draws a clear distinction between operators that can handle binomial classification problems and those that can handle polynomial ones, but this distinction should no longer exist.
Hi,
I am curious: how do you calculate AUC for a polynominal problem? And why not use logloss?
Best,
Martin
In my experience, AUC for multi-class classification problems is typically computed by looking at the separation of one class vs. all others. It would certainly be nice if RapidMiner did this automatically for polynominal labels, but I think it can be done manually by remapping labels after the fact.
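As a rough illustration of the one-vs-all idea outside RapidMiner, here is a minimal Python sketch. It assumes each example comes with a per-class confidence; the function names and the toy data are hypothetical, not part of any RapidMiner API. AUC is computed as the fraction of (positive, negative) pairs that are ranked correctly, with ties counted as half.

```python
def binary_auc(is_positive, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(labels, confidences):
    """Remap each class to 'class vs. all others' and compute one AUC per class.

    confidences: one dict per example, mapping class name -> confidence.
    Returns (per-class AUC dict, unweighted mean over classes).
    """
    per_class = {}
    for c in sorted(set(labels)):
        is_pos = [y == c for y in labels]          # the remapping step
        scores = [conf[c] for conf in confidences]  # confidence for class c
        per_class[c] = binary_auc(is_pos, scores)
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


# Toy data (made up for illustration only).
labels = ["a", "b", "c", "a", "b", "c"]
confidences = [
    {"a": 0.8, "b": 0.1, "c": 0.1},
    {"a": 0.2, "b": 0.5, "c": 0.3},
    {"a": 0.1, "b": 0.3, "c": 0.6},
    {"a": 0.6, "b": 0.3, "c": 0.1},
    {"a": 0.3, "b": 0.3, "c": 0.4},
    {"a": 0.1, "b": 0.2, "c": 0.7},
]
per_class_auc, macro_auc = one_vs_rest_auc(labels, confidences)
print(per_class_auc, macro_auc)
```

The unweighted mean is only one possible way to combine the per-class values; weighting by class frequency is another.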
Hi Martin,
Thanks for your reply.
I'm a RapidMiner fan, and I'm here to help somehow. Sincerely, I thought this problem had been solved in the newest versions, but it is not fair to say an operator does not work without deeper research, so my apologies. Nevertheless, so far I could not find a way to calculate AUC and F-measure for a multiclass problem, for example the Iris dataset with 3 classes. What I noticed in other tools is this: for F-measure, they take an average, a weighted average, or another statistical mean of the per-class values. For AUC, they basically follow the formulations in the literature:
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognition Letters 27, 861-874.
Hand, D.J., Till, R.J., 2001. A simple generalization of the area under the ROC curve to multiple class classification problems. Machine Learning 45 (2), 171-186.
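To make the averaging options for F-measure concrete, here is a small Python sketch of the plain (macro) average and the support-weighted average of per-class F-measures. The function names and the example labels are hypothetical and only for illustration; other tools may define their averages slightly differently.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F-measure of each class, treating that class as positive and the rest as negative."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F-measures."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F-measures, weighted by how often each class occurs in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Made-up predictions for six examples.
y_true = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "virginica"]
y_pred = ["setosa", "setosa", "setosa", "versicolor", "virginica", "virginica"]
macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(per_class_f1(y_true, y_pred), macro, weighted)
```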
If you could point out which operators I should use to do this in RapidMiner, that would be great, as I'm evaluating it for a company.
To work around this in my own research on multiclass problems, I wrote an operator that integrates the RM predictions with the Weka evaluator, so I get all the predictive results from RapidMiner as measured by the Weka classifier evaluator. However, as a constructive thought: even if a solution for this already exists in RapidMiner, I think it's time to rethink how this is done rather than relying on the binomial/polynomial concept, as other DM tools are progressing well without it.
Hi again,
I need to disagree. I do think that the difference between polynominal and binominal is pretty clear. There are simply models, like an SVM, which cannot cope with polynominal data. That is a fact. LibSVM simply uses an internal wrapper that decomposes the problem into binary subproblems (one-vs-one in LibSVM's case) and thus makes a binominal algorithm runnable on a polynominal data set. The same idea is fully supported by the Polynominal by Binominal Classification operator.
The point you raise is that you would also like to be able to use a binominal performance measure for a polynominal problem. While I see that this is a possible approach, I would argue that logloss is a better measure. The missing operator in RM would be Polynominal by Binominal Performance, a nested operator similar to the learner. It should not be hard to write.
~Martin
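For reference, multiclass logloss needs no one-vs-all remapping: it is the average negative log of the confidence assigned to the true class. A minimal Python sketch, assuming per-class confidences are available for every example (the function name, the clipping constant, and the toy values are assumptions for illustration):

```python
import math


def multiclass_log_loss(labels, confidences, eps=1e-15):
    """Average negative log-confidence of the true class.

    confidences: one dict per example, mapping class name -> confidence.
    Confidences are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for y, conf in zip(labels, confidences):
        p = min(max(conf.get(y, 0.0), eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(labels)


# Made-up confidences for two examples.
labels = ["a", "b"]
confidences = [{"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}]
loss = multiclass_log_loss(labels, confidences)
print(loss)
```

Lower is better; a model that always puts confidence close to 1 on the true class gets a loss close to 0.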
I managed to build a process that calculates AUC and F-measure for the 3 classes of the Iris dataset (thanks, Brian T., for the tip with Map), using Polynomial by Binomial Classification and SVM; find it below. You can also get the ROC plots per class. Polynomial by Binomial Classification is working fine.
The drawback of this process is that, to calculate the performances with Performance Binomial, I need to remap each label against all the others, and to average the performances over the classes I had to convert the performance vectors to data. But it delivers what is wanted.
To help improve this situation, I'm preparing my Weka Classification Measure operator to output a performance vector (right now it returns data); then I'll offer the code for integration into the Weka extension.
In any case, I will package my Weka Classification Measure operator as is in the next few days and post it here.
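The final averaging step of the process can be sketched in Python as follows. This assumes the per-class results have already been flattened into (criterion, value) rows, as Performance to Data does; the function name and the numbers are made up. The replacement of an AUC of 0 by 1 mirrors the "AUC FIX Values" operator in the posted process, and the sample standard deviation is an assumption about what the Aggregate operator reports.

```python
from collections import defaultdict
from statistics import mean, stdev


def aggregate_performance(rows):
    """Group (criterion, value) rows by criterion and return {criterion: (mean, std)}."""
    grouped = defaultdict(list)
    for criterion, value in rows:
        # Mirrors the "AUC FIX Values" step of the process: an AUC of 0 is replaced by 1.
        if criterion == "AUC" and value == 0:
            value = 1.0
        grouped[criterion].append(value)
    # stdev() is the sample standard deviation (assumption, see above).
    return {c: (mean(v), stdev(v) if len(v) > 1 else 0.0) for c, v in grouped.items()}


# Made-up per-class results: one AUC and one f_measure row per class-vs-all run.
rows = [
    ("AUC", 1.0), ("f_measure", 1.0),
    ("AUC", 0.9), ("f_measure", 0.8),
    ("AUC", 0.0), ("f_measure", 0.9),
]
summary = aggregate_performance(rows)
print(summary)
```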
Regards,
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.3.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="7.3.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="85">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.7"/>
          <parameter key="ratio" value="0.3"/>
        </enumeration>
        <parameter key="sampling_type" value="shuffled sampling"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="polynomial_by_binomial_classification" compatibility="7.3.001" expanded="true" height="82" name="Polynomial by Binomial Classification" width="90" x="447" y="85">
        <parameter key="classification_strategies" value="1 against all"/>
        <parameter key="random_code_multiplicator" value="2.0"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <process expanded="true">
          <operator activated="true" class="support_vector_machine_libsvm" compatibility="7.3.001" expanded="true" height="82" name="SVM" width="90" x="112" y="34">
            <parameter key="svm_type" value="C-SVC"/>
            <parameter key="kernel_type" value="rbf"/>
            <parameter key="degree" value="3"/>
            <parameter key="gamma" value="0.125"/>
            <parameter key="coef0" value="0.0"/>
            <parameter key="C" value="2.0"/>
            <parameter key="nu" value="0.5"/>
            <parameter key="cache_size" value="80"/>
            <parameter key="epsilon" value="0.001"/>
            <parameter key="p" value="0.1"/>
            <list key="class_weights"/>
            <parameter key="shrinking" value="true"/>
            <parameter key="calculate_confidences" value="false"/>
            <parameter key="confidence_for_multiclass" value="true"/>
          </operator>
          <connect from_port="training set" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.3.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="313" y="289">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.3.001" expanded="true" height="124" name="Multiply" width="90" x="112" y="442"/>
      <operator activated="true" class="map" compatibility="7.3.001" expanded="true" height="82" name="vir-all" width="90" x="112" y="748">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value="label"/>
        <parameter key="attributes" value="label|prediction(label)"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <list key="value_mappings">
          <parameter key="Iris-setosa" value="all"/>
          <parameter key="Iris-versicolor" value="all"/>
        </list>
        <parameter key="consider_regular_expressions" value="false"/>
        <parameter key="add_default_mapping" value="false"/>
      </operator>
      <operator activated="true" class="performance_binominal_classification" compatibility="7.3.001" expanded="true" height="82" name="vir-all-perf" width="90" x="246" y="1003">
        <parameter key="main_criterion" value="first"/>
        <parameter key="accuracy" value="false"/>
        <parameter key="classification_error" value="false"/>
        <parameter key="kappa" value="false"/>
        <parameter key="AUC (optimistic)" value="false"/>
        <parameter key="AUC" value="true"/>
        <parameter key="AUC (pessimistic)" value="false"/>
        <parameter key="precision" value="false"/>
        <parameter key="recall" value="false"/>
        <parameter key="lift" value="false"/>
        <parameter key="fallout" value="false"/>
        <parameter key="f_measure" value="true"/>
        <parameter key="false_positive" value="false"/>
        <parameter key="false_negative" value="false"/>
        <parameter key="true_positive" value="false"/>
        <parameter key="true_negative" value="false"/>
        <parameter key="sensitivity" value="false"/>
        <parameter key="specificity" value="false"/>
        <parameter key="youden" value="false"/>
        <parameter key="positive_predictive_value" value="false"/>
        <parameter key="negative_predictive_value" value="false"/>
        <parameter key="psep" value="false"/>
        <parameter key="skip_undefined_labels" value="true"/>
        <parameter key="use_example_weights" value="false"/>
      </operator>
      <operator activated="true" class="map" compatibility="7.3.001" expanded="true" height="82" name="set-all" width="90" x="380" y="697">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value="label"/>
        <parameter key="attributes" value="label|prediction(label)"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <list key="value_mappings">
          <parameter key="Iris-virginica" value="all"/>
          <parameter key="Iris-versicolor" value="all"/>
        </list>
        <parameter key="consider_regular_expressions" value="false"/>
        <parameter key="add_default_mapping" value="false"/>
      </operator>
      <operator activated="true" class="performance_binominal_classification" compatibility="7.3.001" expanded="true" height="82" name="set-all-perf" width="90" x="581" y="901">
        <parameter key="main_criterion" value="first"/>
        <parameter key="accuracy" value="false"/>
        <parameter key="classification_error" value="false"/>
        <parameter key="kappa" value="false"/>
        <parameter key="AUC (optimistic)" value="false"/>
        <parameter key="AUC" value="true"/>
        <parameter key="AUC (pessimistic)" value="false"/>
        <parameter key="precision" value="false"/>
        <parameter key="recall" value="false"/>
        <parameter key="lift" value="false"/>
        <parameter key="fallout" value="false"/>
        <parameter key="f_measure" value="true"/>
        <parameter key="false_positive" value="false"/>
        <parameter key="false_negative" value="false"/>
        <parameter key="true_positive" value="false"/>
        <parameter key="true_negative" value="false"/>
        <parameter key="sensitivity" value="false"/>
        <parameter key="specificity" value="false"/>
        <parameter key="youden" value="false"/>
        <parameter key="positive_predictive_value" value="false"/>
        <parameter key="negative_predictive_value" value="false"/>
        <parameter key="psep" value="false"/>
        <parameter key="skip_undefined_labels" value="true"/>
        <parameter key="use_example_weights" value="false"/>
      </operator>
      <operator activated="true" class="performance_to_data" compatibility="7.3.001" expanded="true" height="82" name="Performance to Data (2)" width="90" x="782" y="901"/>
      <operator activated="true" class="map" compatibility="7.3.001" expanded="true" height="82" name="ver-all" width="90" x="581" y="646">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value="label"/>
        <parameter key="attributes" value="label|prediction(label)"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <list key="value_mappings">
          <parameter key="Iris-virginica" value="all"/>
          <parameter key="Iris-setosa" value="all"/>
        </list>
        <parameter key="consider_regular_expressions" value="false"/>
        <parameter key="add_default_mapping" value="false"/>
      </operator>
      <operator activated="true" class="performance_binominal_classification" compatibility="7.3.001" expanded="true" height="82" name="ver-all-perf" width="90" x="581" y="799">
        <parameter key="main_criterion" value="first"/>
        <parameter key="accuracy" value="false"/>
        <parameter key="classification_error" value="false"/>
        <parameter key="kappa" value="false"/>
        <parameter key="AUC (optimistic)" value="false"/>
        <parameter key="AUC" value="true"/>
        <parameter key="AUC (pessimistic)" value="false"/>
        <parameter key="precision" value="false"/>
        <parameter key="recall" value="false"/>
        <parameter key="lift" value="false"/>
        <parameter key="fallout" value="false"/>
        <parameter key="f_measure" value="true"/>
        <parameter key="false_positive" value="false"/>
        <parameter key="false_negative" value="false"/>
        <parameter key="true_positive" value="false"/>
        <parameter key="true_negative" value="false"/>
        <parameter key="sensitivity" value="false"/>
        <parameter key="specificity" value="false"/>
        <parameter key="youden" value="false"/>
        <parameter key="positive_predictive_value" value="false"/>
        <parameter key="negative_predictive_value" value="false"/>
        <parameter key="psep" value="false"/>
        <parameter key="skip_undefined_labels" value="true"/>
        <parameter key="use_example_weights" value="false"/>
      </operator>
      <operator activated="true" class="performance_to_data" compatibility="7.3.001" expanded="true" height="82" name="Performance to Data" width="90" x="782" y="799"/>
      <operator activated="true" class="union" compatibility="7.3.001" expanded="true" height="82" name="Union" width="90" x="983" y="901"/>
      <operator activated="true" class="performance_to_data" compatibility="7.3.001" expanded="true" height="82" name="Performance to Data (3)" width="90" x="782" y="1003"/>
      <operator activated="true" class="union" compatibility="7.3.001" expanded="true" height="82" name="Union (2)" width="90" x="1117" y="1003"/>
      <operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="1050" y="748">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="Value|Criterion"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.3.001" expanded="true" height="82" name="AUC FIX Values" width="90" x="983" y="544">
        <list key="function_descriptions">
          <parameter key="Value" value="if(Criterion==&quot;AUC&quot; &amp;&amp; Value==0,1,Value)"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="7.3.001" expanded="true" height="82" name="Aggregate" width="90" x="983" y="340">
        <parameter key="use_default_aggregation" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="default_aggregation_function" value="average"/>
        <list key="aggregation_attributes">
          <parameter key="Value" value="average"/>
          <parameter key="Value" value="standard_deviation"/>
        </list>
        <parameter key="group_by_attributes" value="Criterion"/>
        <parameter key="count_all_combinations" value="false"/>
        <parameter key="only_distinct" value="false"/>
        <parameter key="ignore_missings" value="true"/>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Polynomial by Binomial Classification" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Polynomial by Binomial Classification" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="ver-all" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="set-all" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 3" to_op="vir-all" to_port="example set input"/>
      <connect from_op="vir-all" from_port="example set output" to_op="vir-all-perf" to_port="labelled data"/>
      <connect from_op="vir-all-perf" from_port="performance" to_op="Performance to Data (3)" to_port="performance vector"/>
      <connect from_op="set-all" from_port="example set output" to_op="set-all-perf" to_port="labelled data"/>
      <connect from_op="set-all-perf" from_port="performance" to_op="Performance to Data (2)" to_port="performance vector"/>
      <connect from_op="Performance to Data (2)" from_port="example set" to_op="Union" to_port="example set 2"/>
      <connect from_op="ver-all" from_port="example set output" to_op="ver-all-perf" to_port="labelled data"/>
      <connect from_op="ver-all-perf" from_port="performance" to_op="Performance to Data" to_port="performance vector"/>
      <connect from_op="Performance to Data" from_port="example set" to_op="Union" to_port="example set 1"/>
      <connect from_op="Union" from_port="union" to_op="Union (2)" to_port="example set 1"/>
      <connect from_op="Performance to Data (3)" from_port="example set" to_op="Union (2)" to_port="example set 2"/>
      <connect from_op="Union (2)" from_port="union" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="AUC FIX Values" to_port="example set input"/>
      <connect from_op="AUC FIX Values" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
      <connect from_op="Aggregate" from_port="original" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>