Feature selection - maximize recall performance
Andy3
New Altair Community Member
Hello,
I'm a bit lost on this one. How can I maximize the recall performance metric in the feature selection phase with the Automate Feature Selection operator? Normally when I use this operator I minimize the classification error metric, and it doesn't generate any errors. When I try the recall performance metric, though, it throws an error. So I guess I'm not using the correct operator to maximize recall in the feature optimization phase. If someone could point me in the right direction, that would be nice.
Thanks.
Andy
Best Answers
Hi,
I think you need to build your own metric with Performance to Data, Generate Attributes, and Extract Performance, where you generate 1 - recall. By design, the operator minimizes its performance metric.
Right, @IngoRM?
Cheers,
Martin
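To illustrate why the 1 - recall trick works, here is a minimal Python sketch (not RapidMiner code). The three candidate feature sets and their predictions are made up for illustration, and `pick_by_minimizing` is a hypothetical stand-in for an optimizer that always prefers the lowest score.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def pick_by_minimizing(scores):
    """Hypothetical stand-in for an optimizer that prefers the LOWEST score."""
    return min(scores, key=scores.get)


y_true = [1, 1, 1, 1, 0, 0]

# Made-up predictions obtained with three candidate feature sets.
candidates = {
    "set_a": [1, 0, 0, 0, 0, 0],  # 1 of 4 positives found
    "set_b": [1, 1, 1, 0, 0, 1],  # 3 of 4 positives found
    "set_c": [1, 1, 0, 0, 0, 0],  # 2 of 4 positives found
}

raw = {name: recall(y_true, pred) for name, pred in candidates.items()}
as_error = {name: 1.0 - r for name, r in raw.items()}

# Minimizing raw recall would select the worst candidate ...
worst = pick_by_minimizing(raw)
# ... while minimizing 1 - recall selects the one with the highest recall.
best = pick_by_minimizing(as_error)
```

Because 1 - recall behaves like an error rate (lower is better), a minimizing optimizer ends up at the candidate with the highest recall.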
Hi Andy, Martin,
Yes, that's right. The operator always minimizes the performance criterion. That works universally well for error rates across classification and regression problems, but it leads to this behavior for measures such as recall, where higher is better. As Martin suggested, you can use a workaround to make this work with other measures, too. I have added a small example process below.
Hope this helps,
Ingo
<?xml version="1.0" encoding="UTF-8"?><process version="9.7.000-SNAPSHOT">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.7.000-SNAPSHOT" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.7.000-SNAPSHOT" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
      </operator>
      <operator activated="true" class="model_simulator:automatic_feature_engineering" compatibility="9.7.000-SNAPSHOT" expanded="true" height="103" name="Automatic Feature Engineering" width="90" x="179" y="34">
        <parameter key="mode" value="feature selection"/>
        <parameter key="balance for accuracy" value="1.0"/>
        <parameter key="show progress dialog" value="false"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="use optimization heuristics" value="true"/>
        <parameter key="maximum generations" value="30"/>
        <parameter key="population size" value="10"/>
        <parameter key="use multi-starts" value="true"/>
        <parameter key="number of multi-starts" value="5"/>
        <parameter key="generations until multi-start" value="10"/>
        <parameter key="use time limit" value="false"/>
        <parameter key="time limit in seconds" value="60"/>
        <parameter key="use subset for generation" value="false"/>
        <parameter key="maximum function complexity" value="10"/>
        <parameter key="use_plus" value="false"/>
        <parameter key="use_diff" value="false"/>
        <parameter key="use_mult" value="true"/>
        <parameter key="use_div" value="true"/>
        <parameter key="reciprocal_value" value="true"/>
        <parameter key="use_square_roots" value="false"/>
        <parameter key="use_exp" value="false"/>
        <parameter key="use_log" value="false"/>
        <parameter key="use_absolute_values" value="false"/>
        <parameter key="use_sgn" value="false"/>
        <parameter key="use_min" value="false"/>
        <parameter key="use_max" value="false"/>
        <process expanded="true">
          <operator activated="true" class="split_validation" compatibility="9.7.000-SNAPSHOT" expanded="true" height="124" name="Validation" width="90" x="45" y="34">
            <parameter key="create_complete_model" value="false"/>
            <parameter key="split" value="relative"/>
            <parameter key="split_ratio" value="0.7"/>
            <parameter key="training_set_size" value="100"/>
            <parameter key="test_set_size" value="-1"/>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="1992"/>
            <process expanded="true">
              <operator activated="true" class="naive_bayes" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Naive Bayes" width="90" x="45" y="34">
                <parameter key="laplace_correction" value="true"/>
              </operator>
              <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance_binominal_classification" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
                <parameter key="manually_set_positive_class" value="false"/>
                <parameter key="main_criterion" value="first"/>
                <parameter key="accuracy" value="false"/>
                <parameter key="classification_error" value="false"/>
                <parameter key="kappa" value="false"/>
                <parameter key="AUC (optimistic)" value="false"/>
                <parameter key="AUC" value="false"/>
                <parameter key="AUC (pessimistic)" value="false"/>
                <parameter key="precision" value="false"/>
                <parameter key="recall" value="true"/>
                <parameter key="lift" value="false"/>
                <parameter key="fallout" value="false"/>
                <parameter key="f_measure" value="false"/>
                <parameter key="false_positive" value="false"/>
                <parameter key="false_negative" value="false"/>
                <parameter key="true_positive" value="false"/>
                <parameter key="true_negative" value="false"/>
                <parameter key="sensitivity" value="false"/>
                <parameter key="specificity" value="false"/>
                <parameter key="youden" value="false"/>
                <parameter key="positive_predictive_value" value="false"/>
                <parameter key="negative_predictive_value" value="false"/>
                <parameter key="psep" value="false"/>
                <parameter key="skip_undefined_labels" value="true"/>
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <operator activated="true" class="performance_to_data" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Performance to Data" width="90" x="45" y="238"/>
              <operator activated="true" class="generate_attributes" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="238">
                <list key="function_descriptions">
                  <parameter key="Value" value="-1*[Value]"/>
                </list>
                <parameter key="keep_all" value="true"/>
              </operator>
              <operator activated="true" class="extract_performance" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Performance (2)" width="90" x="313" y="238">
                <parameter key="performance_type" value="data_value"/>
                <parameter key="statistics" value="average"/>
                <parameter key="attribute_name" value="Value"/>
                <parameter key="example_index" value="1"/>
                <parameter key="optimization_direction" value="minimize"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_op="Performance to Data" to_port="performance vector"/>
              <connect from_op="Performance to Data" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_op="Performance (2)" to_port="example set"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="example set source" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="performance sink"/>
          <portSpacing port="source_example set source" spacing="0"/>
          <portSpacing port="sink_performance sink" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Automatic Feature Engineering" to_port="example set in"/>
      <connect from_op="Automatic Feature Engineering" from_port="feature set" to_port="result 1"/>
      <connect from_op="Automatic Feature Engineering" from_port="population" to_port="result 2"/>
      <connect from_op="Automatic Feature Engineering" from_port="optimization log" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
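Note that the process negates the value (-1*[Value]) instead of computing 1 - recall. Both transformations reverse the ordering, so a minimizer should pick the same feature set either way. The following Python sketch mimics the Performance to Data, Generate Attributes, and Extract Performance chain with plain functions. The function names, the "Criterion" column, and the recall values are illustrative assumptions, not RapidMiner APIs or real results.

```python
def performance_to_data(criterion, value):
    """Turn a performance result into a one-row table (list of dicts)."""
    return [{"Criterion": criterion, "Value": value}]


def generate_attributes(rows):
    """Overwrite Value with -1 * Value, mirroring the expression -1*[Value]."""
    return [{**row, "Value": -1 * row["Value"]} for row in rows]


def extract_performance(rows, attribute_name="Value", example_index=1):
    """Read one cell back as the criterion (example_index is 1-based)."""
    return rows[example_index - 1][attribute_name]


def criterion_to_minimize(recall_value):
    """Chain the three steps: table, negate, extract."""
    rows = performance_to_data("recall", recall_value)
    return extract_performance(generate_attributes(rows))


# Made-up recall values for three candidate feature sets.
recalls = {"set_a": 0.25, "set_b": 0.75, "set_c": 0.50}

negated = {name: criterion_to_minimize(r) for name, r in recalls.items()}
one_minus = {name: 1.0 - r for name, r in recalls.items()}

# A minimizer selects the same candidate under both transformations.
chosen_negated = min(negated, key=negated.get)
chosen_one_minus = min(one_minus, key=one_minus.get)
```

With negation, the reported criterion is the negative of the recall, so remember to flip the sign again when reading the optimization log. With 1 - recall, the reported value reads directly as a miss rate.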
Thanks for the help, Martin and Ingo! Now I can get some sleep tonight.