Optimize Parameter (Grid) Parameter
dome
New Altair Community Member
Hello,
I tried the Optimize Parameters (Grid) operator in my process to find optimal parameters for a decision tree. The operator worked fine. My problem is that when I use those parameters in a separate decision tree, the accuracy differs from what the Optimize operator reported.
For example:
The Optimize Parameters operator reports an accuracy of 80% for the criterion gini_index, a maximal depth of 3, and a cross validation with stratified sampling and 5 folds.
If I apply these parameters to the exact same process (without Optimize Parameters), with the same data and operators, the accuracy drops to 60%.
Is there any solution to this problem? Or can it be explained?
Thanks!
Best Answer
Hello @dome,

You are doing nothing wrong. Let me use the Titanic example to show you how to use the best available model. Attached you can find two processes.

The training process: ... and the scoring process:

This is just an example. If your scoring algorithm doesn't behave well with new, unseen data, you should store your data, see what the reality is, and go back and forth until you are comfortable with the results. Splitting your process into training and production, and sharing the models between both, helps save time.

Hope this helps,
Rodrigo.
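The attached processes are not preserved here. As a rough sketch only, splitting into a training and a scoring process usually means that the training process stores the model and the scoring process retrieves it. The two fragments below assume the standard `store`, `retrieve`, and `apply_model` operators. The operator names, the repository path, and the `input`/`output` port names are assumptions for illustration, not taken from the original attachments.

```xml
<!-- Training process (fragment): store the model delivered by Optimize Parameters (Grid). -->
<!-- The repository path below is a hypothetical placeholder. -->
<operator activated="true" class="store" name="Store Model">
  <parameter key="repository_entry" value="//Local Repository/models/best_tree"/>
</operator>
<connect from_op="Optimize Parameters (Grid)" from_port="model" to_op="Store Model" to_port="input"/>

<!-- Scoring process (fragment): load the stored model and apply it to new data. -->
<operator activated="true" class="retrieve" name="Retrieve Model">
  <parameter key="repository_entry" value="//Local Repository/models/best_tree"/>
</operator>
<operator activated="true" class="apply_model" name="Apply Model">
  <list key="application_parameters"/>
</operator>
<connect from_op="Retrieve Model" from_port="output" to_op="Apply Model" to_port="model"/>
```

The new data still has to be connected to the `unlabelled data` port of Apply Model, after the same preprocessing steps that were used in training.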
Answers
Hi @dome,
Can you share your process and your data so that we can reproduce what you observe?
Regards,
Lionel
Hello,

Yes, it might be explainable. It all depends on what your data looks like and how you are testing it. It would be nice to have a copy of your XML process so that we can take a look.

A simple, educated guess:

With a split validation inside Optimize Parameters, the best combination of parameters can reach 80% when the model is trained on, let's say, 90% of your data and tested on the remaining 10%. If you connect the mod port to a Store operator, you get the decision tree you need, but it is trained with the best parameters on the entire data set, so your results may vary. However, without knowing your data or your process, it's hard to know what's happening.

All the best,
Rod.
Hello,

Unfortunately I can't show you the data, but here is my process. I read the parameters from the results, where the log of all iterations is listed. After that, I put these parameters into another process with a decision tree and a cross validation.

Thanks!
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator name="Process" expanded="true" compatibility="9.3.001" class="process" activated="true">
<parameter value="init" key="logverbosity"/>
<parameter value="2001" key="random_seed"/>
<parameter value="never" key="send_mail"/>
<parameter value="" key="notification_email"/>
<parameter value="30" key="process_duration_for_mail"/>
<parameter value="SYSTEM" key="encoding"/>
<process expanded="true">
<operator name="Nominal to Numerical" expanded="true" compatibility="9.3.001" class="nominal_to_numerical" activated="true" y="187" x="179" width="90" height="103">
<parameter value="false" key="return_preprocessing_model"/>
<parameter value="true" key="create_view"/>
<parameter value="subset" key="attribute_filter_type"/>
<parameter value="Geschlecht" key="attribute"/>
<parameter value="Geschlecht|Kader|Phase|Sportart|Gruppe" key="attributes"/>
<parameter value="false" key="use_except_expression"/>
<parameter value="nominal" key="value_type"/>
<parameter value="false" key="use_value_type_exception"/>
<parameter value="file_path" key="except_value_type"/>
<parameter value="single_value" key="block_type"/>
<parameter value="false" key="use_block_type_exception"/>
<parameter value="single_value" key="except_block_type"/>
<parameter value="false" key="invert_selection"/>
<parameter value="false" key="include_special_attributes"/>
<parameter value="dummy coding" key="coding_type"/>
<parameter value="false" key="use_comparison_groups"/>
<list key="comparison_groups"/>
<parameter value="all 0 and warning" key="unexpected_value_handling"/>
<parameter value="false" key="use_underscore_in_name"/>
</operator>
<operator name="Generate Attributes (2)" expanded="true" compatibility="9.3.001" class="generate_attributes" activated="true" y="187" x="313" width="90" height="82">
<list key="function_descriptions">
<parameter value="if ((abs ([MmaxExt60re]-[MmaxExt60li])/ max([MmaxExt60li],[MmaxExt60re]))> 0.1, 1, 0)" key="MA60"/>
<parameter value="if ((abs ([MmaxExt180re]-[MmaxExt180li])/ max([MmaxExt180li],[MmaxExt180re]))> 0.1, 1, 0)" key="MA180"/>
<parameter value="if ([MA60] || [MA180], true, false)" key="MA"/>
</list>
<parameter value="true" key="keep_all"/>
</operator>
<operator name="Select Attributes (3)" expanded="true" compatibility="9.3.001" class="select_attributes" activated="true" y="187" x="447" width="90" height="82">
<parameter value="subset" key="attribute_filter_type"/>
<parameter value="" key="attribute"/>
<parameter value="" key="attributes"/>
<parameter value="false" key="use_except_expression"/>
<parameter value="attribute_value" key="value_type"/>
<parameter value="false" key="use_value_type_exception"/>
<parameter value="time" key="except_value_type"/>
<parameter value="attribute_block" key="block_type"/>
<parameter value="false" key="use_block_type_exception"/>
<parameter value="value_matrix_row_start" key="except_block_type"/>
<parameter value="false" key="invert_selection"/>
<parameter value="false" key="include_special_attributes"/>
<description width="126" colored="false" color="transparent" align="center"/>
</operator>
<operator name="Set Role" expanded="true" compatibility="9.3.001" class="set_role" activated="true" y="187" x="581" width="90" height="82">
<parameter value="MA" key="attribute_name"/>
<parameter value="label" key="target_role"/>
<list key="set_additional_roles"/>
</operator>
<operator name="Optimize Parameters (Grid)" expanded="true" compatibility="9.3.001" class="concurrency:optimize_parameters_grid" activated="true" y="187" x="782" width="90" height="124">
<list key="parameters">
<parameter value="gain_ratio,information_gain,gini_index,accuracy" key="Decision Tree (3).criterion"/>
<parameter value="[-1.0;100.0;100;linear]" key="Decision Tree (3).maximal_depth"/>
<parameter value="[2.0;10;10;linear]" key="Cross Validation (3).number_of_folds"/>
</list>
<parameter value="fail on error" key="error_handling"/>
<parameter value="true" key="log_performance"/>
<parameter value="false" key="log_all_criteria"/>
<parameter value="false" key="synchronize"/>
<parameter value="true" key="enable_parallel_execution"/>
<process expanded="true">
<operator name="Cross Validation (3)" expanded="true" compatibility="9.3.001" class="concurrency:cross_validation" activated="true" y="34" x="514" width="90" height="145">
<parameter value="false" key="split_on_batch_attribute"/>
<parameter value="false" key="leave_one_out"/>
<parameter value="10" key="number_of_folds"/>
<parameter value="stratified sampling" key="sampling_type"/>
<parameter value="false" key="use_local_random_seed"/>
<parameter value="1992" key="local_random_seed"/>
<parameter value="true" key="enable_parallel_execution"/>
<process expanded="true">
<operator name="Decision Tree (3)" expanded="true" compatibility="9.3.001" class="concurrency:parallel_decision_tree" activated="true" y="34" x="246" width="90" height="103">
<parameter value="accuracy" key="criterion"/>
<parameter value="100" key="maximal_depth"/>
<parameter value="false" key="apply_pruning"/>
<parameter value="0.1" key="confidence"/>
<parameter value="false" key="apply_prepruning"/>
<parameter value="Infinity" key="minimal_gain"/>
<parameter value="2" key="minimal_leaf_size"/>
<parameter value="4" key="minimal_size_for_split"/>
<parameter value="3" key="number_of_prepruning_alternatives"/>
</operator>
<connect to_port="training set" to_op="Decision Tree (3)" from_port="training set"/>
<connect to_port="model" from_port="model" from_op="Decision Tree (3)"/>
<portSpacing spacing="0" port="source_training set"/>
<portSpacing spacing="0" port="sink_model"/>
<portSpacing spacing="0" port="sink_through 1"/>
</process>
<process expanded="true">
<operator name="Apply Model (3)" expanded="true" compatibility="9.3.001" class="apply_model" activated="true" y="34" x="112" width="90" height="82">
<list key="application_parameters"/>
<parameter value="false" key="create_view"/>
</operator>
<operator name="Performance (3)" expanded="true" compatibility="9.3.001" class="performance_classification" activated="true" y="34" x="313" width="90" height="82" origin="GENERATED_TUTORIAL">
<parameter value="first" key="main_criterion"/>
<parameter value="true" key="accuracy"/>
<parameter value="false" key="classification_error"/>
<parameter value="false" key="kappa"/>
<parameter value="false" key="weighted_mean_recall"/>
<parameter value="false" key="weighted_mean_precision"/>
<parameter value="false" key="spearman_rho"/>
<parameter value="false" key="kendall_tau"/>
<parameter value="false" key="absolute_error"/>
<parameter value="false" key="relative_error"/>
<parameter value="false" key="relative_error_lenient"/>
<parameter value="false" key="relative_error_strict"/>
<parameter value="false" key="normalized_absolute_error"/>
<parameter value="false" key="root_mean_squared_error"/>
<parameter value="false" key="root_relative_squared_error"/>
<parameter value="false" key="squared_error"/>
<parameter value="false" key="correlation"/>
<parameter value="false" key="squared_correlation"/>
<parameter value="false" key="cross-entropy"/>
<parameter value="false" key="margin"/>
<parameter value="false" key="soft_margin_loss"/>
<parameter value="false" key="logistic_loss"/>
<parameter value="true" key="skip_undefined_labels"/>
<parameter value="true" key="use_example_weights"/>
<list key="class_weights"/>
</operator>
<connect to_port="model" to_op="Apply Model (3)" from_port="model"/>
<connect to_port="unlabelled data" to_op="Apply Model (3)" from_port="test set"/>
<connect to_port="labelled data" to_op="Performance (3)" from_port="labelled data" from_op="Apply Model (3)"/>
<connect to_port="performance 1" from_port="performance" from_op="Performance (3)"/>
<connect to_port="test set results" from_port="example set" from_op="Performance (3)"/>
<portSpacing spacing="0" port="source_model"/>
<portSpacing spacing="0" port="source_test set"/>
<portSpacing spacing="0" port="source_through 1"/>
<portSpacing spacing="0" port="sink_test set results"/>
<portSpacing spacing="0" port="sink_performance 1"/>
<portSpacing spacing="0" port="sink_performance 2"/>
</process>
</operator>
<connect to_port="example set" to_op="Cross Validation (3)" from_port="input 1"/>
<connect to_port="model" from_port="model" from_op="Cross Validation (3)"/>
<connect to_port="performance" from_port="performance 1" from_op="Cross Validation (3)"/>
<portSpacing spacing="0" port="source_input 1"/>
<portSpacing spacing="0" port="source_input 2"/>
<portSpacing spacing="0" port="sink_performance"/>
<portSpacing spacing="0" port="sink_model"/>
<portSpacing spacing="0" port="sink_output 1"/>
</process>
</operator>
<connect to_port="example set input" to_op="Generate Attributes (2)" from_port="example set output" from_op="Nominal to Numerical"/>
<connect to_port="example set input" to_op="Select Attributes (3)" from_port="example set output" from_op="Generate Attributes (2)"/>
<connect to_port="example set input" to_op="Set Role" from_port="example set output" from_op="Select Attributes (3)"/>
<connect to_port="input 1" to_op="Optimize Parameters (Grid)" from_port="example set output" from_op="Set Role"/>
<connect to_port="result 1" from_port="performance" from_op="Optimize Parameters (Grid)"/>
<connect to_port="result 2" from_port="model" from_op="Optimize Parameters (Grid)"/>
<connect to_port="result 3" from_port="parameter set" from_op="Optimize Parameters (Grid)"/>
<portSpacing spacing="0" port="source_input 1"/>
<portSpacing spacing="0" port="sink_result 1"/>
<portSpacing spacing="0" port="sink_result 2"/>
<portSpacing spacing="0" port="sink_result 3"/>
<portSpacing spacing="0" port="sink_result 4"/>
</process>
</operator>
</process>
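For the separate check described in this post, a minimal sketch of the standalone validation might look like the fragment below. It reuses the operator classes and parameter keys from the posted process and fills in the reported best values (gini_index, depth 3, 5 folds). Two things are assumptions rather than facts from the thread: the operator names ending in "(check)", and switching `use_local_random_seed` to `true` so that the fold assignment is repeatable between runs. The posted process leaves that option `false`, so the folds drawn inside the optimization loop need not match the ones drawn here, and some gap between the two accuracies can remain. The best score of a grid search is also the maximum over many noisy estimates, so it tends to be optimistic.

```xml
<!-- Standalone check (fragment); layout attributes are omitted for brevity. -->
<operator activated="true" class="concurrency:cross_validation" compatibility="9.3.001" name="Cross Validation (check)">
  <parameter key="number_of_folds" value="5"/>
  <parameter key="sampling_type" value="stratified sampling"/>
  <!-- Assumption: a fixed local seed keeps the fold assignment repeatable across runs. -->
  <parameter key="use_local_random_seed" value="true"/>
  <parameter key="local_random_seed" value="1992"/>
  <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.3.001" name="Decision Tree (check)">
      <!-- Values reported as best by the optimization. -->
      <parameter key="criterion" value="gini_index"/>
      <parameter key="maximal_depth" value="3"/>
      <!-- Remaining settings copied unchanged from the posted process. -->
      <parameter key="apply_pruning" value="false"/>
      <parameter key="apply_prepruning" value="false"/>
      <parameter key="minimal_leaf_size" value="2"/>
      <parameter key="minimal_size_for_split" value="4"/>
    </operator>
    <connect from_port="training set" to_op="Decision Tree (check)" to_port="training set"/>
    <connect from_op="Decision Tree (check)" from_port="model" to_port="model"/>
  </process>
  <process expanded="true">
    <operator activated="true" class="apply_model" name="Apply Model (check)">
      <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_classification" name="Performance (check)">
      <parameter key="accuracy" value="true"/>
    </operator>
    <connect from_port="model" to_op="Apply Model (check)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (check)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (check)" from_port="labelled data" to_op="Performance (check)" to_port="labelled data"/>
    <connect from_op="Performance (check)" from_port="performance" to_port="performance 1"/>
  </process>
</operator>
```

Every tree setting that is not part of the grid should stay identical to the optimized process. Otherwise the comparison measures more than the effect of the three optimized parameters.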
Thank you! This helps a lot!