Decision Tree Pruning
I have three questions related to "pruning" (post-pruning) in the Decision Tree operator.
1. RM supports pessimistic pruning (i.e., top-down), but not optimistic pruning (i.e., bottom-up). Is that correct?
2. What are the precise logical steps in the pruning process?
3. When the Decision Tree is being trained on the "training set" with the pruning option enabled, on which "validation set" is the classification error computed? It cannot be the entire training set, because then the classification error of the fully-grown tree would be 0 and would always be the minimum. My understanding of pruning is that the cost complexity is computed by adding a penalty factor for tree size, and the candidate subtree that minimizes the classification error on the validation set is chosen (written out below). When only the training set is used, how is the validation done?
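To make the penalty explicit, what I have in mind is the standard CART cost-complexity criterion (my own understanding, not something taken from the RapidMiner code):

R_\alpha(T) = R(T) + \alpha \, |\widetilde{T}|

where R(T) is the misclassification error of subtree T, |\widetilde{T}| is its number of leaves, and \alpha \ge 0 is the complexity penalty; among the subtrees obtained along the \alpha path, the one with the lowest error on the validation set is selected.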
I have found related ideas in this previous post and another previous post. I have also looked at the RapidMiner code PessimisticPruner.java, but I am not able to follow the logic there.
@IngoRM, @land, and others - any help would be much appreciated.
Hi @avd ,
I would not say it does nothing. Have a look at the following process: with confidence-based pruning enabled, a lower confidence value prunes more nodes from the tree.
Best,
Martin
<?xml version="1.0" encoding="UTF-8"?><process version="9.9.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.9.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2018"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.9.000" expanded="true" height="68" name="Retrieve Sonar" width="90" x="313" y="187">
<parameter key="repository_entry" value="//Samples/data/Sonar"/>
</operator>
<operator activated="true" class="multiply" compatibility="9.9.000" expanded="true" height="103" name="Multiply" width="90" x="447" y="187"/>
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.9.000" expanded="true" height="103" name="Decision Tree" width="90" x="581" y="85">
<parameter key="criterion" value="gini_index"/>
<parameter key="maximal_depth" value="100"/>
<parameter key="apply_pruning" value="true"/>
<parameter key="confidence" value="0.01"/>
<parameter key="apply_prepruning" value="false"/>
<parameter key="minimal_gain" value="0.01"/>
<parameter key="minimal_leaf_size" value="2"/>
<parameter key="minimal_size_for_split" value="20"/>
<parameter key="number_of_prepruning_alternatives" value="3"/>
</operator>
<operator activated="true" class="converters:dectree_2_example_set" compatibility="0.9.000" expanded="true" height="82" name="Decision Tree to ExampleSet" width="90" x="715" y="85"/>
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.9.000" expanded="true" height="103" name="Decision Tree (2)" width="90" x="581" y="238">
<parameter key="criterion" value="gini_index"/>
<parameter key="maximal_depth" value="100"/>
<parameter key="apply_pruning" value="true"/>
<parameter key="confidence" value="0.5"/>
<parameter key="apply_prepruning" value="false"/>
<parameter key="minimal_gain" value="0.01"/>
<parameter key="minimal_leaf_size" value="2"/>
<parameter key="minimal_size_for_split" value="20"/>
<parameter key="number_of_prepruning_alternatives" value="3"/>
</operator>
<operator activated="true" class="converters:dectree_2_example_set" compatibility="0.9.000" expanded="true" height="82" name="Decision Tree to ExampleSet (2)" width="90" x="715" y="238"/>
<connect from_op="Retrieve Sonar" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Decision Tree (2)" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_op="Decision Tree to ExampleSet" to_port="tree"/>
<connect from_op="Decision Tree to ExampleSet" from_port="exa" to_port="result 1"/>
<connect from_op="Decision Tree (2)" from_port="model" to_op="Decision Tree to ExampleSet (2)" to_port="tree"/>
<connect from_op="Decision Tree to ExampleSet (2)" from_port="exa" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="42"/>
<portSpacing port="sink_result 2" spacing="42"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Hey folks,
Agreed, it definitely is doing something for other data sets. However, I also found it a bit weird that there does not seem to be any pruning impact for Titanic no matter what the confidence value is... I will create a ticket in our internal system to inspect this a bit closer. Most likely things are fine, but let's double check...
Cheers,
Ingo
Thank you for looking into this. I look forward to the update.
I did test it with a few other datasets with the same result.
Some conceptual questions whose answers would help:
(a) Since the Decision Tree operator is based on CART, does it use the cost-complexity pruning approach during the pessimistic pruning? (A sketch of what I mean is after this list.) Some related links: link 1, and scikit-learn's Python implementation.
(b) How does the internal validation happen during pruning? What portion of the "training" data is used internally for validation, and how?
(c) If the CC-pruning approach is used, can the chosen penalty (alpha) parameter be reported to the user for explainability?
(d) If the CC-pruning approach is not used, what exact method is used?
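To make (a) and (c) concrete, here is roughly what I mean, written against scikit-learn's API purely as an illustration (not RapidMiner's internals): the alpha path is computed on the training data, a held-out split selects the best alpha, and the chosen alpha is reported back to the user.

# Illustration only (scikit-learn API, not RapidMiner internals): cost-complexity pruning
# with an explicit alpha path and a held-out validation set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=2018)

# Candidate alphas from the cost-complexity pruning path of the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit one pruned tree per alpha and keep the one with the best validation accuracy.
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"chosen alpha = {best_alpha:.5f}, validation accuracy = {best_score:.3f}")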
While I am waiting for the pruning functionality to be double-checked by the kind RM folks in this community, some more digging suggests that the Decision Tree operator uses the "pessimistic error pruning" method (see section 2.2.5 in the linked article), not the cost-complexity method. Can someone confirm this? In that case, it makes sense that no validation partition is used for pruning. Are there any plans to incorporate the cost-complexity method for pruning, which is much more popular (probably because it gives more reliable results)?
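For concreteness, the kind of estimate I am referring to is the confidence-based upper bound used in C4.5/J48-style error pruning, sketched below. I have not verified that PessimisticPruner.java does exactly this, but the operator's "confidence" parameter suggests something along these lines, and it shows why no validation partition would be needed.

# Rough sketch of a C4.5/J48-style pessimistic error estimate (Quinlan's upper-bound
# formula). NOT verified against RapidMiner's PessimisticPruner.java -- illustration only.
import math
from statistics import NormalDist

def pessimistic_error_rate(n, e, confidence=0.25):
    """Upper bound of the binomial confidence interval for the error rate of a leaf
    that covers n training examples and misclassifies e of them."""
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(1.0 - confidence)   # normal deviate for the confidence level
    f = e / n                                    # observed (optimistic) error rate
    num = f + z * z / (2 * n) + z * math.sqrt(f * (1 - f) / n + z * z / (4 * n * n))
    return num / (1 + z * z / n)

def should_prune(node_n, node_e, children, confidence=0.25):
    """Collapse a subtree into a leaf when the pessimistic errors of the would-be leaf
    are no worse than the summed pessimistic errors of its child leaves.
    children: list of (n_i, e_i) pairs."""
    leaf_errors = node_n * pessimistic_error_rate(node_n, node_e, confidence)
    subtree_errors = sum(n_i * pessimistic_error_rate(n_i, e_i, confidence)
                         for n_i, e_i in children)
    return leaf_errors <= subtree_errors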
Also, where does that leave my questions in the previous post? I would appreciate the community's help.