"Decision tree with one node in spite of low confidence and min gain"
Hello,
I have a problem with my decision tree: it generates only one node. I lowered the confidence to 0.1 and the minimal gain to 0.001, but that didn't help. Could you please tell me what to do?
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.1.003" expanded="true" height="68" name="Retrieve Books_Ratings_Tags_forUser10" width="90" x="112" y="85">
<parameter key="repository_entry" value="Books_Ratings_Tags_forUser10"/>
</operator>
<operator activated="true" class="split_data" compatibility="8.1.003" expanded="true" height="103" name="Split Data" width="90" x="246" y="85">
<enumeration key="partitions">
<parameter key="ratio" value="0.8"/>
<parameter key="ratio" value="0.2"/>
</enumeration>
</operator>
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.003" expanded="true" height="103" name="Decision Tree" width="90" x="447" y="34">
<parameter key="confidence" value="0.1"/>
<parameter key="minimal_gain" value="0.001"/>
</operator>
<operator activated="true" class="apply_model" compatibility="8.1.003" expanded="true" height="82" name="Apply Model" width="90" x="581" y="136">
<list key="application_parameters"/>
</operator>
<connect from_op="Retrieve Books_Ratings_Tags_forUser10" from_port="output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Best wishes
Olga
Best Answer
Hello @olgakulesza2,
I loaded your example, but I don't have your data. However, I noticed that you have selected both "apply pruning" and "apply prepruning" in the parameters. You might want to adjust these settings, as both reduce the number of leaves generated in the tree.
What helps me when tuning a tree with "some" brute force: count how many columns the dataset has and set the maximal depth to the number of columns + 1. If that does not meet your needs, start adjusting the prepruning parameters before turning to pruning. Change the leaf and split settings and rerun the model until you are OK with the results. On top of this, you may find that Cross-Validation and Optimize Parameters used together help create a tree that is good enough for your data.
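As a rough sketch of the settings described above, the Decision Tree operator from the posted process could be changed along these lines. The parameter keys `maximal_depth`, `apply_pruning` and `apply_prepruning` are assumed to match the names shown in the operator's Parameters panel in 8.1, and the depth value of 10 is only a placeholder for "number of columns + 1". Switching both pruning options off here is meant purely as a check of whether the tree grows at all, not as a final configuration.

```xml
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.003" expanded="true" name="Decision Tree">
  <!-- Placeholder value: replace with (number of columns in your data + 1) -->
  <parameter key="maximal_depth" value="10"/>
  <!-- Assumption: both switched off only to test whether the tree grows beyond one node -->
  <parameter key="apply_pruning" value="false"/>
  <parameter key="apply_prepruning" value="false"/>
</operator>
```

If the tree grows with these settings, prepruning can be switched back on first and its parameters tightened step by step, with pruning re-enabled afterwards.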
All the best,
Rodrigo.