Decision tree
I'm trying to use a decision tree to predict whether a user will leave.
My data includes 4 regular attributes (2 nominal, 2 integer) and 1 special attribute (a nominal label).
When I use the Decision Tree operator, the resulting tree does not use all of my data: only one of the regular attributes appears (as the root), and the leaves contain the label values (which is OK).
What am I doing wrong?
Answers
Hello, this may simply be happening because the data does not contain patterns that meet the criteria you set.
I suggest trying different settings for pruning and prepruning, including the confidence value.
A better way to find suitable values is to use the "Optimize Parameters (Grid)" operator and give it ranges, so that it tries combinations of the parameters that affect your model.
The help for "Optimize Parameters (Grid)" includes a sample process that shows how this operator works.
Good Luck
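
To illustrate how a split criterion can leave attributes out of the tree, here is a minimal Python sketch (not RapidMiner code). It computes the information gain of each attribute on a tiny made-up data set and compares it with a hypothetical `min_gain` threshold. The attribute names, the data, and the threshold are assumptions for illustration only, and the Decision Tree operator may use a different criterion and different defaults.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of label values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, label="label"):
    """Entropy of the label minus the weighted entropy after splitting on attribute."""
    base = entropy([row[label] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Made-up toy data: "plan" fully determines the label, "region" tells us nothing.
rows = [
    {"plan": "basic", "region": "north", "label": "leave"},
    {"plan": "basic", "region": "south", "label": "leave"},
    {"plan": "premium", "region": "north", "label": "stay"},
    {"plan": "premium", "region": "south", "label": "stay"},
]

min_gain = 0.1  # hypothetical prepruning threshold, not a RapidMiner default
gains = {a: information_gain(rows, a) for a in ("plan", "region")}
usable = [a for a, g in gains.items() if g >= min_gain]

print(gains)
print(usable)
```

In this toy case only `plan` passes the threshold, and splitting on it already produces pure leaves, so a tree built from this data would need just one attribute. Something similar may be happening with the real data.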
Follow-up question
First of all, thank you for your answer.
I created a table with patterns (manually), first to check that I'm doing it right.
Is there a way to know which users end up in each leaf?
I would like to learn which users will have a specific value (the label value) in the future.
Bests.
Hi,
what you can do is use the Tree to Rules operator. As a result (see the attached process), you get the paths as strings. That might be helpful in the first place. There is no single-operator solution for applying these rules to a data set to get "leaf IDs", but it might be possible to build a working process with something like Write as Text and then parse the resulting text files.
Best,
Martin
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="tree_to_rules" compatibility="7.1.001" expanded="true" height="82" name="Tree to Rules" width="90" x="246" y="85">
<process expanded="true">
<operator activated="true" class="parallel_decision_tree" compatibility="7.1.001" expanded="true" height="82" name="Decision Tree" width="90" x="45" y="34"/>
<connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" width="90" x="380" y="85">
<list key="application_parameters"/>
</operator>
<connect from_op="Retrieve Golf" from_port="output" to_op="Tree to Rules" to_port="training set"/>
<connect from_op="Tree to Rules" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Tree to Rules" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
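
The idea of parsing the rule strings and assigning "leaf IDs" outside of RapidMiner could look roughly like the Python sketch below. It assumes a simplified rule format, `if A = x and B > 3 then label`, which may not match the text the Tree to Rules operator actually writes, so the parsing would need to be adapted to the real output. The rules, attribute names, and users are hypothetical.

```python
import re

# One condition such as "plan = basic", "logins > 3" or "logins <= 3".
CONDITION = re.compile(r"^\s*(\w+)\s*(<=|>|=)\s*(\S+)\s*$")


def parse_rule(text):
    """Parse 'if A = x and B > 3 then label' into (conditions, label)."""
    text = text.strip()
    if not text.startswith("if "):
        raise ValueError(f"unexpected rule format: {text!r}")
    body, label = text[len("if "):].rsplit(" then ", 1)
    conditions = []
    for part in body.split(" and "):
        match = CONDITION.match(part)
        if match is None:
            raise ValueError(f"cannot parse condition: {part!r}")
        conditions.append(match.groups())
    return conditions, label.strip()


def matches(row, conditions):
    """Return True if the row satisfies every condition of one rule."""
    for attribute, op, raw in conditions:
        value = row[attribute]
        if op == "=":
            if str(value) != raw:
                return False
        elif op == ">":
            if not value > float(raw):
                return False
        else:  # op == "<="
            if not value <= float(raw):
                return False
    return True


def leaf_id(row, rules):
    """Index of the first matching rule, used here as the leaf ID (None if no rule matches)."""
    for index, (conditions, _label) in enumerate(rules):
        if matches(row, conditions):
            return index
    return None


# Hypothetical rules in the assumed format, one per tree path.
rules_text = [
    "if plan = basic and logins <= 3 then leave",
    "if plan = basic and logins > 3 then stay",
    "if plan = premium then stay",
]
rules = [parse_rule(line) for line in rules_text]

# Hypothetical users to assign to leaves.
users = [
    {"user": "u1", "plan": "basic", "logins": 2},
    {"user": "u2", "plan": "premium", "logins": 10},
    {"user": "u3", "plan": "basic", "logins": 7},
]
leaves = {u["user"]: leaf_id(u, rules) for u in users}
print(leaves)
```

Each user gets the index of the first rule it satisfies, so grouping users by that index shows who sits in which leaf, and the label of that rule is the predicted value for them.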