"Decision Trees RM 4 vs RM 5"
ammargh
New Altair Community Member
Hi,
I have a model in RM 4.0 that uses decision trees. I got the best result by setting no_pre_pruning to true (according to a 10-fold cross-validation).
However, implementing the same model in RM 5.0 shows that the accuracy is reduced by around 50%.
Setting no_pre_pruning to false gives similar results in both versions.
Am I missing anything?
Thanks in advance
Answers
Hi,
Hmm. I can't really say anything about this. Sorry. No idea at all. Has anybody had a similar experience?
Greetings,
Sebastian
Hi ammargh,
do you have a sample data set and RapidMiner data mining process for us to reproduce the results?
Best regards,
Ralf
Hi Ralf,
I have the required data. How can I send it to you?
Hi ammargh,
if neither the data nor the process is confidential, you could post the process here. Simply use the insert code button (#) in the forum editor to insert the XML source of the RapidMiner process. Regarding the data set: if it is small, you could also use insert code to post it here. This way the community could benefit from the discussion of this issue.
If the data set is large or confidential, you can send it to us via e-mail.
Best regards,
Ralf
Thank you.
Below are the code and the data.
RM4 Code
<?xml version="1.0" encoding="UTF-8"?>
<process version="4.6">
<operator name="Root" class="Process" expanded="yes">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="data.xls"/>
<parameter key="sheet_number" value="1"/>
<parameter key="row_offset" value="0"/>
<parameter key="column_offset" value="0"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="create_label" value="true"/>
<parameter key="label_column" value="27"/>
<parameter key="create_id" value="false"/>
<parameter key="id_column" value="1"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator name="XValidation" class="XValidation" expanded="yes">
<parameter key="keep_example_set" value="false"/>
<parameter key="create_complete_model" value="true"/>
<parameter key="average_performances_only" value="true"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_validations" value="10"/>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="local_random_seed" value="-1"/>
<operator name="DecisionTree" class="DecisionTree">
<parameter key="keep_example_set" value="false"/>
<parameter key="criterion" value="information_gain"/>
<parameter key="minimal_size_for_split" value="4"/>
<parameter key="minimal_leaf_size" value="2"/>
<parameter key="minimal_gain" value="0.1"/>
<parameter key="maximal_depth" value="20"/>
<parameter key="confidence" value="0.25"/>
<parameter key="number_of_prepruning_alternatives" value="3"/>
<parameter key="no_pre_pruning" value="true"/>
<parameter key="no_pruning" value="false"/>
</operator>
<operator name="OperatorChain" class="OperatorChain" expanded="yes">
<operator name="ModelApplier" class="ModelApplier">
<parameter key="keep_model" value="false"/>
<list key="application_parameters">
</list>
<parameter key="create_view" value="false"/>
</operator>
<operator name="Performance" class="Performance">
<parameter key="keep_example_set" value="false"/>
<parameter key="use_example_weights" value="true"/>
</operator>
</operator>
</operator>
</operator>
</process>
RM5 Code
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<parameter key="logverbosity" value="3"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="1"/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="parallelize_main_process" value="false"/>
<process expanded="true" height="505" width="681">
<operator activated="true" class="read_excel" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
<parameter key="excel_file" value="data.xls"/>
<parameter key="sheet_number" value="1"/>
<parameter key="row_offset" value="0"/>
<parameter key="column_offset" value="0"/>
<parameter key="first_row_as_names" value="true"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
<parameter key="name" value="Label"/>
<parameter key="target_role" value="label"/>
</operator>
<operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="581" y="165">
<parameter key="create_complete_model" value="false"/>
<parameter key="average_performances_only" value="true"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_validations" value="10"/>
<parameter key="sampling_type" value="2"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="parallelize_training" value="false"/>
<parameter key="parallelize_testing" value="false"/>
<process expanded="true" height="741" width="397">
<operator activated="true" class="decision_tree" expanded="true" height="76" name="Decision Tree" width="90" x="112" y="30">
<parameter key="criterion" value="information_gain"/>
<parameter key="minimal_size_for_split" value="4"/>
<parameter key="minimal_leaf_size" value="2"/>
<parameter key="minimal_gain" value="0.1"/>
<parameter key="maximal_depth" value="20"/>
<parameter key="confidence" value="0.25"/>
<parameter key="number_of_prepruning_alternatives" value="3"/>
<parameter key="no_pre_pruning" value="true"/>
<parameter key="no_pruning" value="false"/>
</operator>
<connect from_port="training" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="741" width="397">
<operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="77" y="53">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="246" y="75">
<parameter key="use_example_weights" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_port="result 1"/>
<connect from_op="Validation" from_port="training" to_port="result 3"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
and the data
Age F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15 F16 F17 F18 F19 F20 F21 F22 F23 F24 F25 Label
71 YES YES L1 NO NO NO NO No No No No No No No No No No Yes No No No No No No No C1
60 YES NO L1 NO NO NO NO No No No No No No No No No No No No Yes No No No No No C1
56 YES YES L1 NO NO NO NO No No No No No No No No No No Yes No No No No No No No C1
47 YES NO L1 NO NO NO NO No No No No No No No No No Yes No No No No No No No No C2
58 YES NO L1 NO NO NO NO No No No No No No No No No No No Yes No No No No No No C2
69 YES NO L1 NO NO NO NO No No No No No No No No No No No Yes No No No No No No C2
66 YES NO L1 NO NO NO NO No No No No No No No No No Yes No No No No No No No No C2
52 YES NO L1 NO NO NO NO No No No No No No No No No Yes No No No No No No No No C2
55 YES NO L1 NO NO NO NO No No No No No No No No No No No Yes No No No No No No C2
71 YES NO L1 NO NO NO NO No No No No No No No No No Yes No No No No No No No No C2
72 YES YES L1 NO NO NO NO No No No No No No No No No No Yes Yes No No No No No No C1
50 YES NO L2 NO NO NO NO No No No No No No No No No No No No No Yes No No No No C2
38 YES NO L2 NO NO NO NO No No No No No No No No No No No No No Yes No No No No C2
77 YES NO L3 NO NO NO NO No Yes No No No Yes Yes No No No No No No No No No No No C2
52 YES NO L3 NO NO YES NO Yes Yes No No No No No No No Yes No No No No No No No No C2
69 YES NO L3 NO NO NO NO No Yes No No No No Yes No Yes No No No No No No No No No C1
64 NO NO L3 NO NO NO NO No Yes Yes No No Yes No No No No No No No No No No No No C1
58 NO NO L3 NO NO YES NO Yes No No No No Yes No No No No No No No No No No No No C1
83 NO NO L3 NO NO NO NO Yes Yes No No No Yes No No No No No No No No No No No No C1
69 YES NO L3 NO YES NO YES No No No No No Yes No No No No No No No No No No No No C2
48 YES NO L3 NO NO YES NO No Yes No No No No Yes No Yes Yes No No No No No No No No C2
68 YES NO L3 NO NO NO NO No Yes No No No Yes Yes No No No No No No No No No No No C1
68 YES NO L3 NO NO NO NO No Yes No No No No Yes No No No No No No No No No No No C1
40 NO YES L3 NO NO NO NO No No No No No Yes No Yes No No No No No No No No No No C1
83 YES NO L3 NO NO NO NO No No No No No Yes No No No No No No No No No No No No C1
62 YES NO L3 NO NO NO NO No Yes No No No Yes Yes Yes Yes No No No No No No No No No C2
65 YES NO L3 NO NO YES NO No Yes No No No Yes Yes No No No No No No No No No No No C1
73 NO NO L3 NO NO YES NO No Yes No No No Yes Yes No No No No No No No No No No No C2
68 YES NO L3 NO NO NO NO No Yes No No No Yes No Yes No No No No No No No No No No C2
60 NO YES L3 YES NO NO NO No No No No No Yes No No No No No No No No No No No No C1
52 NO NO L3 NO NO NO NO No Yes No No No No Yes No No No No No No No No No No No C1
62 YES NO L3 YES NO NO NO No Yes No No No Yes No No Yes No No No No No No No No No C2
72 YES NO L3 NO NO NO NO Yes Yes No No No No Yes No Yes No No No No No No No No No C2
82 YES NO L3 NO NO NO NO No Yes No No No No Yes No Yes No No No No No No No No No C2
81 YES NO L3 NO NO NO NO Yes Yes Yes No No Yes Yes No Yes No No No No No No No No No C1
59 NO NO L4 NO NO NO NO No No No No No No No Yes No No No No No No No No No No C1
59 NO NO L4 NO NO YES NO No No No No No No No Yes No No No No No No No No No No C1
62 YES NO L4 NO NO NO NO No No No No No No No Yes No No No No No No No No No No C1
56 YES NO L7 NO NO NO NO No No No No No No No No No No No No No No No No Yes No C2
61 YES NO L2 NO NO NO NO No No No No No No No No No No No No No Yes No No No No C2
43 NO NO L2 NO NO NO NO No No No No No No No No No No Yes No No No No No No No C1
62 YES NO L2 NO NO NO NO No No No No No No No No No Yes No No No No No No No No C2
30 NO NO L2 NO NO NO YES No No No No No No No No No No Yes No No No No No No No C1
69 YES NO L5 NO NO YES NO No No No No No No No No No No Yes No No No No No No No C1
42 NO NO L5 NO NO NO NO No No No No No No No Yes No No No No Yes No No No No No C2
62 NO YES L5 NO NO NO NO No No No No No No No Yes No No No No Yes No No No No No C2
62 YES NO L5 NO NO NO YES No No No No No No No No No No No No No No No No No No C2
55 NO YES L5 NO NO NO NO No No No No No No No No No No Yes No No No No No No No C1
72 NO NO L5 NO NO NO NO No No No No No No No No No No No Yes No No No No No No C2
75 YES YES L5 NO NO NO NO No No No No No No No No No No No No No No No Yes No No C2
76 YES NO L6 NO NO NO NO No No No No No No No No No No No No No No No No No No C2
66 YES NO L6 NO NO NO NO No No No No No No No No No No No No No No Yes No No No C1
76 YES NO L6 NO NO NO NO No No No No No No No No No No No No No Yes No No No No C2
85 YES NO L6 NO NO NO NO No No No No No No No No No No No No No No Yes No No No C2
75 YES NO L6 NO NO NO NO No No No No No No No No No No No No No No No No No No C2
65 NO NO L6 NO NO NO YES No No No No No No No No No Yes No No No No No No No No C2
30 NO NO L6 NO NO NO YES No No No No No No No No No No No No No No No No No No C1
30 NO NO L6 NO NO NO YES No No No No No No No Yes No No No No No No No No No No C1
74 NO NO L6 NO NO NO NO No No No No No No No No No No Yes No No No No No No No C1
69 YES NO L6 NO NO NO NO No No No No Yes No No No No No No No No No No No No No C2
62 YES NO L6 NO NO NO NO No No No No No No No No No No Yes No No No No No No No C1
66 YES NO L6 NO NO NO NO No No No No No No No No No No No No No No No No No Yes C2
63 NO NO L6 NO NO NO YES No No No No No No No No No No No Yes No No No No No No C2
63 NO NO L6 NO NO NO YES No No No No No No No No No No Yes No No No No No No No C1
63 NO NO L6 NO NO NO NO No No No No No No No No No No Yes No No No No No No No C1
60 YES NO L6 NO NO NO YES No No No Yes No No No No No No No No No No No No No No C2
60 NO NO L6 NO NO NO YES No No No No No No No No No No Yes No No No No No No No C1
64 YES NO L6 NO NO NO NO No No No No No No No No No No No No No Yes No No No No C2
49 YES NO L6 NO NO NO NO No No No No No No No No No No No Yes No No No No No No C2
73 YES NO L6 NO NO NO NO No No No No No No No No No No No No No No No No Yes No C2
50 NO NO L6 NO NO NO YES No No No No No No No Yes No No No No No No No No No No C1
82 NO NO L6 NO NO NO NO No No No No No No No Yes No No No No No No No No No No C1
67 YES NO L6 NO NO NO NO No No No No No No No No No No No No No No No No No No C2
52 NO NO L6 NO NO NO NO No No No No No No No Yes No No No No No No No No No No C1
52 YES NO L6 NO NO NO NO No No No Yes No No No No No No No No No No No No No No C2
57 YES NO L6 NO NO NO NO No No No No No No No No No No No Yes No No No No No No C2
53 YES NO L6 NO NO NO NO No No No No No No No Yes No No No No No No No No No No C2
52 YES NO L6 NO NO NO NO No No No No No No No Yes No No No No No No No No No No C1
78 NO NO L6 NO NO NO NO No No No No No No No No No No Yes No No No No No No No C1
69 YES NO L6 NO NO NO YES No No No No No No No No No No Yes No No No No No No No C2
69 NO NO L6 NO NO NO YES No No No No No No No No No No Yes No No No No No No No C1
73 NO NO L6 NO NO NO NO No No No No No No No No No No Yes No No No No No No No C1
59 YES NO L6 NO NO NO NO No No No No No No No No No Yes No No No No No No No No C2
65 NO NO L6 NO NO NO NO No No No No No No No Yes No No No No No No No No No No C2
78 YES NO L6 NO NO NO NO No No No No No No No No No Yes No No No No No No No No C2
76 YES NO L6 NO NO NO NO No No No No No No No No No No No No No No No No Yes No C2
76 YES NO L6 NO NO NO NO No No No No No No No No No No No Yes No No No No No No C2
73 YES NO L6 NO NO NO NO No No No No No No No No No No No No No Yes No No No No C2
79 YES NO L6 NO NO NO NO No No No No No No No No No Yes No No No No No No No No C2
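
Before looking for a behavioural change between the two versions, it can help to confirm that both processes really store the same settings for the tree learner. The following is a minimal sketch in Python, not part of the original post. It assumes the two Decision Tree operators are copied from the listings above, with the RM 5 layout attributes (activated, expanded, height, width, x, y) trimmed. The helper names `operator_params` and `diff_params` are made up for this illustration. The check only compares stored parameter values; it says nothing about whether the operator implementation itself changed between RM 4 and RM 5.

```python
import xml.etree.ElementTree as ET

# Excerpt of the RM 4 process above, reduced to the tree learner.
RM4_XML = """
<process version="4.6">
  <operator name="DecisionTree" class="DecisionTree">
    <parameter key="keep_example_set" value="false"/>
    <parameter key="criterion" value="information_gain"/>
    <parameter key="minimal_size_for_split" value="4"/>
    <parameter key="minimal_leaf_size" value="2"/>
    <parameter key="minimal_gain" value="0.1"/>
    <parameter key="maximal_depth" value="20"/>
    <parameter key="confidence" value="0.25"/>
    <parameter key="number_of_prepruning_alternatives" value="3"/>
    <parameter key="no_pre_pruning" value="true"/>
    <parameter key="no_pruning" value="false"/>
  </operator>
</process>
"""

# Excerpt of the RM 5 process above (layout attributes trimmed).
RM5_XML = """
<process version="5.0">
  <operator class="decision_tree" name="Decision Tree">
    <parameter key="criterion" value="information_gain"/>
    <parameter key="minimal_size_for_split" value="4"/>
    <parameter key="minimal_leaf_size" value="2"/>
    <parameter key="minimal_gain" value="0.1"/>
    <parameter key="maximal_depth" value="20"/>
    <parameter key="confidence" value="0.25"/>
    <parameter key="number_of_prepruning_alternatives" value="3"/>
    <parameter key="no_pre_pruning" value="true"/>
    <parameter key="no_pruning" value="false"/>
  </operator>
</process>
"""


def operator_params(xml_text, operator_class):
    """Return {key: value} for the first operator with the given class."""
    root = ET.fromstring(xml_text.strip())
    for op in root.iter("operator"):
        if op.get("class") == operator_class:
            # Only direct <parameter> children, not those of nested operators.
            return {p.get("key"): p.get("value") for p in op.findall("parameter")}
    raise ValueError(f"no operator with class {operator_class!r}")


def diff_params(left, right):
    """Return {key: (left_value, right_value)} for keys that differ or are missing."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys if left.get(k) != right.get(k)}


rm4 = operator_params(RM4_XML, "DecisionTree")
rm5 = operator_params(RM5_XML, "decision_tree")
differences = diff_params(rm4, rm5)

for key, (old, new) in differences.items():
    print(f"{key}: RM4={old!r} RM5={new!r}")
```

The same two helpers can be pointed at the full process files and at the cross-validation operators (class `XValidation` in the RM 4 listing, `x_validation` in the RM 5 listing) to list every stored setting that differs between the two versions.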