No decision tree created when the "criterion" parameter is set to "gini_index"

User: "lionelderkrikor"
New Altair Community Member
Updated by Jocelyn

Good morning,

 

I used the "Decision Tree" operator to create a model with a training dataset.

With the "criterion" parameter set to "gini_index", no decision tree appears in the results: none of the attributes are taken into account.

When "criterion" is set to "accuracy", "gain_ratio" or "information_gain", the decision trees are created correctly.

 

My training and scoring datasets are in the attached files.

 

Here is my process in XML:

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Training" width="90" x="112" y="34">
<parameter key="repository_entry" value="//DataMiningForTheMasses/data/Chapter10DataSet_Training"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
<parameter key="attribute_name" value="User_ID"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (3)" width="90" x="380" y="34">
<parameter key="attribute_name" value="eReader_Adoption"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="7.6.001" expanded="true" height="82" name="Decision Tree" width="90" x="514" y="34">
<parameter key="criterion" value="gini_index"/>
<parameter key="maximal_depth" value="20"/>
<parameter key="apply_pruning" value="true"/>
<parameter key="confidence" value="0.25"/>
<parameter key="apply_prepruning" value="true"/>
<parameter key="minimal_gain" value="0.1"/>
<parameter key="minimal_leaf_size" value="2"/>
<parameter key="minimal_size_for_split" value="4"/>
<parameter key="number_of_prepruning_alternatives" value="3"/>
</operator>
<operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Scoring" width="90" x="112" y="238">
<parameter key="repository_entry" value="//DataMiningForTheMasses/data/Chapter10DataSet_Scoring"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (2)" width="90" x="313" y="238">
<parameter key="attribute_name" value="User_ID"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="715" y="136">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<connect from_op="Training" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Set Role (3)" to_port="example set input"/>
<connect from_op="Set Role (3)" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Scoring" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
<connect from_op="Apply Model" from_port="model" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>

Is this a bug?

 

Can you help me?

 

Thank you

 

Lionel

 

 

 

    User: "earmijo"
    New Altair Community Member
    Accepted Answer

    Try unchecking the "apply prepruning" setting of the Decision Tree operator.

     

    Screen Shot 2017-11-12 at 3.25.20 PM.png
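
In the process XML posted in the question, unchecking that box would presumably amount to changing a single parameter on the "Decision Tree" operator. The fragment below is a sketch of that change, assuming the checkbox corresponds to the "apply_prepruning" key that appears in the posted process. Every other value is copied unchanged from the question.

```xml
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="7.6.001" expanded="true" height="82" name="Decision Tree" width="90" x="514" y="34">
  <parameter key="criterion" value="gini_index"/>
  <parameter key="maximal_depth" value="20"/>
  <parameter key="apply_pruning" value="true"/>
  <parameter key="confidence" value="0.25"/>
  <!-- The only change from the posted process: pre-pruning switched off (was "true") -->
  <parameter key="apply_prepruning" value="false"/>
  <!-- The four values below are left as posted; they are assumed to apply only while pre-pruning is on -->
  <parameter key="minimal_gain" value="0.1"/>
  <parameter key="minimal_leaf_size" value="2"/>
  <parameter key="minimal_size_for_split" value="4"/>
  <parameter key="number_of_prepruning_alternatives" value="3"/>
</operator>
```

The rest of the process (Retrieve, Set Role, Apply Model and the connections) would stay exactly as posted.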

    User: "earmijo"
    New Altair Community Member
    Accepted Answer

    Let me add a couple of sentences to Thomas_Ott's answer. I was confused myself when I started using RapidMiner. 

     

    You can find a nice and clear explanation of both pruning and pre-pruning here:

     

    Machine Learning: Pruning Decision Trees

     

    You should experiment in your process with all the variations. 

     

    Pre-pruning (early stopping): You stop splitting if no significant benefit results from an additional split.

    Pruning (post-pruning): You keep splitting until you reach the desired number of levels (depth = the main measure of complexity of the tree) but you try to simplify the tree afterwards. 

    Neither pre-pruning nor pruning: Try it. The tree will grow symmetrically until reaching the desired number of levels (depth).
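
For experimenting with these variations in the posted process, the three cases can be sketched as value pairs for the "apply_pruning" and "apply_prepruning" keys of the "Decision Tree" operator. This is a fragment rather than a complete process, and the mapping of each variation to these two keys is an assumption based on the parameter names in the question's XML.

```xml
<!-- Variation 1: pre-pruning only (early stopping while the tree is grown) -->
<parameter key="apply_pruning" value="false"/>
<parameter key="apply_prepruning" value="true"/>

<!-- Variation 2: pruning only (grow up to maximal_depth, then simplify the tree) -->
<parameter key="apply_pruning" value="true"/>
<parameter key="apply_prepruning" value="false"/>

<!-- Variation 3: neither (grow up to maximal_depth with no simplification) -->
<parameter key="apply_pruning" value="false"/>
<parameter key="apply_prepruning" value="false"/>
```

Only one pair would be used at a time, replacing the two corresponding lines inside the operator. The posted process has both set to "true", which is the combination that produced no tree with "gini_index".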

     

    If processing time is not an issue, there is no reason to ever use the pre-pruning option. In the worst case, you'll end up with the same performance metric, but there is a real chance, as your example illustrates, that you'll do better with post-pruning.