"Problem with decision tree algorithm"

szymek New Altair Community Member
edited November 5 in Community Q&A
Hi,

I tried to run the Decision Tree algorithm in RapidMiner and it does not seem to give the correct result. I am not sure whether the problem is caused by the implementation of the algorithm or by something else. Below is the exercise that I tried to run with RM.

I use the following data (A and B are nominal, binary attributes, and there are two classes: + and -):
A,B,Class
T,F,+
T,T,+
T,T,+
T,F,-
T,T,+
F,F,-
F,F,-
F,F,-
T,T,-
T,F,-

I want to build a decision tree using the Gini index as the splitting criterion. RapidMiner selects attribute A as the best one for splitting. However, when I do the calculations manually, B seems to be better. Do you know where the difference comes from? Below are my calculations:
The overall Gini before splitting (4 of the 10 examples are +, 6 are -) is:
$G_{orig} = 1 - 0.4^2 - 0.6^2 = 0.48$

The gain in Gini after splitting on A is:
$G_{A=T} = 1 - (4/7)^2 - (3/7)^2 = 0.4898$
$G_{A=F} = 0$
$\Delta_A = G_{orig} - (7/10) G_{A=T} - (3/10) G_{A=F} = 0.1371$

The gain in Gini after splitting on B is:
$G_{B=T} = 1 - (1/4)^2 - (3/4)^2 = 0.3750$
$G_{B=F} = 1 - (1/6)^2 - (5/6)^2 = 0.2778$
$\Delta_B = G_{orig} - (4/10) G_{B=T} - (6/10) G_{B=F} = 0.1633$

Therefore, attribute B should be chosen to split the node (and not A as calculated by RM).
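
For anyone who wants to double-check the arithmetic, here is a minimal Python sketch of the same manual calculation. The names `ROWS`, `gini` and `gini_gain` are made up for this post, and the code only mirrors the hand calculation above; it does not reproduce whatever RapidMiner does internally.

```python
from collections import Counter

# Rows copied from the data table in this post: (A, B, Class)
ROWS = [
    ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "F", "-"),
    ("T", "T", "+"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"),
    ("T", "T", "-"), ("T", "F", "-"),
]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(rows, attr_index):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    parent_labels = [row[-1] for row in rows]
    weighted_children = 0.0
    for value in sorted({row[attr_index] for row in rows}):
        child_labels = [row[-1] for row in rows if row[attr_index] == value]
        weighted_children += len(child_labels) / len(rows) * gini(child_labels)
    return gini(parent_labels) - weighted_children


if __name__ == "__main__":
    for name, index in (("A", 0), ("B", 1)):
        print(f"gain({name}) = {gini_gain(ROWS, index):.4f}")
```

With the ten rows above this prints a gain of 0.1371 for A and 0.1633 for B, which matches the manual figures.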

regards,
Szymon

Answers

  • land
    land New Altair Community Member
    Hi,
    Thank you for this hint. We will check that, but it might take some time.

    Greetings,
      Sebastian