Hi,
I have a small example data set that was provided to us in a study assignment. It is an Excel file with two sheets. The first sheet is the training set. It has 25 instances of animals with these columns: class, animal, respiration, reproduction, habitat, body hair, limbs front, limbs back, mammal (yes/no).
Out of these, the following are actual attributes: respiration, reproduction, habitat, body hair, limbs front, limbs back.
I use these in a chain of operators and process them with the ID3 operator. The ID3 operator is set to: criterion (information gain), minimal size for split (4), minimal leaf size (2).
The second sheet has the same columns as the first sheet, but contains only 5 instances of animals.
When I now vary the minimal gain setting of the ID3 operator, the tree counterintuitively gets more complex the higher the minimal gain is set.
The tree is simplest when minimal gain is set to 0.1, more complex when it is set to 1.0, and most complex when it is set to 10.0. How is this possible?
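To make clear what I mean by information gain, here is a minimal Python sketch of how I understand the calculation. It is not RapidMiner's actual implementation, the function names are my own, and the rows are invented toy data, not the assignment set. With a yes/no label the entropy is at most 1 bit, so as far as I understand the gain of a single split cannot exceed 1.0. That is why I would have expected a minimal gain of 1.0 or 10.0 to prevent splits rather than produce more of them.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, label):
    """Entropy of the label minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[label] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Invented toy rows for illustration only (NOT the assignment data).
rows = [
    {"body hair": "yes", "habitat": "land", "mammal": "yes"},
    {"body hair": "yes", "habitat": "water", "mammal": "yes"},
    {"body hair": "no", "habitat": "land", "mammal": "no"},
    {"body hair": "no", "habitat": "water", "mammal": "no"},
]

gain_hair = information_gain(rows, "body hair", "mammal")
gain_habitat = information_gain(rows, "habitat", "mammal")
print(gain_hair, gain_habitat)
```

In this toy case "body hair" separates the classes perfectly and reaches the maximum possible gain for a two-class label, while "habitat" tells nothing about the label and has no gain.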
Excuse my English!