"Compare Weka and RM Random Forest"

Question

Hi, during my last calculations I did some runs with the Random Forest operator and the Weka Random Forest operator using various options. Although I tried to make both operators equivalent (same number of trees, same local random seed, no minimal split features, etc.), the performance of the two operators was still different. Here is the workflow I used for benchmarking the two operators. Is RM using a different implementation of Random Forest and, if so, what are the differences? Best regards, Markus

MuehliMan · Answer

I am not interested in who wins or loses, but in the reason why the results are different. This was a benchmarking example more than a real-use problem.
As far as I know, Breiman included bootstrapping as a form of validation in the random forest to avoid overfitting.
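
To make the bootstrapping remark concrete, here is a minimal sketch in plain Python of the sampling step only. It assumes the usual bagging scheme (each tree draws as many rows as the data set has, with replacement) and shows that a share of the rows is left "out of bag" for every tree, which is what makes a built-in validation estimate possible. The function name `bootstrap_split`, the seed and the sizes are made up for illustration; nothing here is taken from the RM or Weka source, and the thread does not say how either operator samples.

```python
import random


def bootstrap_split(n_rows, rng):
    """Draw n_rows row indices with replacement.

    Returns (in_bag, out_of_bag): the sampled indices (duplicates allowed)
    and the indices that were never drawn.
    """
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in chosen]
    return in_bag, out_of_bag


rng = random.Random(1992)  # arbitrary seed, only for repeatability
n_rows = 1000              # hypothetical data set size
n_trees = 50               # hypothetical number of trees

oob_fractions = []
for _ in range(n_trees):
    in_bag, out_of_bag = bootstrap_split(n_rows, rng)
    oob_fractions.append(len(out_of_bag) / n_rows)

mean_oob_fraction = sum(oob_fractions) / n_trees
print(f"mean out-of-bag fraction: {mean_oob_fraction:.3f}")
```

The expected out-of-bag share is (1 - 1/n)^n, which is close to 0.37 for large n, so roughly a third of the rows never reach a given tree and can be used to test it.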

BTW: Enabling or disabling the two pruning options does not change the results.

Nice weekend and a happy new year to you too!

Cheers, 
Markus

haddock · Answer

Hi folks,

Mmm... if a model gets 100% accuracy when applied to its own training data, especially when that data is random, then one should suspect over-fitting and low predictive power, so RapidMiner wins again ;). Models are only good if they work equally well on unseen data, as this link explains:

http://en.wikipedia.org/wiki/Overfitting
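
A small sketch of that point, in plain Python. Note that it does not use a random forest: a 1-nearest-neighbour classifier stands in as a deliberately simple model that memorises its training rows. The data are random, with labels unrelated to the features, and every name and size below is an assumption made for illustration rather than something from the posted workflow.

```python
import random

rng = random.Random(7)  # arbitrary seed, only for repeatability


def random_rows(n):
    """n rows of 5 random features, each with a label unrelated to them."""
    return [([rng.random() for _ in range(5)], rng.choice("AB")) for _ in range(n)]


def predict_1nn(train, features):
    """Label of the closest training row (squared Euclidean distance)."""
    nearest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )
    return nearest[1]


def accuracy(train, rows):
    """Share of rows whose predicted label matches the true label."""
    hits = sum(predict_1nn(train, x) == y for x, y in rows)
    return hits / len(rows)


train = random_rows(200)   # data the model has seen
unseen = random_rows(200)  # fresh data from the same random source

train_accuracy = accuracy(train, train)
unseen_accuracy = accuracy(train, unseen)
print(f"accuracy on training data: {train_accuracy:.2f}")
print(f"accuracy on unseen data:   {unseen_accuracy:.2f}")
```

On its own training rows the model is always right, because each row is its own nearest neighbour. On unseen rows it can only guess, so the accuracy hovers around 50%. A perfect training score therefore says nothing about predictive power, and comparing two operators is more meaningful on held-out data or with cross-validation.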

Good weekend to all!

MuehliMan · Answer

Hi Sebastian,

if that is the case, why do I get totally different values for the performance (given by accuracy, for example)?

Weka's accuracy: 100%
RM's accuracy: 61%

(obtained with the posted workflow)

Best, 
Markus