Problems loading and applying model
Timbo
New Altair Community Member
Hi
I am trying to read a model from a file and then apply it to a new dataset. The model is quite large (a Weka Random Forest with 3500 trees). Whenever I try this, I get the following error message:
> UserError occured in 1st application of ModelLoader (ModelLoader)
> G Apr 6, 2010 2:26:35 PM: [Fatal] Process failed: Could not read file
> '/home/ruhe/wd/SetC_3500zip.mod': Cannot read from XML stream, wrong
> format: GC overhead limit exceeded
I have already tried writing the model using all three options offered by the ModelWriter. The model is admittedly quite large (about 1GB), but I am running this on a machine with 27GB RAM on a small dataset (20,000 examples, 14 attributes), so I don't think this is a memory problem.
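For reference: as far as I can tell, "GC overhead limit exceeded" means the JVM was spending nearly all of its time in garbage collection while freeing almost no heap, so the XML parse of the model never finished. To check whether the model itself fits in memory independent of the XML layer, one can test a plain binary round-trip outside RapidMiner. A minimal Java sketch (generic serialization only, not RapidMiner's actual API; the class name is a placeholder):

import java.io.*;
import java.util.zip.*;

public class BinaryModelIO {
    // Write any Serializable object in compressed binary form.
    static void write(Serializable model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new GZIPOutputStream(new FileOutputStream(file)))) {
            out.writeObject(model);
        }
    }

    // Read it back. Binary deserialization needs roughly the in-memory
    // size of the object, while XML parsing can transiently allocate a
    // multiple of that, which is what can trigger the GC overhead error.
    static Object read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new FileInputStream(file)))) {
            return in.readObject();
        }
    }
}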
Timbo
Answers
-
Hello, and welcome to RapidMiner.
Please consider this link: http://rapid-i.com/wiki/index.php?title=Memory_Issues
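If the settings from that page are already maxed out, you can also tell the JVM not to abort so early: the "GC overhead limit exceeded" check can be disabled with a HotSpot flag when starting RapidMiner. A sketch, assuming you launch it from a script you can edit (the jar name is just a placeholder):

java -Xmx24g -XX:-UseGCOverheadLimit -jar rapidminer.jar

Note that -XX:-UseGCOverheadLimit only disables the early abort; if the parse really needs more memory than is available, you will get a plain OutOfMemoryError instead.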
regards,
steffen
-
Hi steffen,
thanks for your reply, and thank you for the link, although it does not really help: the settings are such that the 27GB of RAM are actually used. I am pretty sure about that, as building the model took about the same amount of memory. If the error message is nonetheless due to a lack of heap space, I might be in trouble...
Timbo
-
Hello
ok, it is weird that you are able to save the model but not to load it on the same machine (sorry, I didn't notice this detail before; I shouldn't answer posts with a fuzzy mind).
When I get some time to spare, I'll run some experiments with the Weka Random Forest to reproduce the problem. In the meantime: could you repeat the experiment using Rapid-I's own RandomForest implementation ("Random Forest") and tell us whether it works or not?
regards,
steffen
-
Morning,
I did the same thing with the RapidMiner RandomForest using the option "information_gain", as it produced the same results as the Weka RF when tested with smaller numbers of trees. Saving, reading and applying the model went fine. The only problem is that the results produced this way are completely weird: all examples in the set are classified as "1" with confidence(1)=1.0, while the actual share of examples labelled "1" is only about 50%. To me this looks a lot like some kind of overtraining, which is a bit surprising, as Leo Breiman states in his paper that Random Forests cannot be overtrained at all.
I'll try to perform some more tests and post the results, but due to the large number of trees that might take until Monday.
Timbo
-
Alright, one of my tests just showed that the "overtraining" is probably due to problems with the test file.
-
I am now done with my tests. With other random forests I can save and load the models without problems, but their performance on this problem is not as good as the Weka RF's. Unfortunately, I cannot read the Weka model from file once it has been saved. The strange thing is that smaller Weka Random Forest models CAN be loaded, and 27GB of memory should definitely be enough. Strange; I am starting to run out of ideas.