Problems loading and applying model
Timbo
Hi
I am trying to read a model from file and then apply it to a new dataset. The model is quite large (a Weka Random Forest with 3500 trees). Whenever I try doing that, I get the following error message:
> UserError occured in 1st application of ModelLoader (ModelLoader)
> G Apr 6, 2010 2:26:35 PM: [Fatal] Process failed: Could not read file
> '/home/ruhe/wd/SetC_3500zip.mod': Cannot read from XML stream, wrong
> format: GC overhead limit exceeded
I already tried writing the model using all three options the ModelWriter offers. The model is quite large (about 1 GB), but I have it running on a machine with 27 GB of RAM on a small set (20,000 examples, 14 attributes), so I guess this is not a memory problem.
Timbo
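Since the failure happens while parsing a zipped XML stream, one route worth sketching is plain binary serialization, which never materializes a huge XML document in memory. The snippet below is only a generic illustration of that idea, assuming a model written with standard Java serialization and gzip compression (RapidMiner's own ModelWriter formats may differ); the file path is the one from the error message above, and the class name is made up for the example.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;

public class LoadModel {
    public static void main(String[] args) throws Exception {
        // Path taken from the error message above; adjust as needed.
        String path = "/home/ruhe/wd/SetC_3500zip.mod";

        // Buffered, streamed binary deserialization: only the object graph
        // itself is built in memory, not an intermediate XML representation.
        // If the file is not gzip-compressed, drop the GZIPInputStream wrapper.
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(
                        new BufferedInputStream(new FileInputStream(path))))) {
            Object model = in.readObject();
            System.out.println("Loaded: " + model.getClass().getName());
        }
    }
}
```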
steffen
Hello, and welcome to RapidMiner.
Please consider this link:
http://rapid-i.com/wiki/index.php?title=Memory_Issues
regards,
steffen
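One quick way to confirm that the configured heap is actually what the JVM grants (a small hypothetical check, not something from the wiki page above) is to print Runtime.maxMemory():

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Reports the maximum heap this JVM will actually use (set via -Xmx).
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.1f GB%n", maxBytes / (1024.0 * 1024 * 1024));
    }
}
```

If this prints far less than the 27 GB expected, the heap setting is not reaching the JVM that runs RapidMiner.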
Timbo
Hi steffen,
thanks for your reply, and thank you for the link, although it does not really help. The settings are such that the 27 GB of RAM are really used; I am pretty sure about that, as building the model took about the same amount of memory. If the error message is nonetheless due to a lack of heap space, I might be in trouble...
Timbo
steffen
Hello
OK, it is weird that you are able to save the model but not to load it on the same machine (sorry, I didn't notice this detail before; I shouldn't answer posts with a fuzzy mind).
When I get some time to spare, I'll perform some experiments with the Weka Random Forest to reproduce the problem. In the meantime, could you repeat the experiment using the Random Forest implementation of Rapid-I ("Random Forest") and tell us whether it works or not?
regards,
steffen
Timbo
Morning,
I did the same thing using the RapidMiner RandomForest with the option "information_gain", as it produced the same results as the Weka RF when tested with smaller numbers of trees. Saving, reading, and applying the model went fine. The only problem is that the results produced this way are completely weird: all examples in the set are classified as "1" with confidence(1)=1.0, while the actual share of examples that are "1" is only about 50%. To me this looks a lot like some kind of overtraining, which is surprising, since Leo Breiman states in his paper that Random Forests cannot be overtrained at all.
I'll try to perform some more tests and post the results, but due to the large number of trees that might take till Monday.
Timbo
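A degenerate result like "every example predicted as class 1 with confidence 1.0" is often easier to diagnose from a confusion matrix than from raw predictions. Below is a minimal sketch using Weka's Java API directly; the file names are hypothetical placeholders, and the RandomForest runs with its default settings rather than 3500 trees.

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SanityCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical file names; substitute the real training and test sets.
        Instances train = DataSource.read("train.arff");
        Instances test  = DataSource.read("test.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        RandomForest rf = new RandomForest();
        rf.buildClassifier(train);

        // The confusion matrix makes an "everything lands in one class"
        // failure obvious at a glance.
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(rf, test);
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());
    }
}
```

If the matrix shows every prediction in one column, the class attribute's index and nominal value mapping in the test file are worth checking, since a mismatch there can silently push everything into a single class.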
Timbo
Alright, one of my tests just showed that the "overtraining" is probably due to problems with the test file.
Timbo
I am now done with my tests. With other random forests, I can save and load the models, but their performance is not as good as the Weka RF for the given problem. Unfortunately, I cannot read the Weka model from file once it has been saved. The strange thing is that the Weka Random Forest CAN be loaded for smaller models. But 27 GB of memory should definitely be enough. Strange, and I am starting to run out of ideas.