Problems loading and applying model
Timbo
Hi
I am trying to read a model from file and then apply it to a new dataset. The model is quite large (a Weka Random Forest with 3500 trees). Whenever I try doing that, I get the following error message:
> UserError occured in 1st application of ModelLoader (ModelLoader)
> G Apr 6, 2010 2:26:35 PM: [Fatal] Process failed: Could not read file
> '/home/ruhe/wd/SetC_3500zip.mod': Cannot read from XML stream, wrong
> format: GC overhead limit exceeded
I already tried writing the model using all three options the ModelWriter offers. The model is quite large (about 1 GB), but I have it running on a machine with 27 GB of RAM on a small set (20,000 examples, 14 attributes), so I guess this is not a memory problem.
Timbo
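Since the failure happens while parsing a zipped XML stream, one route worth sketching is plain binary serialization, which never materializes a huge XML document in memory. The snippet below is only a generic illustration of that idea, assuming a model written with standard Java serialization and gzip compression (RapidMiner's own ModelWriter formats may differ); the file path is the one from the error message above, and the class name is made up for the example.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;

public class LoadModel {
    public static void main(String[] args) throws Exception {
        // Path taken from the error message above; adjust as needed.
        String path = "/home/ruhe/wd/SetC_3500zip.mod";

        // Buffered, streamed binary deserialization: only the object graph
        // itself is built in memory, not an intermediate XML representation.
        // If the file is not gzip-compressed, drop the GZIPInputStream wrapper.
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(
                        new BufferedInputStream(new FileInputStream(path))))) {
            Object model = in.readObject();
            System.out.println("Loaded: " + model.getClass().getName());
        }
    }
}
```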
steffen
Hello, and welcome to RapidMiner.
Please consider this link:
http://rapid-i.com/wiki/index.php?title=Memory_Issues
regards,
steffen
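One quick way to confirm that the configured heap is actually what the JVM grants (a small hypothetical check, not something from the wiki page above) is to print Runtime.maxMemory():

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Reports the maximum heap this JVM will actually use (set via -Xmx).
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.1f GB%n", maxBytes / (1024.0 * 1024 * 1024));
    }
}
```

If this prints far less than the 27 GB expected, the heap setting is not reaching the JVM that runs RapidMiner.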
Timbo
Hi steffen,
thanks for your reply, and thank you for the link, although it does not really help. The settings are such that the 27 GB of RAM are really used; I am pretty sure about that, as building the model took about the same amount of memory. If the error message is nonetheless due to a lack of heap space, I might be in trouble...
Timbo
steffen
Hello
OK, it is weird that you are able to save the model but not to load it on the same machine (sorry, I didn't notice this detail before; I shouldn't answer posts with a fuzzy mind).
When I get some time to spare, I'll perform some experiments with the Weka Random Forest to reproduce the problem. In the meantime, could you repeat the experiment using the Random Forest implementation of Rapid-I ("Random Forest") and tell us whether it works or not?
regards,
steffen
Timbo
Morning,
I did the same thing using the RapidMiner RandomForest with the option "information_gain", as it produced the same results as the Weka RF when tested with smaller numbers of trees. Saving, reading, and applying the model went fine. The only problem is that the results produced this way are completely weird: all examples in the set are classified as "1" with confidence(1)=1.0, while the actual share of examples that are "1" is only about 50%. To me this looks a lot like some kind of overtraining, which is surprising, since Leo Breiman states in his paper that Random Forests cannot be overtrained at all.
I'll try to perform some more tests and post the results, but due to the large number of trees that might take till Monday.
Timbo
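A degenerate result like "every example predicted as class 1 with confidence 1.0" is often easier to diagnose from a confusion matrix than from raw predictions. Below is a minimal sketch using Weka's Java API directly; the file names are hypothetical placeholders, and the RandomForest runs with its default settings rather than 3500 trees.

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SanityCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical file names; substitute the real training and test sets.
        Instances train = DataSource.read("train.arff");
        Instances test  = DataSource.read("test.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        RandomForest rf = new RandomForest();
        rf.buildClassifier(train);

        // The confusion matrix makes an "everything lands in one class"
        // failure obvious at a glance.
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(rf, test);
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());
    }
}
```

If the matrix shows every prediction in one column, the class attribute's index and nominal value mapping in the test file are worth checking, since a mismatch there can silently push everything into a single class.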
Timbo
Alright, one of my tests just showed that the "overtraining" is probably due to problems with the test file.
Timbo
I am now done with my tests. With other random forests, I can save and load the models, but their performance is not as good as the Weka RF for the given problem. Unfortunately, I cannot read the Weka model from file once it has been saved. The strange thing is that the Weka Random Forest CAN be loaded for smaller models. But 27 GB of memory should definitely be enough. Strange, and I am starting to run out of ideas.