Problems loading and applying model
Timbo
New Altair Community Member
Hi
I am trying to read a model from a file and then apply it to a new dataset. The model is quite large (a Weka Random Forest with 3500 trees). Whenever I try this, I get the following error message:
> UserError occured in 1st application of ModelLoader (ModelLoader)
> G Apr 6, 2010 2:26:35 PM: [Fatal] Process failed: Could not read file
> '/home/ruhe/wd/SetC_3500zip.mod': Cannot read from XML stream, wrong
> format: GC overhead limit exceeded
I have already tried writing the model using all three options offered by the ModelWriter. The model is admittedly quite large (about 1GB), but I am running this on a machine with 27GB RAM on a small dataset (20,000 examples, 14 attributes), so I don't think this is a memory problem.
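For reference: as far as I can tell, "GC overhead limit exceeded" means the JVM was spending nearly all of its time in garbage collection while freeing almost no heap, so the XML parse of the model never finished. To check whether the model itself fits in memory independent of the XML layer, one can test a plain binary round-trip outside RapidMiner. A minimal Java sketch (generic serialization only, not RapidMiner's actual API; the class name is a placeholder):

import java.io.*;
import java.util.zip.*;

public class BinaryModelIO {
    // Write any Serializable object in compressed binary form.
    static void write(Serializable model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new GZIPOutputStream(new FileOutputStream(file)))) {
            out.writeObject(model);
        }
    }

    // Read it back. Binary deserialization needs roughly the in-memory
    // size of the object, while XML parsing can transiently allocate a
    // multiple of that, which is what can trigger the GC overhead error.
    static Object read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new FileInputStream(file)))) {
            return in.readObject();
        }
    }
}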
Timbo
Answers
-
Hello, and welcome to RapidMiner.
Please consider this link: http://rapid-i.com/wiki/index.php?title=Memory_Issues
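If the settings from that page are already maxed out, you can also tell the JVM not to abort so early: the "GC overhead limit exceeded" check can be disabled with a HotSpot flag when starting RapidMiner. A sketch, assuming you launch it from a script you can edit (the jar name is just a placeholder):

java -Xmx24g -XX:-UseGCOverheadLimit -jar rapidminer.jar

Note that -XX:-UseGCOverheadLimit only disables the early abort; if the parse really needs more memory than is available, you will get a plain OutOfMemoryError instead.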
regards,
steffen
-
Hi steffen,
thanks for your reply, and thank you for the link, although it does not really help: the settings are such that the 27GB of RAM are actually used. I am pretty sure about that, as building the model took about the same amount of memory. If the error message is nonetheless due to a lack of heap space, I might be in trouble...
Timbo
-
Hello
ok, it is weird that you are able to save the model but not to load it on the same machine (sorry, I didn't notice this detail before; I shouldn't answer posts with a fuzzy mind).
When I get some time to spare, I'll run some experiments with the Weka Random Forest to reproduce the problem. In the meantime: could you repeat the experiment using Rapid-I's own RandomForest implementation ("Random Forest") and tell us whether it works or not?
regards,
steffen
-
Morning,
I did the same thing with the RapidMiner RandomForest using the option "information_gain", as it produced the same results as the Weka RF when tested with smaller numbers of trees. Saving, reading and applying the model went fine. The only problem is that the results produced this way are completely weird: all examples in the set are classified as "1" with confidence(1)=1.0, while the actual share of examples labelled "1" is only about 50%. To me this looks a lot like some kind of overtraining, which is a bit surprising, as Leo Breiman states in his paper that Random Forests cannot be overtrained at all.
I'll try to perform some more tests and post the results, but due to the large number of trees that might take until Monday.
Timbo
-
Alright, one of my tests just showed that the "overtraining" is probably due to problems with the test file.
-
I am now done with my tests. With other random forests I can save and load the models without problems, but their performance on this problem is not as good as the Weka RF's. Unfortunately, I cannot read the Weka model from file once it has been saved. The strange thing is that smaller Weka Random Forest models CAN be loaded, and 27GB of memory should definitely be enough. Strange; I am starting to run out of ideas.