Serious Memory Leak
Dmes
New Altair Community Member
To the Rapid Miner development team:
There is a very serious memory leak in version 5.1. I am reading in a large CSV file (900,000 rows). The system monitor shows memory usage slowly increasing, as expected. But when the process finishes and a new process is started, memory usage starts at the level where it was when the first process ended, and the second process then crashes due to lack of memory!
I have tested this with the Windows performance monitor as well, which confirmed that the memory was not being released when the process ended.
I am using the "Free Memory" operator, which seems to have no effect.
The only way to run the second process is to restart RapidMiner!
Please correct this error as soon as possible!
Thanks!
Answers
-
I am encountering the same problem. Has any work been done on this?
-
Hi,
I have seen something similar with v. 5.1. When I run Loop Attributes containing a single Generate Attributes operator, it runs out of memory and 'seizes up' after fewer than 200 iterations (new attributes). Each example vector has 28 real-valued dimensions and there are 20,000 example vectors in total, so I cannot see any cause for a lack of memory. My machine has 8 GB of main memory, no other applications are running, and the Xms parameter for Java is set to 6 GB.
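As a rough sanity check on these figures, the raw values alone are small. The sketch below assumes each real is stored as an 8-byte double and ignores any per-object overhead inside RapidMiner, which this thread does not describe. `RawSizeEstimate` and `rawBytes` are names made up for illustration.

```java
final class RawSizeEstimate {

    /**
     * Bytes needed to hold rows x columns numeric values,
     * ASSUMING each value is stored as an 8-byte double.
     */
    static long rawBytes(long rows, long columns) {
        return rows * columns * Double.BYTES;
    }

    public static void main(String[] args) {
        // Figures taken from the post: 20,000 example vectors, 28 reals each.
        long bytes = rawBytes(20_000, 28);
        System.out.println("raw data size in bytes: " + bytes);
    }
}
```

For 20,000 examples with 28 reals each this gives 4,480,000 bytes (about 4.5 MB), far below a 6 GB heap. That suggests the memory is consumed by how the data is processed rather than by its raw size.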
ChrisI
-
That is an interesting observation. I am also using loops that contain a Generate Attributes operator. My system has 16 GB of memory in total, with 12 GB allocated to RapidMiner. The loop is executed 5 times and the base data file is about 1 GB, so even if it loaded the file 5 times in a row, that still shouldn't fill up the memory.
Worse, the memory isn't released when the job is over, even with a Free Memory operator as the last step of the job. That means if I run another job immediately afterward, it will fail due to insufficient memory. I'd be happy to provide more information if someone can tell me what is needed to troubleshoot this issue.
-
I observed similar behavior. The memory filled up very quickly. The Free Memory operator "did not work".
Uwe
-
If I start RapidMiner, run any process, and leave RapidMiner running, I see the memory used by javaw.exe slowly growing.
-
Hi,
I think several different problems are described in this thread.
wessel wrote:
If I start rapid miner, run any process, and leave rapid miner running I see the memory used by javaw.exe slowly growing.
This probably does no harm. RapidMiner just does some background calculations (e.g. updating the memory monitor), and since it does not need the memory, garbage collection is not triggered. As soon as the memory is needed, it will be cleared.
Dmes wrote:
To the Rapid Miner development team:
There is a very serious memory leak in Version 5.1. I am reading a large (900,000 rows) csv file in. The system monitor shows memory usage slowly increasing, as expected. But when the process finishes, and a new process is started, the memory usage starts at the same level where it was when the first process ended- the 2nd process then crashes due to lack of memory!
I have tested this with the Windows performance monitor as well- which confirmed that the memory was not being released when the process ended.
Just a guess: did you leave the results view open? If so, the data also stays in memory.
Dmes wrote:
I am using the "Free Memory" operator- which seems to have no effect.
The Free Memory operator only triggers garbage collection explicitly, which frees data that is no longer needed for anything. That could speed things up later, but it does not free any memory that would not be freed automatically anyway. Thus it won't solve any out-of-memory problems.
Dmes wrote:
The only way to run the 2nd process is to restart Rapid Miner!
Please try to close the result tab before running the second process. If that helps, we are done; if not, we will have a look at it.
ChrisI wrote:
my central memory space is 8GB and no other applications are running, the Xms parameter for Java is set at 6GB.........???
Please check that RapidMiner can really access that much memory. If not, please try the Xmx option instead of Xms.
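One way to check how much memory a JVM may really use is to ask the runtime itself. This is a minimal sketch, assuming a standard JVM, where `-Xms` sets the initial heap size and `-Xmx` sets the maximum. The class name `HeapLimits` is made up for illustration, and how RapidMiner's own launcher passes these options is not shown in this thread.

```java
final class HeapLimits {

    /** Upper bound the JVM will try to use for the heap (governed by -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently reserved by the JVM (starts near -Xms and may grow). */
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        // Shift by 20 bits to convert bytes to mebibytes.
        System.out.println("max heap (MB):       " + (maxHeapBytes() >> 20));
        System.out.println("committed heap (MB): " + (committedHeapBytes() >> 20));
    }
}
```

Compiled and started with, for example, `java -Xmx6g HeapLimits`, the first figure should come out close to 6 GB if the machine and the JVM allow it. A much smaller value would indicate that the setting is not taking effect, for instance because only `-Xms` was set or because a 32-bit JVM is in use, which cannot address that much memory.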
@all: please let us know if your problems persist.
Best,
Marius
-
Hi,
I have checked the RapidMiner memory use via the System Monitor in the Results screen. With Xms set to 6GB it frequently clocks 5.2 GB.
Chris.
-
Hi again,
I run a loop on an ExampleSet with 20000 vectors (Examples) each vector made up of 8 integers which I subsequently convert to reals.
The loop grinds to a halt at 218 loops at which point the memory usage is showing read at max 4.2GB. Either I am doing something stupid or there is something weird going on..... ???
How can I get the xml data and ExampleSet to you?
Kindest Regards,
ChrisI
-
Hi again,
Referring to my posting on the looping problem, I have managed to read Marius' instructions on posting....
Here is the XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
<process expanded="true" height="540" width="682">
<operator activated="true" class="read_csv" compatibility="5.2.000" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
<parameter key="csv_file" value="C:\Users\Chris\Documents\STRATH-WEIR\CLUSTER-Event-20k.csv"/>
<parameter key="column_separators" value=","/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.000" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="X_Value"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="transpose" compatibility="5.2.000" expanded="true" height="76" name="Transpose" width="90" x="313" y="30"/>
<operator activated="true" class="numerical_to_real" compatibility="5.2.000" expanded="true" height="76" name="Numerical to Real" width="90" x="45" y="165"/>
<operator activated="true" class="loop_attributes" compatibility="5.2.000" expanded="true" height="60" name="Loop Attributes" width="90" x="179" y="165">
<process expanded="true" height="540" width="700">
<operator activated="true" class="generate_attributes" compatibility="5.2.000" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="30">
<list key="function_descriptions">
<parameter key="new-attr%{loop_attribute}" value="%{loop_attribute} * att_20001"/>
</list>
</operator>
<connect from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="example set"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_example set" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Transpose" to_port="example set input"/>
<connect from_op="Transpose" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/>
<connect from_op="Numerical to Real" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
<connect from_op="Loop Attributes" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
I slightly modified the process sent by ChrisI.
It now uses the Generate Data operator instead of Read CSV, so anyone can paste it and run it.
I kept an eye on the amount of memory in use while running RapidMiner:
idle memory usage: 2.0 GB (used by the system, not RapidMiner)
start RapidMiner: 2.6 GB
load and run process: 2.8 GB
press Run one more time: 2.9 GB
press Run 5 more times: 3.1 GB
press Run 5 more times: 3.2 GB
press Run 5 more times: 3.4 GB
press Run 5 more times: 3.6 GB
press Run 5 more times: 3.7 GB
press Run lots of times: 6.7 GB
press Run lots of times: 7.4 GB
press Run lots of times: 8.2 GB
http://img1.uploadscreenshot.com/images/orig/2/4106595534-orig.jpg
edit: if you wish I can try to do the same thing on Ubuntu linux and on a machine with even more memory.
Best regards,
Wessel
-
Hi,
Tried using the GenerateData operator instead of the ReadCSV, just in case there was something confounding the issue. No change.
The machine locks up indicating 5.8GB memory used.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
<process expanded="true" height="540" width="682">
<operator activated="true" class="generate_data" compatibility="5.2.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="number_examples" value="20000"/>
<parameter key="number_of_attributes" value="8"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.000" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="label"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="transpose" compatibility="5.2.000" expanded="true" height="76" name="Transpose" width="90" x="313" y="30"/>
<operator activated="true" class="select_attributes" compatibility="5.2.000" expanded="true" height="76" name="Select Attributes (2)" width="90" x="49" y="165">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="id"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="loop_attributes" compatibility="5.2.000" expanded="true" height="60" name="Loop Attributes" width="90" x="179" y="165">
<process expanded="true" height="540" width="700">
<operator activated="true" class="generate_attributes" compatibility="5.2.000" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="30">
<list key="function_descriptions">
<parameter key="new-attr%{loop_attribute}" value="%{loop_attribute} * att_20000"/>
</list>
</operator>
<connect from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="example set"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_example set" spacing="0"/>
</process>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Transpose" to_port="example set input"/>
<connect from_op="Transpose" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
<connect from_op="Loop Attributes" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
Hi,
Using the MaterializeData and FreeMemory operators inside the loop keeps the memory consumption down, BUT the execution speed is totally unacceptable.
???
ChrisI
-
I investigated this issue, and the good news is: we don't have a memory leak; the memory is just not returned to the operating system.
What I found out is the following: the JVM claims a lot of system memory quite fast, and almost never frees it. Internally however, the memory used (and not just claimed) by RapidMiner, is cleaned up between or during process runs.
As test process I used the process posted above with 1000 examples.
Running the same process with 20000 examples probably does not work, since with 1000 examples it already needs about 1GB of memory (this is probably improvable, and certainly will be improved in the future). At least the memory is correctly cleaned (inside the JVM) between process runs, and RapidMiner does not run out of memory, as long as the example sets are reasonably sized.
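The difference between memory used and memory claimed can be observed from inside any JVM. The following sketch is not RapidMiner code; `HeapUsageDemo` and its methods are hypothetical names, and the 32 MB buffer simply stands in for an example set. Note that `System.gc()` is only a request to the JVM, so the exact figures will vary between runs and JVM versions.

```java
final class HeapUsageDemo {

    /** Heap actually occupied by objects (plus garbage not yet collected). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Heap the JVM has claimed from the operating system so far. */
    static long claimedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    static void report(String label) {
        System.out.println(label + ": used=" + (usedBytes() >> 20)
                + " MB, claimed=" + (claimedBytes() >> 20) + " MB");
    }

    public static void main(String[] args) {
        report("start");

        // Stand-in for a loaded example set: 32 MB that is still referenced.
        byte[] data = new byte[32 * 1024 * 1024];
        data[0] = 1;
        System.gc(); // request a collection; 'data' is reachable, so it must survive
        report("holding data, after gc request");

        data = null; // drop the last reference, e.g. like closing a result
        System.gc(); // now the buffer is eligible for collection
        report("data released, after gc request");
    }
}
```

While `data` is still referenced, the requested collection cannot free it, which is comparable to results that are kept open. Once the reference is dropped, the used figure can fall while the claimed figure may stay high; whether the JVM hands memory back to the operating system depends on the JVM and its collector settings.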
Best, Marius
-
Marius wrote:
I investigated this issue, and the good news is: we don't have a memleak, the memory is just not freed
What I found out is the following: the JVM claims a lot of system memory quite fast, and almost never frees it. Internally however, the memory used (and not just claimed) by RapidMiner, is cleaned up between or during process runs.
As test process I used the process posted above with 1000 examples.
Running the same process with 20000 examples probably does not work, since with 1000 examples it already needs about 1GB of memory (this is probably improvable, and certainly will be improved in the future). At least the memory is correctly cleaned (inside the JVM) between process runs, and RapidMiner does not run out of memory, as long as the example sets are reasonably sized.
Best, Marius
Maybe you guys should make a button to "try and free memory".
Judging by your description, this should work.
What I do now, if I need more memory, is simply close and restart RapidMiner.
-
What should that button do? As I said, at least with the test process above the memory was freed whenever it was needed during a process run. Have a look at the attached image, especially at the far right. The blue graph shows the memory actually used by RapidMiner, the orange one the memory claimed by the JVM and displayed in the Windows Task Manager.
-
OK. Marius has confirmed that my attempt to run 20,000 examples cannot work, so I shall have to stop trying and find another way around the problem.
It is a pity, but c'est la vie I guess. :-\
I have tried using the GenerateProduct operator on the same ExampleSet and it works quickly without a hitch so far.
ChrisI
-
Marius wrote:
What should that button do?
Preferably it should shrink the amount of memory claimed by the JVM.
Around 17:05 I see the amount of memory claimed by the JVM go down.
For example:
-
Marius wrote:
Running the same process with 20000 examples probably does not work, since with 1000 examples it already needs about 1GB of memory (this is probably improvable, and certainly will be improved in the future).
This is what the process with 10,000 features looks like in the current development version (please also have a look at the scale on the vertical axis). We are not sure yet, though, whether it will make it into the next release, since it needed delicate changes to the core of RapidMiner which need to be thoroughly tested.
-
Hi Marius,
That looks great! It would have a huge impact on the RM looping capabilities, which seem(!) to be an Achilles heel at the moment.
I have started using GenerateProduct, and with careful thinking it looks as if GenerateAggregation may also help me; otherwise I shall just output suitable files to R and get things done there.
Anyway, I hope your fix comes out soon.
Kindest Regards,
ChrisI
-
Hi RM dev crew
It seems not much progress has been made on the RM performance front since the last post here.
Please have a look at http://rapid-i.com/rapidforum/index.php/topic,5385.0.html
thx
f