wessel wrote: If I start RapidMiner, run any process, and leave RapidMiner running, I see the memory used by javaw.exe slowly growing.
Dmes wrote: To the RapidMiner development team: There is a very serious memory leak in version 5.1. I am reading in a large (900,000 rows) CSV file. The system monitor shows memory usage slowly increasing, as expected. But when the process finishes and a new process is started, the memory usage starts at the same level where it was when the first process ended, and the second process then crashes due to lack of memory! I have tested this with the Windows performance monitor as well, which confirmed that the memory was not being released when the process ended.
I am using the "Free Memory" operator, which seems to have no effect.
The only way to run the 2nd process is to restart Rapid Miner!
ChrisI wrote: My main memory is 8 GB and no other applications are running; the Xms parameter for Java is set to 6 GB...???
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
    <process expanded="true" height="540" width="682">
      <operator activated="true" class="read_csv" compatibility="5.2.000" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\Chris\Documents\STRATH-WEIR\CLUSTER-Event-20k.csv"/>
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.2.000" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="X_Value"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="5.2.000" expanded="true" height="76" name="Transpose" width="90" x="313" y="30"/>
      <operator activated="true" class="numerical_to_real" compatibility="5.2.000" expanded="true" height="76" name="Numerical to Real" width="90" x="45" y="165"/>
      <operator activated="true" class="loop_attributes" compatibility="5.2.000" expanded="true" height="60" name="Loop Attributes" width="90" x="179" y="165">
        <process expanded="true" height="540" width="700">
          <operator activated="true" class="generate_attributes" compatibility="5.2.000" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="30">
            <list key="function_descriptions">
              <parameter key="new-attr%{loop_attribute}" value="%{loop_attribute} * att_20001"/>
            </list>
          </operator>
          <connect from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="example set"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_example set" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Transpose" to_port="example set input"/>
      <connect from_op="Transpose" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/>
      <connect from_op="Numerical to Real" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
      <connect from_op="Loop Attributes" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
    <process expanded="true" height="540" width="682">
      <operator activated="true" class="generate_data" compatibility="5.2.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="number_examples" value="20000"/>
        <parameter key="number_of_attributes" value="8"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.2.000" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="5.2.000" expanded="true" height="76" name="Transpose" width="90" x="313" y="30"/>
      <operator activated="true" class="select_attributes" compatibility="5.2.000" expanded="true" height="76" name="Select Attributes (2)" width="90" x="49" y="165">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="id"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="loop_attributes" compatibility="5.2.000" expanded="true" height="60" name="Loop Attributes" width="90" x="179" y="165">
        <process expanded="true" height="540" width="700">
          <operator activated="true" class="generate_attributes" compatibility="5.2.000" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="30">
            <list key="function_descriptions">
              <parameter key="new-attr%{loop_attribute}" value="%{loop_attribute} * att_20000"/>
            </list>
          </operator>
          <connect from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="example set"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_example set" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Transpose" to_port="example set input"/>
      <connect from_op="Transpose" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
      <connect from_op="Loop Attributes" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Marius wrote: I investigated this issue, and the good news is: we don't have a memory leak, the memory is just not handed back to the operating system. What I found out is the following: the JVM claims a lot of system memory quite fast and almost never frees it. Internally, however, the memory actually used (and not just claimed) by RapidMiner is cleaned up between or during process runs. As a test process I used the process posted above with 1000 examples. Running the same process with 20000 examples probably does not work, since with 1000 examples it already needs about 1 GB of memory (this can probably be improved, and certainly will be in the future). At least the memory is correctly cleaned up (inside the JVM) between process runs, and RapidMiner does not run out of memory as long as the example sets are reasonably sized. Best, Marius
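For anyone who wants to see the difference between memory the JVM has claimed and memory that is actually in use, here is a minimal standalone Java sketch. It is not RapidMiner code; the class name `HeapReport` and its helper methods are made up for illustration, and it only uses the standard `java.lang.Runtime` methods. Note that `System.gc()` is only a hint to the JVM, so the exact numbers will vary between runs and JVM versions.

```java
public class HeapReport {
    private static final long MB = 1024L * 1024L;

    // Heap actually in use: what the JVM has claimed minus what is free inside that claim.
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    // Convert a byte count to whole megabytes (rounded down).
    static long toMb(long bytes) {
        return bytes / MB;
    }

    static void report(String label, Runtime rt) {
        System.out.println(label
                + ": used=" + toMb(usedBytes(rt)) + " MB"
                + ", claimed=" + toMb(rt.totalMemory()) + " MB"
                + ", max=" + toMb(rt.maxMemory()) + " MB");
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        report("start", rt);

        // Allocate roughly 32 MB in 1 MB chunks (128 * 1024 doubles of 8 bytes each).
        double[][] data = new double[32][];
        for (int i = 0; i < data.length; i++) {
            data[i] = new double[128 * 1024];
        }
        report("after allocation", rt);

        // Drop the only reference and ask for a collection (a request, not a guarantee).
        data = null;
        System.gc();
        report("after gc", rt);
    }
}
```

On a typical run the "used" figure drops after the collection while "claimed" stays the same or shrinks only slightly. An operating system monitor sees the claimed figure, not the used one.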
Marius wrote:What should that button do?
Marius wrote:Running the same process with 20000 examples probably does not work, since with 1000 examples it already needs about 1GB of memory (this is probably improvable, and certainly will be improved in the future).