"Leaking Memory Bug"

cherokee
cherokee New Altair Community Member
edited November 5 in Community Q&A
Hi @ all!

First of all: I'm sorry that this post has become a bit long, but I would rather say it all ;)

RapidMiner has suffered from inexplicably increasing memory consumption for several years and versions (e.g. http://rapid-i.com/rapidforum/index.php/topic,472.0.html or http://rapid-i.com/rapidforum/index.php/topic,1911.0.html). Lately I ran into this problem myself: I was running some parameter optimizations with inner cross-validations. The memory usage increased over the hours until I got an OutOfMemoryError.

I didn't understand this behaviour, as systematically trying parameter combinations should not create more and more objects. As I ran from the command line, it was not a (direct) GUI issue. I used no breakpoints. Since I used some custom-made operators, I started debugging to find my own error. I found none. But I found a memory leak in RapidMiner itself (surely it's not the holy grail, but hopefully some insight). I didn't file this as a bug, as it is more of a design flaw (no offense meant).

The "offending" classes are really basic: SimpleAttributes and AttributeRole. To show the problem let me show you what happens when a SimpleExampleSet is created (the same happens when one is cloned):

- a new SimpleAttributes object is created
- each attribute is wrapped into an AttributeRole object
- these AttributeRoles are added to the SimpleAttributes object; while adding
  - the AttributeRole is stored in a list (field of SimpleAttributes)
  - the SimpleAttributes is registered as owner of the AttributeRole, while registering
    - the SimpleAttributes is added to a list (field of AttributeRole)

So we have an AttributeRole referencing a SimpleAttributes object and this SimpleAttributes object referencing the same AttributeRole.
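
The registration steps above can be sketched in a few lines. This is only an illustration, not RapidMiner's actual source: `RoleSketch` and `AttributesSketch` and all of their methods are hypothetical stand-ins for AttributeRole and SimpleAttributes, reduced to the two lists described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AttributeRole: remembers every container that owns it.
class RoleSketch {
    final String attributeName;
    final List<AttributesSketch> owners = new ArrayList<>();

    RoleSketch(String attributeName) {
        this.attributeName = attributeName;
    }
}

// Hypothetical stand-in for SimpleAttributes: holds the roles and registers itself as owner.
class AttributesSketch {
    final List<RoleSketch> roles = new ArrayList<>();

    void add(RoleSketch role) {
        roles.add(role);        // container -> role
        role.owners.add(this);  // role -> container, which closes the cycle
    }

    void remove(RoleSketch role) {
        roles.remove(role);       // drop container -> role
        role.owners.remove(this); // drop role -> container
    }
}

class CycleDemo {
    // Returns true if the role and its container reference each other after add().
    static boolean formsCycle() {
        AttributesSketch attributes = new AttributesSketch();
        RoleSketch role = new RoleSketch("att1");
        attributes.add(role);
        return attributes.roles.contains(role) && role.owners.contains(attributes);
    }

    // Returns true if remove() leaves no reference in either direction.
    static boolean removeBreaksCycle() {
        AttributesSketch attributes = new AttributesSketch();
        RoleSketch role = new RoleSketch("att1");
        attributes.add(role);
        attributes.remove(role);
        return attributes.roles.isEmpty() && role.owners.isEmpty();
    }
}
```

In the sketch, `add` creates both directions of the reference in one call, which is the situation described in the list.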

This circular reference can be broken by
  A) removing the Attribute(Role) from the SimpleAttributes
  B) clearing all Attribute(Role)s
  C) removing the ownership

A and C are never used, B only seldom [according to Eclipse->Open Call Hierarchy]. So every SimpleExampleSet contains a reference to a SimpleAttributes object that references itself (via its AttributeRoles). Now imagine this SimpleExampleSet is not referenced anymore (for example after being used inside an IteratingChain). The garbage collector finalizes the SimpleExampleSet but can never(!) free the SimpleAttributes, as it is referenced by several AttributeRoles, and can never(!) free the AttributeRoles, as they are referenced by the SimpleAttributes. Each time a SimpleExampleSet is cloned (with almost every iteration of any ValidationChain or ParameterOptimization) a new SimpleAttributes object and new AttributeRoles are created. Both object types accumulate in the heap until it is full. This can be checked in Eclipse: show all instances of SimpleAttributes after some iterations.

Unfortunately I have no idea how to solve this problem. Both references are needed. Perhaps some AttributeOwnership object could be introduced to eliminate the circular reference. But this would require some deep changes in RM...

This is now open for discussion. Maybe I've missed something.

Best regards,
chero

Answers

  • fischer
    fischer New Altair Community Member
    Hi Cherokee,

    thanks for this analysis, it is highly appreciated! I really consider that a very valuable contribution of the community, and it shows the strength of open source. I put this immediately on our task board, and it won't vanish from it until it is fixed. This looks like some high priority issue. We'll keep you informed on this board.

    Cheers,
    Simon
  • cherokee
    cherokee New Altair Community Member
    Hi at all,

    well, it seems I was slightly wrong. I tried to create an example process for checking this bug, and I nearly failed. It seems the cloning itself is not the problem. The GC seems to be able to resolve circular references (of course it must be; how would doubly linked lists work otherwise?). Nevertheless, there are processes where the numbers of SimpleAttributes and AttributeRole objects explode.
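
    That an unreachable cycle can be reclaimed is easy to try out. The following is a minimal sketch with hypothetical class names (`CycleNode`, `CyclicGcDemo`), not RapidMiner code. It relies on `System.gc()`, which is only a hint to the JVM, so the result is typical rather than guaranteed.

```java
import java.lang.ref.WeakReference;

// Hypothetical node type; each node strongly references its partner.
class CycleNode {
    CycleNode other;
}

class CyclicGcDemo {
    // Builds two objects that reference each other and returns only a weak handle,
    // so no strong reference from outside the cycle survives this method.
    static WeakReference<CycleNode> buildUnreachableCycle() {
        CycleNode a = new CycleNode();
        CycleNode b = new CycleNode();
        a.other = b;
        b.other = a;
        return new WeakReference<>(a);
    }

    // Requests a collection a few times; returns true once the cycle has been reclaimed.
    static boolean cycleWasCollected() {
        WeakReference<CycleNode> handle = buildUnreachableCycle();
        for (int attempt = 0; attempt < 50 && handle.get() != null; attempt++) {
            System.gc(); // a request only; the JVM may ignore it
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return handle.get() == null;
    }
}
```

    If `cycleWasCollected()` returns true, the mutual references alone did not keep the two objects alive; something outside the cycle has to hold a strong reference for a leak to occur.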

    Here is an example process for this phenomenon:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" expanded="true" name="Process">
       <parameter key="logverbosity" value="off"/>
       <process expanded="true" height="451" width="690">
         <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="51" y="102"/>
         <operator activated="true" class="optimize_parameters_grid" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="247" y="86">
           <list key="parameters">
             <parameter key="k-NN.k" value="[1.0;1000;999;linear]"/>
           </list>
           <process expanded="true" height="369" width="596">
             <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="127" y="35">
               <process expanded="true" height="369" width="469">
                 <operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="65" y="133">
                   <list key="columns"/>
                 </operator>
                 <operator activated="true" class="k_nn" expanded="true" height="76" name="k-NN" width="90" x="199" y="16">
                   <parameter key="k" value="1000"/>
                 </operator>
                 <operator activated="true" class="group_models" expanded="true" height="94" name="Group Models" width="90" x="327" y="111"/>
                 <connect from_port="training" to_op="Replace Missing Values" to_port="example set input"/>
                 <connect from_op="Replace Missing Values" from_port="example set output" to_op="k-NN" to_port="training set"/>
                 <connect from_op="Replace Missing Values" from_port="preprocessing model" to_op="Group Models" to_port="models in 1"/>
                 <connect from_op="k-NN" from_port="model" to_op="Group Models" to_port="models in 2"/>
                 <connect from_op="Group Models" from_port="model out" to_port="model"/>
                 <portSpacing port="source_training" spacing="0"/>
                 <portSpacing port="sink_model" spacing="60"/>
                 <portSpacing port="sink_through 1" spacing="0"/>
               </process>
               <process expanded="true" height="369" width="280">
                 <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="28" y="48">
                   <list key="application_parameters"/>
                 </operator>
                 <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="160" y="55"/>
                 <connect from_port="model" to_op="Apply Model" to_port="model"/>
                 <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                 <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                 <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                 <portSpacing port="source_model" spacing="0"/>
                 <portSpacing port="source_test set" spacing="0"/>
                 <portSpacing port="source_through 1" spacing="0"/>
                 <portSpacing port="sink_averagable 1" spacing="0"/>
                 <portSpacing port="sink_averagable 2" spacing="0"/>
               </process>
             </operator>
             <connect from_port="input 1" to_op="Validation" to_port="training"/>
             <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_performance" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Generate Data" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="133"/>
       </process>
     </operator>
    </process>
    Eliminating either the ReplaceMissingValues operator or the ModelGrouper -- or both -- cancels the behaviour. I have no idea why this is the case. Anyway, the memory contains only one KnnRegressionModel, one ValueReplenishmentModel and one GroupedModel.
    [Edit:] --- (forget about THAT)

    Additionally, in the first fold of the first X-Validation, 11 additional AttributeRoles and two SimpleAttributes are created in both learning and testing (in addition to those mentioned above). No idea why this happens.

    Best regards.
  • cherokee
    cherokee New Altair Community Member
    Eureka!

    I found it! Several things are (not) working together.

    a) Each time a PredictionModel is applied a new Prediction Attribute is created. This attribute is added to the SimpleAttributes object of the ExampleSet and to the ExampleTable.
    b) After each X-Validation step the prediction labels are removed

    So what normally happens in Validation is as follows:
    - the original ExampleSet E referencing an ExampleTable T and some SimpleAttributes object SA is cloned
    - the cloned ExampleSet E' references the same ExampleTable T as E but has its own SimpleAttributes SA'
    - this ExampleSet is split for learning and validation

    In testing:
    - a formerly present prediction is saved
    ... (X)
    - a prediction attribute P is created
    - P is added to T
    - P is wrapped in an AttributeRole R
    - R is added as prediction to SA' (making R and P reference SA' as owner)
    ...
    - if there is a new prediction label and there was an old prediction label, the prediction label is removed from the ExampleSet and the ExampleTable

    - after the validation loop E' is discarded and freed by GC

    This works fine and is no problem. But: (X) is where the evaluation of the test subprocess starts. Anything can happen there. For example the used ExampleSet could be cloned (e.g. by some preprocessing operator). Then the new prediction label is added to the cloned ExampleSet. The ExampleSet the ValidationChain sees isn't changed, so the prediction attribute isn't discarded from the MemoryTable.

    So: The original ExampleSet E is still referenced (e.g. by the used input port)! E in turn references T, T references all created P, each P references its SA', and each SA' also references all(!) cloned AttributeRoles, which in turn reference all cloned Attributes. QED  8)
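
    The retention chain can be reproduced with a toy model. This is a sketch under simplifying assumptions: `TableSketch`, `ViewSketch` and `PredictionSketch` are hypothetical stand-ins for T, SA' and P, and the `removePrediction` flag stands for the cleanup that the validation performs when it sees the prediction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a cloned SimpleAttributes object SA'.
class ViewSketch {
    final List<Object> roles = new ArrayList<>();
}

// Hypothetical stand-in for a prediction attribute P that remembers its owner SA'.
class PredictionSketch {
    final ViewSketch owner;

    PredictionSketch(ViewSketch owner) {
        this.owner = owner;
    }
}

// Hypothetical stand-in for the shared ExampleTable T.
class TableSketch {
    final List<PredictionSketch> attributes = new ArrayList<>();
}

class RetentionDemo {
    // Applies a model `iterations` times against one shared table and returns
    // how many cloned views the table still keeps reachable afterwards.
    static int retainedViews(int iterations, boolean removePrediction) {
        TableSketch table = new TableSketch();
        for (int i = 0; i < iterations; i++) {
            ViewSketch clonedView = new ViewSketch();      // the clone gets its own SA'
            PredictionSketch p = new PredictionSketch(clonedView);
            clonedView.roles.add(p);                       // SA' -> P
            table.attributes.add(p);                       // T -> P -> SA'
            if (removePrediction) {
                table.attributes.remove(p);                // cleanup: T lets go of P
            }
        }
        // Each remaining attribute pins exactly one distinct view.
        return table.attributes.size();
    }
}
```

    With cleanup the table stays empty; without it, every iteration leaves one more P in the long-lived table, and each P pins its SA' together with everything that SA' references.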

    Hope you have fun with that,
    chero
  • cherokee
    cherokee New Altair Community Member
    So,

    just to prove my evaluation I'll post a "bug fix":

    I think it is not possible to handle this problem inside the ValidationChain operator. Therefore a new operator eliminating the prediction label is needed, like this:
    import com.rapidminer.example.ExampleSet;
    import com.rapidminer.operator.Operator;
    import com.rapidminer.operator.OperatorDescription;
    import com.rapidminer.operator.OperatorException;
    import com.rapidminer.operator.learner.PredictionModel;
    import com.rapidminer.operator.ports.InputPort;
    import com.rapidminer.operator.ports.OutputPort;
    import com.rapidminer.operator.ports.metadata.ExampleSetMetaData;
    import com.rapidminer.operator.ports.metadata.SimplePrecondition;

    /**
     * Eliminates prediction attribute and corresponding
     * confidence attributes from an example set.
     *
     * @author Michael Siebers
     *
     */
    public class PredictionConsumer extends Operator {

        private final InputPort exampleSetInput = getInputPorts().createPort("unlabelled data");
        private final OutputPort exampleSetOutput = getOutputPorts().createPort("labelled data");

        public PredictionConsumer(OperatorDescription description) {
            super(description);

            exampleSetInput.addPrecondition(new SimplePrecondition(exampleSetInput,
                    new ExampleSetMetaData()));
        }

        @Override
        public void doWork() throws OperatorException {
            ExampleSet inputExampleSet = exampleSetInput.getData();

            PredictionModel.removePredictedLabel(inputExampleSet);
            exampleSetOutput.deliver(inputExampleSet);
        }
    }
    I've tried the operator -- half of the overhead is eliminated. So my "theory" is correct but only half the truth.

    Best regards,
    chero
  • fischer
    fischer New Altair Community Member
    Hi,

    thanks for investigating this further. I hope we can come up with a solution without a custom operator soon.

    Cheers,
    Simon