Hi all!
First of all, I'm sorry that this post has become a bit long, but I would rather say it all.

RapidMiner has suffered from inexplicably increasing memory consumption for several years and versions (e.g.
http://rapid-i.com/rapidforum/index.php/topic,472.0.html or
http://rapid-i.com/rapidforum/index.php/topic,1911.0.html). Lately I've run into this problem myself: I have been running some parameter optimizations with inner cross-validations. The memory usage increased over the hours until I got an OutOfMemoryError.
I didn't understand this behaviour, as systematically trying parameter combinations should not create more and more objects. Since I ran from the command line, it was not (directly) a GUI issue, and I used no breakpoints. As I used some custom-made operators, I started debugging to find my own error. I found none, but I found a memory leak in RapidMiner itself (surely it's not the holy grail, but hopefully some insight). I didn't file this as a bug because it is more of a design flaw (no offense meant).
The "offending" classes are really basic: SimpleAttributes and AttributeRole. To illustrate the problem, here is what happens when a SimpleExampleSet is created (the same happens when one is cloned):
- a new SimpleAttributes object is created
- each attribute is wrapped into an AttributeRole object
- these AttributeRoles are added to the SimpleAttributes object; while adding:
  - the AttributeRole is stored in a list (a field of SimpleAttributes)
  - the SimpleAttributes is registered as owner of the AttributeRole; while registering:
    - the SimpleAttributes is added to a list (a field of AttributeRole)
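To make the structure concrete, here is a minimal Java sketch of the steps above. The classes are hypothetical, heavily simplified stand-ins that model only the two list fields described; they are not RapidMiner's real SimpleAttributes/AttributeRole, and createFor is a made-up helper that mimics creating an example set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for RapidMiner's classes;
// only the two list fields described above are modelled.
public class CycleModel {

    static class AttributeRole {
        final String attributeName;
        // owners of this role (the list field of AttributeRole)
        final List<SimpleAttributes> owners = new ArrayList<>();

        AttributeRole(String attributeName) {
            this.attributeName = attributeName;
        }

        void addOwner(SimpleAttributes owner) {
            owners.add(owner);
        }
    }

    static class SimpleAttributes {
        // roles held by this container (the list field of SimpleAttributes)
        final List<AttributeRole> roles = new ArrayList<>();

        void add(AttributeRole role) {
            roles.add(role);      // store the role in our list
            role.addOwner(this);  // register ourselves as owner: the back reference
        }
    }

    // Hypothetical helper mimicking what happens when an example set is created or cloned.
    static SimpleAttributes createFor(String... attributeNames) {
        SimpleAttributes attributes = new SimpleAttributes();
        for (String name : attributeNames) {
            attributes.add(new AttributeRole(name));
        }
        return attributes;
    }

    public static void main(String[] args) {
        SimpleAttributes attributes = createFor("label", "a1", "a2");
        AttributeRole first = attributes.roles.get(0);
        // Both directions of the cycle exist:
        System.out.println(attributes.roles.contains(first));  // true
        System.out.println(first.owners.contains(attributes)); // true
    }
}
```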
So we have an AttributeRole referencing a SimpleAttributes object and this SimpleAttributes object referencing the same AttributeRole.
This circular reference can be broken by
A) removing the Attribute(Role) from the SimpleAttributes
B) clearing all Attribute(Role)s
C) removing the ownership
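In the same kind of simplified, hypothetical model, the three options could look roughly like this. The method names remove, clear and removeOwnership are illustrative assumptions, not necessarily RapidMiner's actual methods; the point is that each option drops at least one direction of the link, which is enough to break the cycle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified model (not RapidMiner's real API) showing
// the three ways, A) to C), in which the mutual references can be dropped.
public class BreakCycle {

    static class AttributeRole {
        final List<SimpleAttributes> owners = new ArrayList<>();
    }

    static class SimpleAttributes {
        final List<AttributeRole> roles = new ArrayList<>();

        void add(AttributeRole role) {
            roles.add(role);
            role.owners.add(this);
        }

        // A) remove a single role: drop both directions of its link
        void remove(AttributeRole role) {
            roles.remove(role);
            role.owners.remove(this);
        }

        // B) clear all roles, again dropping both directions
        void clear() {
            for (AttributeRole role : roles) {
                role.owners.remove(this);
            }
            roles.clear();
        }
    }

    // C) remove only the ownership (the back reference) from a role
    static void removeOwnership(AttributeRole role, SimpleAttributes owner) {
        role.owners.remove(owner);
    }

    public static void main(String[] args) {
        SimpleAttributes attributes = new SimpleAttributes();
        AttributeRole role = new AttributeRole();
        attributes.add(role);
        attributes.clear();
        System.out.println(attributes.roles.isEmpty() && role.owners.isEmpty()); // true
    }
}
```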
A and C are never used, and B only seldom [according to Eclipse->Open Call Hierarchy]. So every SimpleExampleSet holds a reference to a SimpleAttributes object that is part of such a cycle. Now imagine this SimpleExampleSet is no longer referenced (for example after being used inside an IteratingChain). The garbage collector collects the SimpleExampleSet but can never(!) free the SimpleAttributes, as it is referenced by several AttributeRoles, and can never(!) free the AttributeRoles, as they are referenced by the SimpleAttributes. Each time a SimpleExampleSet is cloned (in almost every iteration of any ValidationChain or ParameterOptimization), a new SimpleAttributes object and new AttributeRoles are created. Both object types accumulate in the heap until it is full. This can be checked in Eclipse: show all instances of SimpleAttributes after some iterations.
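One thing that may be worth checking: the JVM's garbage collectors trace reachability from GC roots rather than counting references, so a cycle like this one can in principle be reclaimed once nothing outside it refers to it. A small probe using java.lang.ref.WeakReference can test an isolated cycle; Container, Role, buildCycle and clearedAfterGc are hypothetical names, and since System.gc() is only a hint, a "not cleared" result is not proof of a leak. If isolated cycles are reclaimed but SimpleAttributes instances still pile up, a heap analyzer's path-to-GC-roots view would show which other live object keeps them reachable.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Probe: is an isolated SimpleAttributes <-> AttributeRole style cycle reclaimed?
// The classes are hypothetical stand-ins, not RapidMiner's.
public class CycleProbe {

    static class Role {
        final List<Container> owners = new ArrayList<>();
    }

    static class Container {
        final List<Role> roles = new ArrayList<>();

        void add(Role role) {
            roles.add(role);
            role.owners.add(this); // back reference closes the cycle
        }
    }

    static Container buildCycle() {
        Container c = new Container();
        for (int i = 0; i < 3; i++) {
            c.add(new Role());
        }
        return c;
    }

    // Requests a GC a few times and reports whether the referent has been cleared.
    // System.gc() is only a hint, so false does not prove the object is leaked.
    static boolean clearedAfterGc(WeakReference<?> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        // Only the weak reference points into the cycle; no strong reference remains.
        WeakReference<Container> ref = new WeakReference<>(buildCycle());
        System.out.println("cycle reclaimed: " + clearedAfterGc(ref, 10));
    }
}
```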
Unfortunately, I have no idea how to solve this problem. Both references are needed. Perhaps an AttributeOwnership object could be introduced to eliminate the circular reference, but this would require some deep changes in RM...
This is now open for discussion. Maybe I've missed something.
Best regards,
chero