Memory growth ... without running a process

BAMBAMBAM New Altair Community Member
edited November 5 in Community Q&A
Hey everyone,

I noticed that if I just open my project in RapidMiner 4.5 and mouse over objects (to get their popup descriptions), memory usage increases by at least 4-5 megabytes. This seems really excessive: it's 100,000+ bytes per description! Perhaps this information could help you guys track down the continual memory growth issues (which are preventing me from advancing with my project at the moment).

My process seems to increase memory usage without bound. Before running the process, RapidMiner takes 450MB. After loading my data, it takes roughly 680MB, which is actually reasonable since I am using 60k samples with 130 attributes (7.8M real or integer values), for roughly 29 bytes per value.
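
The per-value figure can be checked with a quick back-of-the-envelope calculation. This is only a sketch using the numbers quoted above; it assumes that 1MB means 1,000,000 bytes and that all of the growth after loading is due to the data table, and the variable names are purely illustrative:

```python
# Numbers quoted in the post; the variable names are only illustrative.
samples = 60_000          # rows in the example set
attributes = 130          # columns per row
before_load_mb = 450      # memory use before loading the data
after_load_mb = 680       # memory use after loading the data

values = samples * attributes
# Assumption: 1 MB = 1,000,000 bytes and all growth is due to the data.
bytes_per_value = (after_load_mb - before_load_mb) * 1_000_000 / values

print(f"{values:,} values, about {bytes_per_value:.1f} bytes per value")
```

Under those assumptions the 230MB increase works out to a little over 29 bytes per stored value.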

However, once I run the process, memory consumption increases by roughly 300MB (to a total of 980MB) after 100 iterations of the inner loop (100 usages of W-REPTree). Restarting the process (which immediately performs a memory cleanup) does not reduce the memory footprint. After 800 iterations the memory usage is pegged at 1.6GB (the maximum I've allowed), but the process fails with an out-of-memory error after roughly 1500-2000 iterations. Here is the process (note that many operators are disabled):

<operator name="Root" class="Process" expanded="yes">
    <description text="Failed with out of memory after 1859 evals. Removed the xvalidation operator to try and alleviate this; that didn't work, so I put a MemoryCleanUp operator in the ParamOptimization."/>
    <parameter key="logverbosity" value="error"/>
    <operator name="MemoryCleanUp" class="MemoryCleanUp">
    </operator>
    <operator name="LoadData" class="OperatorChain" expanded="yes">
        <operator name="MacroDefinition" class="MacroDefinition">
            <list key="macros">
              <parameter key="baseName" value="test"/>
              <parameter key="longT" value="0.15"/>
              <parameter key="shortT" value="-0.15"/>
            </list>
        </operator>
        <operator name="CSVExampleSource" class="CSVExampleSource" activated="no">
            <parameter key="filename" value="daily2.csv"/>
            <parameter key="label_name" value="RRRatio"/>
            <parameter key="id_name" value="id"/>
        </operator>
        <operator name="RemoveMissing" class="AttributeFilter" activated="no">
        </operator>
        <operator name="AttributeFilter" class="AttributeFilter" activated="no">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="parameter_string" value="Symbol|d1Bar|d1Date|d0Bar|d0BarDate|d0Close|d0High|d0Low|d0HighRise|d0LowFall|d0HighRiseR|d0LowFallR|ClassRise|ClassLog|ClassRsk|rsk"/>
            <parameter key="invert_filter" value="true"/>
            <parameter key="apply_on_special" value="true"/>
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole" activated="no">
            <parameter key="name" value="Rise"/>
            <parameter key="target_role" value="ignore"/>
        </operator>
        <operator name="ExampleSetWriter" class="ExampleSetWriter" activated="no">
            <parameter key="example_set_file" value="daily2.rmd"/>
            <parameter key="attribute_description_file" value="daily2.att"/>
            <parameter key="overwrite_mode" value="overwrite"/>
        </operator>
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="daily2.att"/>
        </operator>
        <operator name="Sampling" class="Sampling" breakpoints="after">
            <description text="#ylt#operator name=#yquot#Root#yquot# class=#yquot#Process#yquot# expanded=#yquot#yes#yquot##ygt##ylt#operator name=#yquot#ExampleSetGenerator#yquot# class=#yquot#ExampleSetGenerator#yquot##ygt##ylt#parameter key=#yquot#target_function#yquot# value=#yquot#sum#yquot#/#ygt##ylt#/operator#ygt##ylt#operator name=#yquot#AttributeSubsetPreprocessing#yquot# class=#yquot#AttributeSubsetPreprocessing#yquot# expanded=#yquot#yes#yquot##ygt##ylt#parameter key=#yquot#attribute_name_regex#yquot# value=#yquot#att.*[^3]#yquot#/#ygt##ylt#operator name=#yquot#BinDiscretization#yquot# class=#yquot#BinDiscretization#yquot##ygt##ylt#/operator#ygt##ylt#/operator#ygt##ylt#/operator#ygt#"/>
            <parameter key="sample_ratio" value="1.0"/>
        </operator>
    </operator>
    <operator name="YAGGA" class="YAGGA" breakpoints="after" expanded="yes">
        <parameter key="population_size" value="50"/>
        <parameter key="maximum_number_of_generations" value="1000"/>
        <parameter key="generations_without_improval" value="10"/>
        <parameter key="p_initialize" value="0.03"/>
        <parameter key="use_plus" value="false"/>
        <parameter key="use_diff" value="true"/>
        <parameter key="use_div" value="true"/>
        <operator name="EvolutionaryParameterOptimization" class="EvolutionaryParameterOptimization" expanded="yes">
            <list key="parameters">
              <parameter key="DummyM.k" value="[50.0;2000.0]"/>
              <parameter key="DummyV.C" value="[5.0E-4;0.0050]"/>
              <parameter key="DummyShortT.C" value="[-0.9;0.0]"/>
              <parameter key="DummyLongT.C" value="[0.0;0.9]"/>
            </list>
            <parameter key="generations_without_improval" value="3"/>
            <parameter key="population_size" value="20"/>
            <parameter key="mutation_type" value="sparsity_mutation"/>
            <parameter key="selection_type" value="uniform"/>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="MemoryCleanUp (2)" class="MemoryCleanUp">
                </operator>
                <operator name="pred_ProfitLossFilter" class="AttributeFilter">
                    <parameter key="condition_class" value="attribute_name_filter"/>
                    <parameter key="parameter_string" value="pred|ProfitLoss"/>
                    <parameter key="invert_filter" value="true"/>
                    <parameter key="apply_on_special" value="true"/>
                </operator>
                <operator name="DummyM" class="NearestNeighbors" activated="no">
                    <parameter key="k" value="50"/>
                </operator>
                <operator name="DummyV" class="EvoSVM" activated="no">
                    <parameter key="C" value="5.0E-4"/>
                </operator>
                <operator name="ParameterCloner" class="ParameterCloner">
                    <list key="name_map">
                      <parameter key="DummyM.k" value="MainREPTree.M"/>
                      <parameter key="DummyV.C" value="MainREPTree.V"/>
                    </list>
                </operator>
                <operator name="MainREPTree" class="W-REPTree">
                    <description text="fails with out-of-memory errors after 1300 iterations"/>
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="M" value="50"/>
                    <parameter key="V" value="5.0E-4"/>
                </operator>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Applier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ChangeAttributeName" class="ChangeAttributeName">
                    <parameter key="old_name" value="prediction(RRRatio)"/>
                    <parameter key="new_name" value="pred"/>
                </operator>
                <operator name="DummyShortT" class="EvoSVM" activated="no">
                    <parameter key="C" value="-0.9"/>
                </operator>
                <operator name="DummyLongT" class="EvoSVM" activated="no">
                    <parameter key="C" value="0.6479006094796238"/>
                </operator>
                <operator name="ProfitLossConstruction" class="AttributeConstruction">
                    <list key="function_descriptions">
                      <parameter key="ProfitLoss" value="if(pred &gt; parse(param(&quot;DummyLongT&quot;, &quot;C&quot;)), Rise, if(pred&lt;parse(param(&quot;DummyShortT&quot;,&quot;C&quot;)),-Rise, 0))"/>
                    </list>
                    <parameter key="use_standard_constants" value="false"/>
                    <parameter key="keep_all" value="false"/>
                </operator>
                <operator name="Data2Performance" class="Data2Performance">
                    <parameter key="performance_type" value="statistics"/>
                    <parameter key="attribute_name" value="ProfitLoss"/>
                    <parameter key="example_index" value="1"/>
                </operator>
                <operator name="ProcessLog" class="ProcessLog" activated="no">
                    <list key="log">
                      <parameter key="tries" value="operator.MainREPTree.value.applycount"/>
                      <parameter key="ProfitLoss" value="operator.Data2Performance.value.performance"/>
                      <parameter key="LeafSize" value="operator.MainREPTree.parameter.M"/>
                      <parameter key="LeafV" value="operator.MainREPTree.parameter.V"/>
                      <parameter key="LongT" value="operator.DummyLongT.parameter.C"/>
                      <parameter key="ShortT" value="operator.DummyShortT.parameter.C"/>
                    </list>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>

I am using the 32-bit version of RapidMiner 4.5 on a Windows XP64 Pro machine.  Hopefully my experience will help you guys figure out what's using up all that memory (and not releasing it!)

-John

Answers

  • land
    Hi John,
    leaving your memory problems aside for a moment, I have to tell you that your process lacks one major thing: validation. You are making a great effort to tune the parameters of the REPTree, but you validate on your training samples. Evaluated this way, 1-Nearest-Neighbor would be the best learner possible, with an accuracy of 100%, but that says nothing about the real dependencies inside the data. You will have to perform a cross-validation in order to avoid fatal overfitting on your training data...
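
The 1-Nearest-Neighbor point can be illustrated with a small, self-contained sketch. It does not use RapidMiner or the data from this thread: the data set is synthetic with purely random labels, and the helper names `nearest_neighbor_predict` and `accuracy` are made up for this example.

```python
import random

def nearest_neighbor_predict(train, point):
    """Return the label of the training example closest to `point`."""
    closest = min(
        train,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], point)),
    )
    return closest[1]

def accuracy(train, test):
    """Fraction of `test` examples whose label 1-NN predicts correctly."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)

rng = random.Random(0)
# Synthetic data with purely random labels: there is nothing real to learn.
data = [([rng.random(), rng.random()], rng.choice([0, 1])) for _ in range(400)]
train, holdout = data[:200], data[200:]

train_accuracy = accuracy(train, train)      # evaluated on the training samples
holdout_accuracy = accuracy(train, holdout)  # evaluated on unseen samples

print(f"training accuracy: {train_accuracy:.2f}")
print(f"hold-out accuracy: {holdout_accuracy:.2f}")
```

Each training point is its own nearest neighbor, so the training accuracy is perfect even though the labels are noise; only the hold-out figure reflects what the learner has actually captured.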

    Coming back to the memory leak problem: I cannot check that, because I don't have your data. If you could reproduce the error with an ExampleSetGenerator and send this process file to me, I could reproduce it and try to solve the issue. I guess it's the YAGGA operator causing the problems, but I can't be sure.

    Greetings,
      Sebastian
  • BAMBAMBAM
    How can I get you the data?

    I really have had no success using RapidMiner because my processes always run out of memory. Therefore the process doesn't complete and I don't get the output file I'm looking for. I've gotten to the point where the initial process starts out using 600MB of memory, with 1GB free on the machine, but eventually the process throws an out-of-memory exception.

    I could add memory to the machine or cut the number of samples down, but I have no faith that this will yield results, because the memory just seems to grow without bound. I have no idea why the memory would continue to grow for this process, since the only thing which really should be growing is the result list stored by ProcessLog, and that seems like it should be tiny (less than 1MB even for thousands of results).
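
A rough estimate supports the expectation that the log itself should be tiny. The sketch below assumes (this is not confirmed anywhere in the thread) that each logged value is stored as an 8-byte double and that per-row overhead can be ignored; the row count is an illustrative stand-in for "thousands of results", and the column count matches the four ProcessLog entries in the process below.

```python
# Assumptions (hypothetical, for illustration only): each logged value is
# stored as an 8-byte double and per-row overhead is ignored.
BYTES_PER_VALUE = 8
log_columns = 4        # Perf, Tries, len, yperf in the ProcessLog operator
logged_rows = 10_000   # illustrative stand-in for "thousands of results"

log_bytes = logged_rows * log_columns * BYTES_PER_VALUE
print(f"about {log_bytes / 1_000_000:.2f} MB for {logged_rows:,} rows")
```

Under these assumptions the log comes to roughly a third of a megabyte, far too small to explain growth of hundreds of megabytes.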


    Here's the process as it stands now:
    <operator name="Root" class="Process" expanded="yes">
        <description text="Spearman Rho: 0.184LinearRegression 0.359 * d1d2BarHighHigh- 0.002 * (RSI_5C * ATR_5_20)- 3.613 * d0OpenGapR+ 0.012 * RSI_5C+ 4.533 * (1/(LowestHighSince) / (RSI_5C * ATR_5_20))+ 2.998- 0.000  * (RSI_5C * RSI_5C)+ 0.000 * ((RSI_5C * ATR_5_20) pow d0OpenGapR)  "/>
        <parameter key="logverbosity" value="error"/>
        <operator name="LoadData" class="OperatorChain" expanded="yes">
            <operator name="MacroDefinition" class="MacroDefinition">
                <list key="macros">
                  <parameter key="baseName" value="Poly"/>
                </list>
            </operator>
            <operator name="ExampleSource" class="ExampleSource">
                <parameter key="attributes" value="all.att"/>
                <parameter key="local_random_seed" value="2001"/>
            </operator>
        </operator>
        <operator name="YAGGA2" class="YAGGA2" breakpoints="after" expanded="yes">
            <parameter key="local_random_seed" value="2001"/>
            <parameter key="population_size" value="200"/>
            <parameter key="maximum_number_of_generations" value="25"/>
            <parameter key="generations_without_improval" value="5"/>
            <parameter key="keep_best_individual" value="true"/>
            <parameter key="p_initialize" value="0.03"/>
            <parameter key="use_plus" value="false"/>
            <parameter key="use_diff" value="true"/>
            <parameter key="use_div" value="true"/>
            <parameter key="use_square_roots" value="true"/>
            <parameter key="use_sin" value="false"/>
            <parameter key="use_log" value="true"/>
            <parameter key="use_absolute_values" value="false"/>
            <parameter key="remove_useless" value="false"/>
            <parameter key="remove_equivalent" value="false"/>
            <parameter key="equivalence_use_statistics" value="false"/>
            <operator name="SlidingWindowValidation" class="SlidingWindowValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="training_window_width" value="11080"/>
                <parameter key="test_window_width" value="11080"/>
                <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                    <operator name="LinearRegression" class="LinearRegression">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
                <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                    <operator name="Applier" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="RegressionPerformance" class="RegressionPerformance">
                        <parameter key="keep_example_set" value="true"/>
                        <parameter key="main_criterion" value="spearman_rho"/>
                        <parameter key="spearman_rho" value="true"/>
                        <parameter key="use_example_weights" value="false"/>
                    </operator>
                    <operator name="ProcessLog" class="ProcessLog">
                        <list key="log">
                          <parameter key="Perf" value="operator.RegressionPerformance.value.performance"/>
                          <parameter key="Tries" value="operator.RegressionPerformance.value.applycount"/>
                          <parameter key="len" value="operator.YAGGA2.value.average_length"/>
                          <parameter key="yperf" value="operator.YAGGA2.value.performance"/>
                        </list>
                    </operator>
                </operator>
            </operator>
        </operator>
        <operator name="AttributeWeightSelection" class="AttributeWeightSelection" breakpoints="after">
            <parameter key="weight" value="0.0"/>
            <parameter key="weight_relation" value="greater"/>
        </operator>
        <operator name="FinalModel" class="LinearRegression" breakpoints="after">
            <parameter key="keep_example_set" value="true"/>
        </operator>
        <operator name="ModelWriter" class="ModelWriter" breakpoints="after">
            <parameter key="model_file" value="LinearRegressionModel.xml"/>
            <parameter key="output_type" value="XML"/>
        </operator>
        <operator name="ModelApplier" class="ModelApplier" breakpoints="after">
            <list key="application_parameters">
            </list>
        </operator>
        <operator name="FinalPerf" class="RegressionPerformance">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="main_criterion" value="spearman_rho"/>
            <parameter key="spearman_rho" value="true"/>
            <parameter key="use_example_weights" value="false"/>
        </operator>
    </operator>


    I'd appreciate any advice you might have on this subject.

    Thanks,
    John
  • land
    Hi John,
    how large is your data? How many columns and rows?

    Greetings,
      Sebastian
  • keith
    "I am using the 32-bit version of RapidMiner 4.5 on a Windows XP64 Pro machine."
    Perhaps you answered this in another thread, but is there a reason you're not using the 64-bit version of RapidMiner, given that you have a 64-bit OS?  Given the memory issues you're having, that would have been one of the first things to try.  There may still be an underlying RM issue somewhere, but taking the memory limits of 32-bit apps out of consideration should help make that clearer.  Just a thought...

    Keith
  • BAMBAMBAM
    Keith:
    I switched to using the 64-bit version and still have the same problems.  Non-stop memory growth. I just downloaded version 4.6 64-bit and am trying that right now.

    Sebastian:
    The data is about 300MB. Compressed I think I got it to 100MB.  Roughly 100 columns and 400,000 rows.
  • land
    Hi,
    let's examine this further:
    First, you mentioned that memory consumption grows as soon as you show a description. But this only occurs the first time, correct? That is when the complete documentation of all operators is parsed. At least I cannot reproduce any problem when hovering over an operator: the memory consumption does not change at all.
    Second, when you execute the process, it runs fine until it ends, right? If you then execute it again, the previously consumed memory isn't freed and it crashes with an out-of-memory exception. Is this correct?

    Greetings,
      Sebastian
  • land
    Hi,
    I think I reproduced the problem. Let's see if we can fix it.

    Greetings,
      Sebastian
  • BAMBAMBAM
    Thanks for taking a look at it, Sebastian.

    I haven't been studying the GUI-related memory growth process lately, I've just been trying to see how long I could run the process.

    With the 4.6 64-bit version I get roughly the same results as with the 4.5 64-bit version: my process dies (displaying the "the process would need more than the available amount of memory" error) after about 90,000 iterations, when its memory usage gets suspiciously close to 2GB. I am running XP 64-bit Professional.

    I originally had the 32-bit 4.5 version installed and noticed that it's still in my Program Files (x86) directory.  I'm deleting it and restarting the test just in case some of the 32-bit libraries were being used, even though I'm quite confident that I have been running version 4.6 64-bit executable (i.e. I looked in the "about RapidMiner" tab  after starting the program).

    thanks again,
    John