Free memory operator does not work
seshadotcom
New Altair Community Member
Folks,
I finally added logging for every database read in my process. According to the timings in my log file, the Free Memory operator finishes almost immediately, in under two seconds, yet memory usage keeps rising until it peaks. I don't think the operator works correctly. Memory problems are blocking my experiments, and I need your advice on a workaround. I cannot run FP-Growth on one set and then on the other, because ultimately I need the combined dataset for association rule generation, which is itself a problem for a huge data set. I have also tried reading the data from CSV, but that does not work either.
I love RapidMiner as a wonderful idea, but the memory issues, God!
Answers
-
Can you please post your process setup?
Best regards,
Marius
-
Hi Marius,
Here is the process. The basic structure of my plan is this: I join two tables at a time, then use the result for the next join, and so on. I noticed that memory peaks when RapidMiner reads from one of the tables, so I placed a Free Memory operator after every Join block. The problem is that it does not free all of the used memory; instead, the system freezes and bogs down even with Free Memory in place.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="parallelize_main_process" value="false"/>
<process expanded="true">
<operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="Transaction_Join1" width="90" x="45" y="30">
<parameter key="define_connection" value="predefined"/>
<parameter key="connection" value="Test"/>
<parameter key="database_system" value="MySQL"/>
<parameter key="define_query" value="query"/>
<parameter key="query" value="SELECT * FROM `transaction_mapping` where transaction_mapping.diffbwddrd>2 AND transaction_mapping.delivery_counter is not NULL limit 3000;"/>
<parameter key="use_default_schema" value="true"/>
<parameter key="prepare_statement" value="false"/>
<enumeration key="parameters"/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="order_header" width="90" x="45" y="300">
<parameter key="define_connection" value="predefined"/>
<parameter key="connection" value="Test"/>
<parameter key="database_system" value="MySQL"/>
<parameter key="define_query" value="query"/>
<parameter key="query" value="select * from order_header_mapping limit 3000;"/>
<parameter key="use_default_schema" value="true"/>
<parameter key="prepare_statement" value="false"/>
<enumeration key="parameters"/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator activated="true" class="join" compatibility="5.3.008" expanded="true" height="76" name="Join_T_OH" width="90" x="112" y="165">
<parameter key="remove_double_attributes" value="true"/>
<parameter key="join_type" value="left"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="id_order_header" value="id_order_header"/>
</list>
<parameter key="keep_both_join_attributes" value="false"/>
</operator>
<operator activated="true" class="free_memory" compatibility="5.3.008" expanded="true" height="76" name="Free Memory" width="90" x="246" y="75"/>
<operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="order_line" width="90" x="112" y="435">
<parameter key="define_connection" value="predefined"/>
<parameter key="connection" value="Test"/>
<parameter key="database_system" value="MySQL"/>
<parameter key="define_query" value="query"/>
<parameter key="query" value="SELECT * FROM order_line_mapping limit 3000;"/>
<parameter key="use_default_schema" value="true"/>
<parameter key="prepare_statement" value="false"/>
<enumeration key="parameters"/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator activated="true" class="join" compatibility="5.3.008" expanded="true" height="76" name="Join" width="90" x="313" y="255">
<parameter key="remove_double_attributes" value="true"/>
<parameter key="join_type" value="right"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="id_order_line" value="id_order_line"/>
</list>
<parameter key="keep_both_join_attributes" value="false"/>
</operator>
<operator activated="true" class="free_memory" compatibility="5.3.008" expanded="true" height="76" name="Free Memory (2)" width="90" x="514" y="75"/>
<operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="Read State" width="90" x="112" y="570">
<parameter key="define_connection" value="predefined"/>
<parameter key="connection" value="Test"/>
<parameter key="database_system" value="MySQL"/>
<parameter key="define_query" value="query"/>
<parameter key="query" value="select * from state_mapping limit 10000;"/>
<parameter key="use_default_schema" value="true"/>
<parameter key="prepare_statement" value="false"/>
<enumeration key="parameters"/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator activated="true" class="join" compatibility="5.3.008" expanded="true" height="76" name="Join (2)" width="90" x="313" y="525">
<parameter key="remove_double_attributes" value="true"/>
<parameter key="join_type" value="right"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="id_state" value="id_state"/>
</list>
<parameter key="keep_both_join_attributes" value="false"/>
</operator>
<operator activated="true" breakpoints="after" class="write_csv" compatibility="5.3.008" expanded="true" height="76" name="Write CSV" width="90" x="514" y="525">
<parameter key="csv_file" value="M:\Work\1.csv"/>
<parameter key="column_separator" value=";"/>
<parameter key="write_attribute_names" value="true"/>
<parameter key="quote_nominal_values" value="true"/>
<parameter key="format_date_attributes" value="true"/>
<parameter key="append_to_file" value="false"/>
<parameter key="encoding" value="SYSTEM"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.008" expanded="true" height="76" name="Set Role ID" width="90" x="581" y="390">
<parameter key="attribute_name" value="line_type"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.008" expanded="true" height="76" name="Set Role Label" width="90" x="648" y="210">
<parameter key="attribute_name" value="id_transaction"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="715" y="480">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value="delivery_status"/>
<parameter key="attributes" value="|counter_line|current_price4unit|current_quantity|delivery_qty|diffbwddrd|diffbwsdrd|diffbwsugqtyreqqty|diffbwtdrd|id_order_line|id_order_header|id_network|id_modifier|id_manem_doctype|price_unit|payment|priority|received_qty|id_assegnee|id_supplier|id_transaction|issued_price4unit|issued_quantity|new_suggested_qty|new_suggested_price|new_requested_qty|new_requested_price|order_number|total_delivered|transport_doc_code|special_mark|id_order_type|id_icon|i_name"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="numerical_to_binominal" compatibility="5.3.008" expanded="true" height="76" name="Numerical to Binominal" width="90" x="916" y="435">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="min" value="0.0"/>
<parameter key="max" value="0.0"/>
</operator>
<operator activated="true" class="fp_growth" compatibility="5.3.008" expanded="true" height="76" name="FP-Growth" width="90" x="916" y="300">
<parameter key="find_min_number_of_itemsets" value="true"/>
<parameter key="min_number_of_itemsets" value="1000"/>
<parameter key="max_number_of_retries" value="15"/>
<parameter key="min_support" value="0.54"/>
<parameter key="max_items" value="-1"/>
<parameter key="keep_example_set" value="false"/>
</operator>
<operator activated="true" class="free_memory" compatibility="5.3.008" expanded="true" height="76" name="Free Memory (3)" width="90" x="916" y="165"/>
<operator activated="true" class="create_association_rules" compatibility="5.3.008" expanded="true" height="76" name="Create Association Rules" width="90" x="983" y="30">
<parameter key="criterion" value="laplace"/>
<parameter key="min_confidence" value="0.3"/>
<parameter key="min_criterion_value" value="0.2"/>
<parameter key="gain_theta" value="0.4"/>
<parameter key="laplace_k" value="1.0"/>
</operator>
<connect from_op="Transaction_Join1" from_port="output" to_op="Join_T_OH" to_port="left"/>
<connect from_op="order_header" from_port="output" to_op="Join_T_OH" to_port="right"/>
<connect from_op="Join_T_OH" from_port="join" to_op="Free Memory" to_port="through 1"/>
<connect from_op="Free Memory" from_port="through 1" to_op="Join" to_port="left"/>
<connect from_op="order_line" from_port="output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_op="Free Memory (2)" to_port="through 1"/>
<connect from_op="Free Memory (2)" from_port="through 1" to_op="Join (2)" to_port="left"/>
<connect from_op="Read State" from_port="output" to_op="Join (2)" to_port="right"/>
<connect from_op="Join (2)" from_port="join" to_op="Write CSV" to_port="input"/>
<connect from_op="Write CSV" from_port="through" to_op="Set Role ID" to_port="example set input"/>
<connect from_op="Set Role ID" from_port="example set output" to_op="Set Role Label" to_port="example set input"/>
<connect from_op="Set Role Label" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
<connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_op="Free Memory (3)" to_port="through 1"/>
<connect from_op="Free Memory (3)" from_port="through 1" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
Hi there,
It could be the association rules operator that is causing the problem, so I'd suggest putting a breakpoint after the FP-Growth operator. If the process runs that far, your memory problems are probably the same as those discussed in this post: http://rapid-i.com/rapidforum/index.php/topic,6837.0.html
Best
H
-
Hello Haddock,
Thanks for your reply. I already tried this, and FP-Growth does return frequent item sets (with the value true for the attributes). It is the association rule operator that gives the memory error.
But what solution could I try? It seems this does not work as the number of attributes grows. My future requirement might be 43-45 attributes.
-
By the way, Haddock/Marius, as you can see in my process, I have placed Free Memory after every block that I thought consumes memory. So even when the process reaches the point where the association rules are evaluated (with Create Association Rules), there is a Free Memory before it. I am trying to understand whether this operator clears any used memory at all, because memory consumption does not drop. In fact, if you look at my queries, I have restricted the process far below what it should handle by using LIMIT in my SQL queries. Without LIMIT, the data set may be close to 1 million rows, in which case I think it will definitely fail, since it does not even work for a smaller set.
-
Hi there,
It's not the number of examples or attributes in the example set that matters; it's the number of attributes in the itemsets found by FPGrowth. So an itemset att1=1 & att200=1 & att996=1 has 3 attributes, and one that had 43 attributes would choke the association rules operator. I regularly mine datasets with 1000+ attributes and 1M+ examples, and have to resort to alternative techniques (CUDA) for long itemsets.
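To see why itemset length is what hurts, here is a rough sketch of the arithmetic. It assumes the textbook definition of a candidate rule (split one itemset into a non-empty antecedent and a non-empty consequent); the thread does not say how the RapidMiner operator enumerates rules internally, and the function names are made up for illustration.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Enumerate every antecedent -> consequent split of a single itemset."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):  # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules


def candidate_rule_count(k):
    """Closed form for the enumeration above: 2**k - 2 for k >= 2 items."""
    return 2 ** k - 2 if k >= 2 else 0


three = candidate_rules({"att1", "att200", "att996"})
print(len(three))                # prints: 6
print(candidate_rule_count(43))  # prints: 8796093022206
```

A 3-attribute itemset gives 6 candidate rules, while a single 43-attribute itemset gives roughly 8.8 trillion, before counting the rules from all of its frequent subsets.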
On the dark arts of the Java stack, heap, and trail I defer to others further up the pond life scale!
-
Hello Haddock,
Thank you very much for the reply. I will wait for replies from others.
-
Hello folks,
I searched earlier posts to see whether anyone had experienced a similar problem with the FP-Growth or association rule operators, and I found another post describing a similar memory problem.
Here is an excerpt from that post. I have also asked the user who replied that he has a workaround what he did.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
But in the current release there is also a bug that prevents RapidMiner from freeing some of the memory, even if in theory it would be releasable. That bug has already been fixed and will be included in the next release.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Maybe I am not the only person who has this problem?
-
Hello folks, does anyone have any other ideas for resolving this issue?
I also increased the Java heap space today on the new system and tried again, but without success.
-
Sesha,
I read your post. You mention that you read the tables with ReadDatabase, join them two at a time, and then use the combined data set in the CreateAssociationRules operator.
I would suggest doing the join in the database and creating a view there, instead of joining in RapidMiner. The ReadDatabase operator loads the data set into main memory, so it is memory-consuming. On top of that, you are joining and then generating a new data set, so memory accumulates again. Try using only one ReadDatabase operator on the table or view that contains your final data, and then apply CreateAssociationRules. I believe memory consumption will be lower, since you load the huge data set only once.
You can also explore the StreamDatabase operator and see if it helps.
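As a rough sketch of the database-side join suggested above: the example uses an in-memory SQLite database purely as a stand-in for the MySQL server, with a tiny invented data set. The table and column names are taken from the posted process; the view name `joined_transactions` is hypothetical, and only the first of the three joins is shown (the `delivery_counter` filter is left out for brevity).

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server used in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transaction_mapping (
        id_transaction INTEGER, id_order_header INTEGER, diffbwddrd INTEGER);
    CREATE TABLE order_header_mapping (
        id_order_header INTEGER, order_number TEXT);

    -- Invented sample rows, for illustration only.
    INSERT INTO transaction_mapping VALUES (1, 10, 3), (2, 11, 1), (3, 12, 5);
    INSERT INTO order_header_mapping VALUES (10, 'A-100'), (12, 'A-102');

    -- The join lives in the database; the client reads one result set.
    CREATE VIEW joined_transactions AS
    SELECT t.id_transaction, t.id_order_header, t.diffbwddrd, h.order_number
    FROM transaction_mapping AS t
    LEFT JOIN order_header_mapping AS h
           ON h.id_order_header = t.id_order_header
    WHERE t.diffbwddrd > 2;
    """
)

rows = conn.execute(
    "SELECT * FROM joined_transactions ORDER BY id_transaction"
).fetchall()
print(rows)  # prints: [(1, 10, 3, 'A-100'), (3, 12, 5, 'A-102')]
```

With a view like this in place, a single ReadDatabase operator could query the view directly, so the intermediate join results would never be held in RapidMiner's memory.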
Regards,
Mandar
-
Hi Mandar,
I will try this today and let you know the outcome.
Regards
Sesha