Radoop Read Database - Out of Memory
Hi @JEdward - sorry, some of the Radoop folks are on vacation at the moment (and this is out of my league!). I will see if I can track somebody down for you.
Scott
Not really much exciting to see.
Please note, I changed the connection names for posting. I am using Radoop BETA here, but the same occurred in the older version too. The data is being moved from an AWS RDS MySQL database into a cluster hosted outside AWS. Could that have something to do with it?
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="radoop:radoop_nest" compatibility="9.0.000-BETA" expanded="true" height="82" name="Radoop Nest" width="90" x="179" y="34">
<parameter key="connection" value="666"/>
<parameter key="change_sample_size" value="true"/>
<parameter key="sample_size" value="100"/>
<enumeration key="tables_to_reload"/>
<process expanded="true">
<operator activated="true" class="radoop:read_db" compatibility="9.0.000-BETA" expanded="true" height="68" name="Read Database (15)" width="90" x="112" y="85">
<parameter key="connection" value="AWSRDS"/>
<parameter key="define_query" value="table name"/>
<parameter key="table_name" value="myTable"/>
<enumeration key="parameters"/>
<parameter key="temporary_table" value="false"/>
<parameter key="saved_table_name" value="trip_position3"/>
</operator>
<connect from_op="Read Database (15)" from_port="output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
Were you able to reproduce this on your side?
I've just upgraded from RM Server 8.2 to RM Server 9.01 and can confirm it is still happening on my side.
My "workaround" of reading in snapshots of the database and then combining them all at the end is rather cumbersome.
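For anyone wanting to picture the snapshot workaround, here is a minimal sketch in Python. It is only an illustration: an in-memory SQLite table stands in for the MySQL source, and the names `trips`, `id`, `position`, `make_demo_table` and `read_in_snapshots` are made up for this example. It is not what Radoop does internally.

```python
import sqlite3


def make_demo_table(n_rows=10):
    """Build an in-memory SQLite table standing in for the real source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (id INTEGER PRIMARY KEY, position TEXT)")
    conn.executemany(
        "INSERT INTO trips (id, position) VALUES (?, ?)",
        [(i, f"pos-{i}") for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn


def read_in_snapshots(conn, table, key_column, chunk_size):
    """Yield the table as successive chunks, one bounded query per chunk.

    Each query resumes after the last key seen, so no single query
    returns more than chunk_size rows. Assumes key_column is unique and
    that table/key_column are trusted identifiers (not user input).
    """
    last_key = None
    while True:
        if last_key is None:
            cur = conn.execute(
                f"SELECT * FROM {table} ORDER BY {key_column} LIMIT ?",
                (chunk_size,),
            )
        else:
            cur = conn.execute(
                f"SELECT * FROM {table} WHERE {key_column} > ? "
                f"ORDER BY {key_column} LIMIT ?",
                (last_key, chunk_size),
            )
        rows = cur.fetchall()
        if not rows:
            return
        # Locate the key column by name so the next query can resume after it.
        key_index = [col[0] for col in cur.description].index(key_column)
        last_key = rows[-1][key_index]
        yield rows


if __name__ == "__main__":
    conn = make_demo_table()
    combined = []
    for chunk in read_in_snapshots(conn, "trips", "id", chunk_size=4):
        combined.extend(chunk)  # the "combine them all at the end" step
    print(len(combined))
```

Resuming from the last key rather than using a growing OFFSET keeps each snapshot query cheap, but it does rely on a unique, sortable key column being available.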
I would expect the database to open as a stream, but it doesn't seem to want to... just wants to load it all into memory.
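To make the distinction concrete, here is a rough sketch of "stream" versus "load it all" using Python's standard DB-API calls. It uses an in-memory SQLite table as a stand-in, and the names `numbers`, `stream_rows`, `load_all_rows` and `make_numbers_table` are invented for the example. It says nothing about how the Radoop operator is implemented.

```python
import sqlite3


def stream_rows(conn, sql, batch_size=1000):
    """Yield rows one at a time, fetching at most batch_size rows per call."""
    cur = conn.cursor()
    cur.execute(sql)
    try:
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            yield from batch
    finally:
        cur.close()


def load_all_rows(conn, sql):
    """Materialise the whole result as one list (the pattern that can exhaust memory)."""
    return conn.execute(sql).fetchall()


def make_numbers_table(n_rows=10):
    """Build a small in-memory table for the demo (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany(
        "INSERT INTO numbers (id, value) VALUES (?, ?)",
        [(i, i * i) for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_numbers_table()
    total = 0
    query = "SELECT id, value FROM numbers ORDER BY id"
    # Only one small batch is held in Python at any moment.
    for _id, value in stream_rows(conn, query, batch_size=3):
        total += value
    print(total)
```

One caveat: whether batched fetching really bounds memory depends on the database driver. Some client libraries buffer the full result set on the client by default unless a streaming or cursor-based fetch mode is switched on, which would match the symptoms described here.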
In case it helps anyone who is able to assist, here is a sample log file.
Basic gist is:
After some gubbins Radoop begins to execute the query.
2 mins later the memory usage on Server begins to rise.
4 mins later the memory runs out and an OOM flag is created.
4 mins later the process halts due to an OOM error.
The table is 14 GB, and the process appears to run without any significant memory rise on my local laptop. It is only on the Server where memory usage does this. (Bit awkward when wanting to execute it overnight.) -- EDIT: mistake on my side. This is occurring on the laptop too, but much more slowly, as it takes longer to run the query and begin fetching all rows into memory. --
Other tables have so far been much smaller. One table was large (5 GB) and triggered an OOM flag when executed on the server, but the process still managed to continue and successfully stream the data despite this. It is just this 14 GB table that is causing issues.