Applying a model to a large data set
Krystian
Hi, I'm trying to apply a model to a database of 10 million records. I use the "Read Database" operator, but it copies all the data from the database into my computer's memory, which causes an out-of-memory exception; on top of that, the database connection times out. "Stream Database" looks nice, but it seems to work only for building a model, not for applying one (I got an error when applying a model with this operator). I'm thinking about building a loop that fetches the data with a parametrized SQL LIMIT: limiting the data to, e.g., 10,000 records at a time works very well when applying the model. Please help, I think there must be a smarter way than building loops. Most ETL tools have a streaming database read.
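The loop idea from the question can be sketched outside RapidMiner to show why it keeps memory bounded: only one batch of rows is ever loaded. This is a minimal Python sketch, not RapidMiner's implementation. It assumes a hypothetical table `user_data` with an integer primary key `id` and one feature column `value`, and a hypothetical `score` function standing in for the trained model; an in-memory SQLite database stands in for the real one. Instead of a plain `LIMIT n OFFSET k`, it uses keyset pagination (`WHERE id > last_id ORDER BY id LIMIT n`), a swapped-in variation that avoids rescanning all skipped rows on every iteration.

```python
import sqlite3

BATCH_SIZE = 10_000  # batch size mentioned in the post; tune to available memory


def score(row):
    """Hypothetical stand-in for the trained model: threshold on one feature."""
    _id, value = row
    return 1 if value > 0.5 else 0


def apply_in_batches(conn, batch_size=BATCH_SIZE):
    """Score a table batch by batch using keyset pagination.

    Each query fetches at most batch_size rows after the last id seen,
    so only one batch is held in memory. Yields (id, prediction) pairs.
    """
    last_id = -1  # assumption: ids are non-negative integers
    while True:
        rows = conn.execute(
            "SELECT id, value FROM user_data WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break  # no rows left: the whole table has been scored
        for row in rows:
            yield row[0], score(row)
        last_id = rows[-1][0]  # resume after the last id of this batch


if __name__ == "__main__":
    # Small in-memory table standing in for the real 10-million-row table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_data (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO user_data (id, value) VALUES (?, ?)",
        [(i, (i % 10) / 10) for i in range(25)],
    )
    predictions = list(apply_in_batches(conn, batch_size=10))
    print(len(predictions), sum(p for _, p in predictions))
```

Keyset pagination needs an indexed, monotonically increasing column; if the table has none, `LIMIT`/`OFFSET` as described in the question still works but gets slower as the offset grows.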
Thanks
Answers
Hi Krystian,
using a loop is a perfectly good workaround if Stream Database does not work for you. As always, it would help if you posted your process setup and the full error message.
Best,
Marius
I got this error:
Apr 11, 2012 1:19:44 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...
Apr 11, 2012 1:19:44 PM SEVERE: Here: Process[1] (Process)
           subprocess 'Main Process'
       ==>   +- Stream Database[1] (Stream Database)
             +- Write CSV[0] (Write CSV)
Apr 11, 2012 1:19:44 PM SEVERE: java.lang.NullPointerException
Now I'm testing exporting the model to PMML from RapidMiner and using it in a streaming process in Pentaho. I will write about how it works. Thanks

The error occurs even with Stream Database connected only to a CSV output, or directly to the results screen. This is the process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
    <process expanded="true" height="341" width="480">
      <operator activated="true" class="stream_database" compatibility="5.2.003" expanded="true" height="60" name="Stream Database" width="90" x="112" y="165">
        <parameter key="connection" value="External"/>
        <parameter key="table_name" value="user_data_im_monthly_2012_02"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.2.003" expanded="true" height="76" name="Write CSV" width="90" x="380" y="165">
        <parameter key="csv_file" value="C:\Documents and Settings\GG\My Documents\tttt"/>
      </operator>
      <connect from_op="Stream Database" from_port="output" to_op="Write CSV" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
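The process above streams a table straight into Write CSV. To illustrate what a streaming read does, independent of RapidMiner's Stream Database operator, here is a minimal Python sketch using the standard DB-API `cursor.fetchmany()`: a single query is executed once, and rows are pulled and written to CSV in fixed-size chunks. The table `user_data`, its columns `id` and `value`, and the `score` function are hypothetical; an in-memory SQLite database stands in for the external connection. Depending on the database driver, a server-side cursor may be required for rows to actually be streamed rather than buffered by the client.

```python
import csv
import io
import sqlite3


def score(value):
    """Hypothetical stand-in for the trained model."""
    return 1 if value > 0.5 else 0


def stream_score_to_csv(conn, out, fetch_size=10_000):
    """Read rows through one cursor with fetchmany() and write id, value, prediction.

    Returns the number of data rows written. Memory use is bounded by
    fetch_size, provided the driver does not buffer the full result set.
    """
    writer = csv.writer(out)
    writer.writerow(["id", "value", "prediction"])
    cur = conn.cursor()
    cur.execute("SELECT id, value FROM user_data ORDER BY id")
    written = 0
    while True:
        rows = cur.fetchmany(fetch_size)  # next chunk of at most fetch_size rows
        if not rows:
            break
        for row_id, value in rows:
            writer.writerow([row_id, value, score(value)])
            written += 1
    cur.close()
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_data (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO user_data VALUES (?, ?)",
        [(i, i / 10) for i in range(12)],
    )
    buf = io.StringIO()
    print(stream_score_to_csv(conn, buf, fetch_size=5))
    print(buf.getvalue().splitlines()[:3])
```

Unlike the LIMIT loop, this keeps a single query open for the whole run, so a long scoring job can still hit a database-side timeout of the kind mentioned in the question.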