Cannot load facttable (Oracle)

Vinnie New Altair Community Member
edited November 5 in Community Q&A
Hello

I have RapidMiner Studio connected to an Oracle data warehouse. RapidMiner can see the tables, and I can open every table in the repository except my fact table (which has over 9 million records). When I click it, nothing happens.

When I look in the rapidminer-studio log I get the following:
Mar 31, 2015 8:51:49 AM com.rapidminer.tools.jdbc.DatabaseHandler executeStatement
INFO: Executing query: 'SELECT * FROM "DWH"."FACTTABLE"'

No error follows.

Has anyone been able to read this many records?

Thanks.

Answers

  • Marco_Boeck New Altair Community Member
    Hi,

    Browsing tables this large in the repository view is not recommended. Use the "Read Database" operator instead, and restrict the result with a WHERE clause where you can. Loading 9 million rows from your database is possible, but all of that data has to be held somewhere, so it will take a lot of memory and time. You will probably want to page through the table manually with the "Loop" operator.

    Basic demonstration process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000-SNAPSHOT">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.4.000-SNAPSHOT" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="set_macros" compatibility="6.4.000-SNAPSHOT" expanded="true" height="76" name="Set Macros" width="90" x="45" y="30">
           <list key="macros">
             <parameter key="step_size" value="10000"/>
           </list>
         </operator>
         <operator activated="true" class="loop" compatibility="6.4.000-SNAPSHOT" expanded="true" height="76" name="Loop" width="90" x="179" y="30">
           <parameter key="set_iteration_macro" value="true"/>
           <parameter key="macro_name" value="i"/>
           <parameter key="iterations" value="5"/>
           <process expanded="true">
             <operator activated="true" class="generate_macro" compatibility="6.4.000-SNAPSHOT" expanded="true" height="76" name="Generate Macro" width="90" x="45" y="30">
               <list key="function_descriptions">
                 <parameter key="macro_start" value="%{step_size} * (%{i}-1)"/>
                 <parameter key="macro_end" value="%{step_size} * %{i}"/>
               </list>
             </operator>
             <operator activated="true" class="read_database" compatibility="6.4.000-SNAPSHOT" expanded="true" height="60" name="Read Database" width="90" x="179" y="30">
               <parameter key="connection" value="Local"/>
               <parameter key="query" value="SELECT *&#10;FROM `big`&#10;WHERE id &gt; %{macro_start} AND id &lt;= %{macro_end}"/>
               <enumeration key="parameters"/>
             </operator>
             <connect from_port="input 1" to_op="Generate Macro" to_port="through 1"/>
             <connect from_op="Read Database" from_port="output" to_port="output 1"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_output 1" spacing="0"/>
             <portSpacing port="sink_output 2" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Set Macros" from_port="through 1" to_op="Loop" to_port="input 1"/>
         <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
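
For anyone who wants to check the window arithmetic outside RapidMiner, here is a minimal Python sketch of the bounds the Loop and Generate Macro operators compute. It only builds the query strings and does not connect to a database. The helper names `page_bounds` and `page_query` are made up for this illustration, and it assumes the table has an integer `id` column, which the thread does not confirm for the Oracle fact table. The upper bound is inclusive here so that consecutive windows neither overlap nor skip ids that fall exactly on a boundary.

```python
def page_bounds(step_size, iterations):
    """Yield (start, end) id bounds per iteration; i is 1-based like %{i}."""
    for i in range(1, iterations + 1):
        # Same arithmetic as macro_start and macro_end in the process
        yield step_size * (i - 1), step_size * i


def page_query(table, start, end):
    """Build one paged query (hypothetical helper, assumes an integer id column)."""
    # Lower bound exclusive, upper bound inclusive: (start, end]
    return f"SELECT * FROM {table} WHERE id > {start} AND id <= {end}"


# Five windows of 10000 ids each, matching step_size and iterations above
queries = [page_query('"DWH"."FACTTABLE"', s, e) for s, e in page_bounds(10000, 5)]
for q in queries:
    print(q)
```

With these settings the five windows cover ids 1 to 50000. Covering all 9 million rows at this step size would need 900 iterations, so a larger step size is probably worth trying if memory allows.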
    Regards,
    Marco