number of rows in write database operator
dan_agape
New Altair Community Member
Hi,
Perhaps this has been considered, but just in case:
Writing large scored example sets to a database with the Write Database operator took quite long in one of my applications. For instance, writing 5 million rows into a MySQL database took about an hour. Besides tuning the database itself, some optimisation on RM's side may be possible and very welcome. For instance, writing an example set to a database table could be sped up if rows/examples were written in groups, i.e. each group of examples written via the same INSERT SQL command. With only one row per INSERT, the time for the connection, for sending the query to the database server, and for parsing the query is spent per individual row; if several rows are grouped into one INSERT, those costs are paid once per group of rows instead.
So it would be useful to add a parameter "number of rows" to the Write Database operator, specifying how many rows are grouped per INSERT command. If some DBMSs do not support several rows per INSERT, the parameter could at least control how many individual single-row INSERT commands are grouped into the same SQL transaction, so an optimisation would still result.
Dan
Answers
Thanks for having included this feature in the Write Database operator.
For users who want to employ this, see the "batch size" parameter, whose value is the maximum number of rows written to the database at once via the same INSERT command. The feature considerably speeds up writing a dataset with very many rows into a database table.
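For readers curious what such batching looks like at the driver level, a minimal sketch (again with sqlite3 standing in; the actual mechanism behind the "batch size" parameter is not documented here) is Python's `executemany`, which sends a whole batch through one prepared statement instead of re-parsing an INSERT per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, score REAL)")
rows = [(i, i * 0.1) for i in range(1000)]

# One prepared statement, executed for the whole batch:
# the query is parsed once, not once per row.
conn.executemany("INSERT INTO scores (id, score) VALUES (?, ?)", rows)
conn.commit()

n = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
print(n)  # 1000
```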
Dan
I agree, thanks for including this feature. I worked with RapidMiner for the first time today and found that the Batch Size option does not work with an Oracle DB connection.
When this parameter is set to anything but 1, I get this error message: "Database error occurred: ORA-00933: SQL command not properly ended".
It seems the code might not add the semicolons at the end of the statements, or something similar.
Great work with this tool though!
Hi bernardv,
I successfully tested the feature with MySQL. It is to be expected that the feature does not yet work for every DBMS,
but I guess the RM team will fix it for Oracle (or other very popular DBMSs if needed) at some point.
Anyway, this feature is a relief for those working with very large databases (so other optimisations, as suggested above, may follow).
http://rapid-i.com/rapidforum/index.php?PHPSESSID=8df44e17a8757da9149552265c34f044&topic=2818.msg14886#msg14886
Dan