Write Database for large dataset

elgolubeva
elgolubeva New Altair Community Member
edited November 5 in Community Q&A

Hi!

How is it possible to decrease memory usage and execution time of the Write Database operator? 

I must save in a database a dataset of about 2.2 million rows and about 25 columns. I use SQL Server 2012.

I didn't notice any difference between using batch size = 10000 and batch size = 100000: in both ways the operator took about 42 sec.

Writing my dataset first to a .txt file, than using Read CSV and than Write Database also didn't decrease the time.

Maybe Execute SQL can help?

 

Thank you for ideas.

Best Answer

  • BalazsBarany
    BalazsBarany New Altair Community Member
    Answer ✓

    Hi!

     

    Java has very special ways for managing memory. Even if Free Memory cleans up some old example sets and starts a garbage collection, the overall memory usage of the Java process will stay the same for a while. It might be lower some minutes later when Java decides to give a portion back to the operating system.

     

    Also, RapidMiner 5 was written in Java 6 times. Current Java versions probably behave differently with it.

     

    Regards,

    Balázs

Answers

  • BalazsBarany
    BalazsBarany New Altair Community Member

    Hi!

     

    The batch size depends on many factors. 10,000 could even be too large - try with lower values like 1,000 and 3,000. 

     

    Execute SQL likely won't help: you would need to create the insert strings yourself and execute line by line (or in batches) - that's what Write Database does in optimized code. 

     

    I think 42 seconds for 2.2 million rows is quite good - improbable that you can improve it dramatically. 

     

    Are you filling an existing table? Does it have many indexes and constraints defined? Could you drop the indexes beforehand, insert the data, and rebuild the indexes? This is a frequently mentioned optimization for bulk data import into databases.

    For example, if you make sure that the ID field is unique in RapidMiner, you wouldn't need the database to check it on every insert.

     

    Regards,

    Balázs

  • elgolubeva
    elgolubeva New Altair Community Member

    Balarz,

    thank you very much for your reply.

    My table has no indexes or constraints defined, I know that they increase the execution time of Write Database operator.

    Using lower values of batch size had no effect, but now I know that using too large batch size doesn't help either.

     

    I would also like to know, if it is possiple to notice the effect of Free Memory operator in RapidMiner 5.

    I look at Task Manager before and after Free Memory operator execution and see no difference in Physical Memory percentage (e.g. 96%)

     

     

     

  • BalazsBarany
    BalazsBarany New Altair Community Member
    Answer ✓

    Hi!

     

    Java has very special ways for managing memory. Even if Free Memory cleans up some old example sets and starts a garbage collection, the overall memory usage of the Java process will stay the same for a while. It might be lower some minutes later when Java decides to give a portion back to the operating system.

     

    Also, RapidMiner 5 was written in Java 6 times. Current Java versions probably behave differently with it.

     

    Regards,

    Balázs

  • elgolubeva
    elgolubeva New Altair Community Member

    Balazs,

    thank youfor your help.