Hi everyone,
What would you suggest as a good practical approach for working with VERY big PostgreSQL tables within the Studio+Server ecosystem? Is there any way to efficiently run aggregations and joins over tables of around 5M rows without using Radoop?
For example, while preparing the data I need to calculate some metrics, which involves a few joins and aggregations on several PostgreSQL tables of 3 to 5 million rows each. I can pull a whole table and store it in the Studio repository, which sometimes results in saved ExampleSets of over 500 MB. This usually works (though it takes time), and I can then perform the needed joins etc. within an RM process.
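(Side note, in case it helps anyone reproduce the memory behaviour: this is plain JDBC rather than anything RapidMiner-specific, and the connection details and big_table are placeholders, not my real setup. As far as I understand, the PostgreSQL driver buffers an entire result set in heap unless cursor-based streaming is enabled, which may be part of why whole-table reads are so expensive:)

import java.sql.*;

public class StreamRead {
    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders, not my real setup.
        String url = "jdbc:postgresql://dbhost:5432/mydb";
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            // The PostgreSQL driver only streams results through a cursor when
            // autocommit is off and a fetch size is set; otherwise it buffers
            // the whole result set in heap before returning.
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.setFetchSize(10_000); // rows per round trip, not a total limit
                try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) {
                    long count = 0;
                    while (rs.next()) {
                        count++; // process one row at a time here
                    }
                    System.out.println("rows read: " + count);
                }
            }
        }
    }
}

With autocommit off and a fetch size set, rows arrive in batches instead of all at once; without those two settings the driver materializes all 5M rows before the first rs.next().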
However, I want to move all these calculations to RM Server. Until I added memory to the server, the Read Database operator would usually fail with:
java.lang.OutOfMemoryError: Java heap space
Adding memory solved that. Still, I am not able to query some of the bigger tables, and instead I get this:
javax.ejb.EJBException: javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not execute statement
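(For scale: if, as I understand it, an ExampleSet stores each value as an 8-byte double, then a 5M-row table with 30 attributes already needs roughly 5,000,000 * 30 * 8 bytes, about 1.2 GB of heap, before any overhead, which would explain the earlier OutOfMemoryError. I also cannot tell whether this second exception comes from my PostgreSQL connection or from the server's internal repository database; the javax.ejb/Hibernate wrapping makes me suspect the latter, but I am not sure.)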
So the question is: is there a more efficient way to work with data of this size, other than using Radoop or doing the aggregations within PostgreSQL and fetching only the end result?
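To make the PostgreSQL-side option concrete (the orders/customers tables and their columns below are invented purely for illustration): this is the kind of query I could put into the Read Database operator's SQL field, so that only the aggregated result enters the process. The sketch runs the same statement over plain JDBC:

import java.sql.*;

public class PushdownAggregate {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://dbhost:5432/mydb"; // placeholder
        // orders/customers and their columns are hypothetical stand-ins
        // for my real tables.
        String sql =
            "SELECT c.region, COUNT(*) AS n, SUM(o.amount) AS total "
          + "FROM orders o JOIN customers c ON o.customer_id = c.id "
          + "GROUP BY c.region";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            // Only one row per region crosses the wire, instead of millions
            // of raw rows.
            while (rs.next()) {
                System.out.printf("%s: n=%d, total=%.2f%n",
                        rs.getString("region"), rs.getLong("n"), rs.getDouble("total"));
            }
        }
    }
}

I know this approach works and is fast; what I am really asking is whether there is a reasonable alternative that keeps more of the logic inside RM processes.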
Thanks!