I'm no programmer and I can't help you with the technical details, but come on, guys! If SPSS can manage huge amounts of data and flows, then you should be able to do so as well!
Ingo Mierswa wrote:
Hi,
don't make this an open vs. closed source discussion: this is simply not true. If Clementine has built-in streaming: fine. So has RapidMiner, but it is simply not the default (for a bunch of reasons). In order to perform preprocessing (not modelling) on data sets of arbitrary sizes, you will have to use a combination of:
- a database as data input
- the Stream Database operator, configured to your database, or the default one together with an appropriately configured database
- the option "create view" for all preprocessing operators where possible

Setting up processes that make use of this streaming approach is the point where people usually have to rely on our Enterprise Support, since designing such processes is no longer a trivial task. But it is definitely possible; we ourselves have recently successfully transformed far more than 100 million records with RapidMiner, without a significant memory footprint. This is of course mainly useful for preprocessing and more traditional BI results; there is no point in building a predictive model on a data set of this size, simply due to running time restrictions.

By the way: on a 64-bit system it should indeed be possible to use more memory than is physically available and let the OS and Java do the temp-file approach, similar to the way you described for Clementine. It is probably sufficient to adapt the amount of memory in one of our start scripts and start RapidMiner with the script. But calculations will become ridiculously slow then, and I would recommend designing better processes and keeping control of what is happening instead of using this shotgun approach. We are able to do this. You just haven't found the right buttons yet.
Cheers,
Ingo
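To illustrate the general idea behind the streaming approach Ingo describes (this is not RapidMiner's actual operator API, which is configured in the process designer rather than in code): a database cursor with a small fetch size lets you aggregate over arbitrarily many rows while only a small window of them is ever held in memory. The connection URL, table, and column names below are made-up placeholders.

    import java.sql.*;

    public class StreamingAggregate {
        public static void main(String[] args) throws SQLException {
            // Placeholder connection details; adapt to your own database.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "password")) {
                conn.setAutoCommit(false);     // some drivers only stream inside a transaction
                try (Statement st = conn.createStatement()) {
                    st.setFetchSize(1000);     // hold roughly 1000 rows in memory at a time
                    try (ResultSet rs = st.executeQuery(
                            "SELECT amount FROM transactions")) {
                        double sum = 0.0;
                        long count = 0;
                        while (rs.next()) {    // rows stream through; the table is never fully loaded
                            sum += rs.getDouble(1);
                            count++;
                        }
                        System.out.println("mean = " + sum / count);
                    }
                }
            }
        }
    }

For the 64-bit memory route Ingo mentions, the relevant knob in the start script is usually the JVM heap flag (e.g. java -Xmx8g ...); the exact variable name in RapidMiner's scripts may differ.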
Sebastian Land wrote:
Hi,
the problem with this setting is that the LinearRegression operator has to copy all the data into a numerical matrix in order to invert it. This numerical matrix must be stored in main memory, and that is what causes the memory problem. For large data sets I would suggest using linear-scanning algorithms like Naive Bayes or the Perceptron instead.
Greetings,
Sebastian
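As a sketch of why the suggested alternatives scale better: an online perceptron touches each example once and keeps only the weight vector in memory, so its footprint is independent of the number of rows. This is a generic textbook perceptron, not RapidMiner's implementation; the class and method names are illustrative.

    // Minimal online perceptron: memory use is O(#features), independent of #rows.
    public class OnlinePerceptron {
        private final double[] w;   // weight vector, one entry per feature
        private double b;           // bias term

        public OnlinePerceptron(int numFeatures) {
            w = new double[numFeatures];
        }

        // label must be +1 or -1; called once per streamed example
        public void update(double[] x, int label) {
            double activation = b;
            for (int i = 0; i < w.length; i++) activation += w[i] * x[i];
            if (label * activation <= 0) {   // misclassified: shift the hyperplane
                for (int i = 0; i < w.length; i++) w[i] += label * x[i];
                b += label;
            }
        }

        public int predict(double[] x) {
            double activation = b;
            for (int i = 0; i < w.length; i++) activation += w[i] * x[i];
            return activation >= 0 ? 1 : -1;
        }
    }

By contrast, solving linear regression in closed form requires building a full numerical matrix over all the data and inverting it in one step, which is exactly the in-memory copy Sebastian describes.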
Sebastian Land wrote:
Hi,
well, I think that's correct, but how exactly is your criterion "supporting streaming data processing" defined?
Greetings,
Sebastian