Is 16GB of RAM the only way to go?
Hello everybody, I'm new here and new to data mining in general...
I've been reading some threads here and have also been through the WEKA mailing list, and I've come to the sad conclusion that the only way to process large and complex streams of data with RapidMiner or WEKA is to have as much RAM as possible. I mention this because I've seen some pretty crazy flows in Clementine run successfully on modest systems with 4, 2, or even 1 GB of RAM. I've been told that Clementine writes a lot of temporary files to the hard disk and has somewhat optimized stream execution code, optimized at least in the sense that you can throw anything you like at it and let your HDD space absorb it, without worrying about RAM or heap sizes and such. Is this correct?
The first thing I tried to do with RapidMiner (on a 4 GB RAM rig) was to convert an SPSS file of 100 fields by 3,000,000 records into ARFF, and it wouldn't get past the READ SPSS node: out of memory error! I ran into the same thing with WEKA, which contrasts sharply with Clementine handling quite a lot more data with only 1 GB of RAM.
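For reference, the "standard" way I found in the Weka docs to do this kind of conversion in code looks roughly like the sketch below (assuming the data is first exported from SPSS to CSV, since as far as I know Weka doesn't read .sav files directly; the file names are just placeholders). From what I can tell it pulls the whole table into RAM before writing anything, which is exactly where things fall over for me:

    import java.io.File;
    import weka.core.Instances;
    import weka.core.converters.ArffSaver;
    import weka.core.converters.CSVLoader;

    public class Csv2Arff {
        public static void main(String[] args) throws Exception {
            // Load the CSV export of the SPSS data - this reads the
            // entire dataset into memory, so a large file needs a
            // correspondingly large Java heap (see the -Xmx note below).
            CSVLoader loader = new CSVLoader();
            loader.setSource(new File("export.csv"));
            Instances data = loader.getDataSet();

            // Write the in-memory dataset back out as ARFF.
            ArffSaver saver = new ArffSaver();
            saver.setInstances(data);
            saver.setFile(new File("export.arff"));
            saver.writeBatch();
        }
    }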
Regarding having to use 16 GB of RAM as a rule... am I sadly right? Isn't it possible, for example, to make RapidMiner use Windows' virtual memory? Being able to set it to any crazy amount and let RapidMiner use it would be a charm. It probably wouldn't be very efficient at all, but hey, it's definitely better than not getting the job done at all.
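From what I've been able to gather (and I'd love to be corrected), a Java program can't simply spill over into Windows' virtual memory: the ceiling is whatever heap size the JVM was started with, e.g. launching WEKA as "java -Xmx3g -jar weka.jar" on a 64-bit Java, or raising the memory setting in RapidMiner's startup script, if I've understood the docs right. A tiny check like this seems to confirm that the ceiling is fixed no matter how big the page file is:

    public class HeapCeiling {
        public static void main(String[] args) {
            // Reports the maximum heap this JVM will ever use. Started
            // with -Xmx3g it prints roughly 3 GB; a huge Windows page
            // file on its own doesn't raise this limit.
            long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
            System.out.println("Max heap available to this JVM: " + maxMb + " MB");
        }
    }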
On the other hand, do the enterprise versions of RapidMiner have optimized stream execution code? If I bought the software, how would it cope with my need for huge data flows?
I'm no programmer, so I can't help you with any of this myself, but come on, guys! If SPSS can manage huge amounts of data and flows, then you should be able to do so as well! Remember, they also use Java... so it's not as if there is a language limitation, right?
Thank you for your great program, and for your kind attention.
Cheers.