"CSV File too big to process?"

Hello everyone,
I have a problem with my input data. I want to read a 2 GB CSV file, but every time the operator stops at 40% and an error message says there is not enough memory (I have 16 GB RAM). What can I do about that? Since RapidMiner is data mining software, I expected it to handle things like this easily.
Thank you for your help :-)
Answers
My first suggestion would be to try splitting your CSV into smaller chunks (using any free utility such as a CSV splitter), then reading them in with a Loop Files operator and joining/appending them together. I don't know how RapidMiner handles memory management for large files like that. Perhaps one of the RM staffers will have another suggestion.
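If no splitter utility is at hand, the chunking step can be sketched in a few lines of Python. This is only a minimal sketch, not a RapidMiner feature: the function name `split_csv`, the `chunk_NNNN.csv` file names, and the default of one million rows per chunk are all assumptions made for illustration. It reads the file row by row, so the whole 2 GB never has to sit in memory, and it repeats the header in every chunk so each file can be read on its own.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_chunk=1_000_000):
    """Split a CSV into numbered chunk files, repeating the header in each.

    Hypothetical helper: names and defaults are illustrative assumptions.
    Returns the list of chunk paths in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []

    with open(source, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return chunk_paths

        out_file = None
        writer = None
        rows_in_chunk = 0
        try:
            for row in reader:
                # Start a new chunk file when none is open or the current one is full.
                if writer is None or rows_in_chunk >= rows_per_chunk:
                    if out_file is not None:
                        out_file.close()
                    path = out_dir / f"chunk_{len(chunk_paths):04d}.csv"
                    out_file = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out_file)
                    writer.writerow(header)
                    chunk_paths.append(path)
                    rows_in_chunk = 0
                writer.writerow(row)
                rows_in_chunk += 1
        finally:
            if out_file is not None:
                out_file.close()

    return chunk_paths
```

Calling `split_csv("big.csv", "chunks")` (both paths are placeholders) would leave a folder of smaller files that a Loop Files operator can iterate over. The sketch assumes UTF-8 input with a comma separator and a single header row; adjust the `encoding` and the `csv.reader` options if your file differs.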
Hi,
Can you share a data sample so we can investigate?
Without knowing anything about the data, I would have two suggestions to try:
- Make sure that the attribute types are properly set in the import wizard. If you store a datetime as a nominal instead of a proper datetime then you grow the memory footprint significantly. Same with attributes that have only missing values in the first X rows. RapidMiner will not be able to guess their types so unless you set that manually, it will default to nominal.
- You may want to try the in-product beta mode that has a lower memory footprint in general. See more details here: http://static.rapidminer.com/rnd/html/rapidminer-7.3-beta-mode.html
Best,
The only idea I came up with (after searching the web) was to import the CSV into an SQL database (e.g. Postgres) so that I can use the stream data operator?
Unfortunately I cannot upload the data, but I can tell you everything about it. It contains 8 attributes and roughly 80 million examples. The only attribute I had to change in the import wizard was the date (its type was set wrong). Ironically, if I don't change the date type, the import succeeds.
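For the database idea, here is a rough sketch of a batched load in Python. To keep it self-contained it uses SQLite from the standard library as a stand-in for Postgres, which is an assumption on my part; the function name `load_csv_into_sqlite`, the table name `examples`, and the batch size are hypothetical too. The point is only that rows are inserted in fixed-size batches, so memory use stays bounded no matter how many examples the file holds.

```python
import csv
import sqlite3
from itertools import islice


def load_csv_into_sqlite(csv_path, db_path, table="examples", batch_size=50_000):
    """Load a CSV into an SQLite table in batches and return the row count.

    Hypothetical helper: SQLite stands in for Postgres here, and the table
    name and batch size are illustrative assumptions.
    """
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)

        # Quote column names taken from the header so odd characters are safe.
        columns = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'

        conn = sqlite3.connect(db_path)
        try:
            # Columns are created without declared types; every value arrives as text.
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            total = 0
            while True:
                # Pull at most batch_size rows so only one batch is in memory.
                batch = list(islice(reader, batch_size))
                if not batch:
                    break
                conn.executemany(insert_sql, batch)
                conn.commit()
                total += len(batch)
        finally:
            conn.close()

    return total
```

All values are stored as text in this sketch, so the date column would still need converting afterwards, either in the database or in the process. With Postgres itself, a bulk-load facility such as its `COPY` command is likely to be considerably faster than row inserts.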
Sometimes dates are not read in correctly by the Read CSV operator, but that's OK. You can always convert those date values afterwards by using the Nominal to Date operator.