Cache for ExampleSets?
harri678
New Altair Community Member
Hi,
I have been wondering whether ExampleSets can be cached between multiple runs. In my case, loading the sparse data files takes a lot of processing time on every run, even though the data files do not change, so some kind of caching would be great to speed things up. Has this already been discussed, or is there another way to avoid reloading sparse files on every run besides SQL?
Greetings,
Harald
Answers
Hi Harald,
Did you try saving it to the repository? That might speed things up a lot...
Caching is in fact an issue, but this is not planned for the client version of RapidMiner.
Greetings,
Sebastian
I ran a small benchmark, and "Read AML" on a sparse file is faster than Store/Retrieve from the repository.
Sparse file specs: 7200 examples, 155340 attributes (16 MB .dat, 11 MB .aml, approx. 90% sparse)
I used "Read AML" and "Store" to save the data into the repository, then ran several loading-only tests to rule out caching effects. These are the results:
         Retrieve Repo   Read AML (sparse)
1. run:  02:10           00:18
2. run:  02:03           00:19
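
To put the numbers above in proportion, here is a rough sketch of the arithmetic. It assumes the times are in mm:ss and reads the last entry as 00:19; the helper names `to_seconds` and `slowdown` are made up for illustration and are not part of RapidMiner.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string to seconds (assumes the times above are mm:ss)."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown(retrieve: str, read_aml: str) -> float:
    """How many times longer Retrieve took compared with Read AML."""
    return to_seconds(retrieve) / to_seconds(read_aml)


# Timings copied from the results above: (Retrieve Repo, Read AML sparse)
runs = {
    1: ("02:10", "00:18"),
    2: ("02:03", "00:19"),
}

for run, (retrieve, read_aml) in runs.items():
    factor = slowdown(retrieve, read_aml)
    print(f"run {run}: Retrieve took {factor:.1f}x as long as Read AML")
```

So in both runs, Retrieve took roughly 6.5 to 7 times as long as Read AML for this data set.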