Cache for ExampleSets?

harri678 New Altair Community Member
edited November 5 in Community Q&A
Hi,

I have been wondering whether there is any way to cache ExampleSets between multiple runs. In my case, loading the sparse data files takes a lot of processing time on every run, even though the data files never change, so some kind of caching would be a great way to speed things up. Has this already been discussed, or is there another way to avoid reloading sparse files on every run, besides SQL?

Greetings,
Harald
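One generic workaround outside RapidMiner is to cache the parsed data on disk and reuse it as long as the source file has not changed. The Python sketch below illustrates that idea under stated assumptions: it is not a RapidMiner feature, `load_with_cache` is a hypothetical helper, and `parse` stands for whatever (slow) loader reads the sparse file. The cache key combines the file path, size and modification time, so editing the data file invalidates the cached copy. `pickle` should only be used for trusted, local cache files.

```python
import hashlib
import os
import pickle
import tempfile


def load_with_cache(path, parse, cache_dir=None):
    """Return parse(path), reusing a pickled copy while the source file is unchanged.

    Hypothetical helper illustrating the caching idea; not part of RapidMiner.
    """
    cache_dir = cache_dir or tempfile.gettempdir()
    st = os.stat(path)
    # Key on absolute path + size + mtime so any edit to the data file
    # produces a new key and forces a fresh parse.
    key_src = f"{os.path.abspath(path)}|{st.st_size}|{st.st_mtime_ns}"
    key = hashlib.sha256(key_src.encode("utf-8")).hexdigest()
    cache_path = os.path.join(cache_dir, f"exampleset-{key}.pkl")

    if os.path.exists(cache_path):
        # Cache hit: skip the expensive parse entirely.
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Cache miss: do the slow load once, then store the result.
    data = parse(path)
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, cache_path)  # atomic rename avoids half-written cache files
    return data
```

The first call pays the full parse cost; later calls with an unchanged file only unpickle the stored result, which helps only if unpickling is cheaper than the original parse.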

Answers

  • land
    land New Altair Community Member
    Hi Harald,
    did you try to save it into the repository? Might speed things up a lot...
    Caching is in fact an issue, but this is not planned for the client version of RapidMiner.

    Greetings,
      Sebastian
  • harri678
    harri678 New Altair Community Member
    I ran a small benchmark, and reading the sparse file with "Read AML" is faster than retrieving the stored copy from the repository.
    Sparse file specs: 7,200 examples, 155,340 attributes (16 MB .dat, 11 MB .aml, approx. 90% sparse)

    I used "Read AML" and "Store" to save the data into the repository, then ran several loading-only tests to rule out caching effects. These are the results:

    Loading time (mm:ss):

                Retrieve (repository)    Read AML (sparse)
    Run 1:      02:10                    00:18
    Run 2:      02:03                    00:19
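Averaging the two runs makes the size of the gap concrete. This short Python sketch assumes the timings in the table are minutes:seconds (mm:ss), which the post does not state explicitly; `to_seconds` is a hypothetical helper name.

```python
def to_seconds(mmss):
    """Convert an 'mm:ss' string to seconds (assumes minutes:seconds format)."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings copied from the benchmark table, assumed to be mm:ss.
retrieve_repo = ["02:10", "02:03"]
read_aml = ["00:18", "00:19"]

mean_repo = sum(map(to_seconds, retrieve_repo)) / len(retrieve_repo)
mean_aml = sum(map(to_seconds, read_aml)) / len(read_aml)
speedup = mean_repo / mean_aml

print(f"Retrieve: {mean_repo:.1f} s, Read AML: {mean_aml:.1f} s, speedup: {speedup:.1f}x")
# prints "Retrieve: 126.5 s, Read AML: 18.5 s, speedup: 6.8x"
```

Under that assumption, "Read AML" loads this file roughly 6.8 times faster than retrieving it from the repository; two runs are too few to say anything about variance.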