Process step caching
Greetings Programs!
We are currently evaluating a few tools (SAS Enterprise Miner, IBM SPSS Modeler, RapidMiner, KNIME). This question is NOT about a comparison between those, but rather about a feature I really like in SPSS Modeler that I haven't found in RapidMiner.
When you are creating a process, SPSS Modeler allows you to set a flag on any process step, which tells it to cache that step's output when the process runs. This allows for a rapid development cycle, because the tool is smart enough not to restart from the beginning of the process but to resume from a cached intermediate result.
For example: I have a CSV file with 12 million records, on which I'm doing a lot of transformation and aggregation. At a certain point in the process, the intermediate result set is only 100 thousand records. I mark this spot as 'to be cached'. Then I continue developing my process and add a few steps. Checking the result is really fast, since each run can simply start from the cached set of 100k records instead of the original set of 12M.
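To make the idea concrete (independent of any of the tools above), here is a minimal sketch of step-level caching in plain Python. The file name, functions, and data are all hypothetical stand-ins: an expensive upstream step is computed once and persisted, and later runs resume from the cached intermediate result.

```python
import os
import pickle

def cached_step(cache_path, compute):
    """Return the cached intermediate result if it exists on disk;
    otherwise compute it once, cache it, and return it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result

def expensive_aggregation():
    # Stand-in for reading 12M records and aggregating down to 100k.
    return [i * i for i in range(100)]

# First run computes and caches; subsequent runs load from disk,
# so the cheap downstream steps re-run almost instantly.
intermediate = cached_step("agg.cache", expensive_aggregation)
downstream = [x + 1 for x in intermediate]
```

This is roughly what the 'cache' flag in SPSS Modeler does for you automatically: mark the spot, and everything upstream of it is skipped on later runs.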
The thing I like about this feature is that it is totally transparent: I only have to mark the spot, and SPSS Modeler handles the rest.
I haven't found this in RapidMiner, which means that each time I want to check the result of my process, it has to start from scratch and run through every step again.
Did I overlook something? Is a similar feature available in RapidMiner?
Thanks for your input.
Tim