Feature Request: Loop Repository without retrieving any files
christos_karras
New Altair Community Member
The Loop Repository operator provides a "rep" input in its inner subprocess that delivers each repository entry already loaded into memory. This causes unnecessary delays in our use cases, because we have additional conditions inside the inner subprocess that decide which entries actually need to be loaded (and only a minority of them are needed). We then retrieve the entries we really need with the "Retrieve" operator and the %{repository_path} macro. The built-in filtering options, which are based on regular expressions, are not adequate for our use case because the decision depends on a lookup in another example set.
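To make the workaround concrete, here is a minimal Python sketch of the selection logic described above: entry metadata is filtered against a lookup example set first, and only the surviving entries are loaded. `EntryInfo`, `select_entries`, `retrieve_selected` and the `load` callback are hypothetical stand-ins for the Loop Repository macros and the Retrieve operator, not RapidMiner APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryInfo:
    """Hypothetical metadata for one repository entry (nothing is loaded)."""
    entry_name: str
    repository_path: str


def select_entries(entries, lookup_rows, key="entry_name"):
    """Keep only the entries whose name appears in the lookup example set."""
    wanted = {row[key] for row in lookup_rows}  # set gives fast membership tests
    return [e for e in entries if e.entry_name in wanted]


def retrieve_selected(entries, lookup_rows, load):
    """Call the expensive `load` (stand-in for Retrieve) only for selected entries."""
    return {e.repository_path: load(e.repository_path)
            for e in select_entries(entries, lookup_rows)}
```

With three entries and a lookup that matches one of them, `load` runs exactly once instead of three times.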
Even though our process does not use the "rep" input, RapidMiner still loads every matched repository entry into memory, so a process that should run in a few seconds instead takes 30-60 minutes.
I would like to request an option to "disable automatic loading of repository entries". This could be an explicit option (a checkbox), or RapidMiner could detect automatically that entries should not be loaded when nothing is connected to the "rep" input.
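A rough sketch of how such an option could behave, using a hypothetical `loop_repository` helper in Python: `load_entries=False` never loads, and `load_entries=None` mimics the "nothing connected to rep" idea by checking whether the loop body declares a `rep` parameter. This only illustrates the requested semantics; it is not how RapidMiner implements Loop Repository.

```python
import inspect
from typing import Callable, Iterable, Optional


def loop_repository(paths: Iterable[str],
                    body: Callable[..., None],
                    load: Callable[[str], object],
                    load_entries: Optional[bool] = None) -> None:
    """Hypothetical loop: run `body` once per repository path.

    load_entries=True  -> always load the entry and pass it as `rep`
    load_entries=False -> never load; `body` only receives the path macro
    load_entries=None  -> auto-detect: load only if `body` has a `rep`
                          parameter, analogous to "is the rep port connected?"
    """
    if load_entries is None:
        load_entries = "rep" in inspect.signature(body).parameters
    for path in paths:
        if load_entries:
            body(repository_path=path, rep=load(path))
        else:
            body(repository_path=path)  # no load call at all
```

With auto-detection, a body that only reads `repository_path` never triggers a load, which matches the process described above.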
Thanks
Best Answer
-
Hi @christos_karras,
what I could offer in the relatively short term would be an operator that gives you a list of objects with their types. You can combine this with a Loop Values operator in which you retrieve each object using Retrieve. Would that cut it?
Best,
Marti
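The proposed listing operator plus Loop Values could look roughly like this Python sketch. The repository is modelled as a nested dict (folders are dicts, entries are `(type, payload)` tuples); that model and the names `list_entries` and `retrieve` are assumptions for illustration only. The point is that listing reads only paths and type tags, and payloads are touched only for the paths you later retrieve.

```python
def list_entries(tree: dict, prefix: str) -> list:
    """Return (repository_path, type) pairs without reading any payload."""
    rows = []
    for name in sorted(tree):
        node = tree[name]
        path = f"{prefix}/{name}"
        if isinstance(node, dict):
            rows.extend(list_entries(node, path))  # recurse into a sub-folder
        else:
            rows.append((path, node[0]))  # only the type tag is read
    return rows


def retrieve(tree: dict, prefix: str, path: str) -> object:
    """Hypothetical stand-in for Retrieve: walk the path, return the payload."""
    node = tree
    for part in path[len(prefix) + 1:].split("/"):
        node = node[part]
    return node[1]
```

A Loop Values equivalent is then a plain loop over the listed paths, for example retrieving only the paths whose type is `"data"`.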
Answers
-
Hi @mschmitz,
Yes, that's probably even better. The resulting ExampleSet would need to have the same attributes that the Loop Repository operator provides as macros:
* entry_name
* repository_path
* parent_folder
I would probably use Loop Examples instead of Loop Values because I would need to access, for example, both the entry_name and repository_path in each iteration of the loop.
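As a sketch of the requested table and of why row-wise iteration helps, the Python below derives the three attributes from a repository path and then calls a loop body once per row with all attributes available, similar to Loop Examples. The split rule (parent_folder is everything before the last `/`) is an assumption about how the macros relate, and `entry_row`, `entry_table` and `loop_examples` are hypothetical names.

```python
from typing import Callable, Dict, List


def entry_row(repository_path: str) -> Dict[str, str]:
    """Split a repository path into entry_name, repository_path, parent_folder.

    Assumption: parent_folder is everything before the last '/',
    entry_name is everything after it.
    """
    parent_folder, entry_name = repository_path.rsplit("/", 1)
    return {
        "entry_name": entry_name,
        "repository_path": repository_path,
        "parent_folder": parent_folder,
    }


def entry_table(paths: List[str]) -> List[Dict[str, str]]:
    """One row per entry, analogous to an ExampleSet with three attributes."""
    return [entry_row(p) for p in paths]


def loop_examples(rows: List[Dict[str, str]], body: Callable[..., None]) -> None:
    """Call `body` once per row with every attribute as a keyword argument."""
    for row in rows:
        body(**row)
```

With a Loop Values style loop, each iteration would see only one column; here each call receives both `entry_name` and `repository_path` together.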
If it is quick to add at the same time (and the information is available), I also suggest adding a column with the last-modified timestamp of each repository entry.
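For a file-based local repository, a last-modified column can be filled from file-system metadata without reading any entry contents. This Python sketch assumes, hypothetically, that each entry corresponds to one file ending in a given suffix and that its repository path is `//<repo_name>/<relative path without suffix>`; RapidMiner's actual on-disk layout may differ, and `entries_with_mtime` is an illustrative name.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def entries_with_mtime(root: str, repo_name: str, suffix: str) -> List[Dict[str, str]]:
    """List entries under `root` with their last-modified time (UTC, ISO 8601).

    Hypothetical assumptions: one file per entry, identified by `suffix`.
    Only file metadata is read; file contents are never opened.
    """
    rows = []
    for file in sorted(Path(root).rglob(f"*{suffix}")):
        rel = file.relative_to(root).with_suffix("")  # drop the file extension
        mtime = file.stat().st_mtime  # metadata lookup only
        rows.append({
            "repository_path": f"//{repo_name}/{rel.as_posix()}",
            "last_modified": datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(),
        })
    return rows
```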
Thanks