"Using RM to optimize R hyperparameters"
keith
New Altair Community Member
Hi,
I'm interested in using RapidMiner to find optimal hyperparameter values when tuning an R model. In particular, I'd like to use EvolutionaryOptimization to do so, but I've run into several issues I can't quite figure out myself.
I've got a simple test case that demonstrates what I want to do. An R script builds a model using the "penalized" function from the R package "penalized", which takes a parameter lambda2 that controls how severe a penalty is applied. The goal of the process is to optimize the value of lambda2. I use 10-fold cross-validation to estimate the generalization performance for each penalty factor tried. The example works, selecting 100 as the best parameter on the list. But I can't get it to run using evolutionary parameter optimization, primarily because I can't seem to construct and pass a numeric parameter into the R script.
Questions:
1) How can I specify a numeric parameter to be used inside the R code? The grid optimization uses a list of values to set a macro named "lambda2" inside the validation; I can then use the macro inside the R code to vary the penalty. But if I try to replace the grid optimization with evolutionary optimization, I am not permitted to specify a range, because the macro value could be a string rather than a numeric. I couldn't see any way other than the macro approach to pass a parameter value into the R code.
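To illustrate the R side of this, here's the coercion I'd expect to need inside the script. This is only a sketch: it assumes the macro is substituted as raw text before the script runs (which matches what I see in the grid setup), and it doesn't make the optimizer itself treat the parameter as numeric.
# %{lambda2} is RapidMiner macro syntax; it is replaced with the macro's
# text before R sees the script, so quoting it and coercing it with
# as.numeric() tolerates "100" as well as "100.0" from the optimizer.
lambda2 <- as.numeric("%{lambda2}")
stopifnot(is.finite(lambda2))  # fail fast if the macro wasn't numeric text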
2) In cross-validating, the R script "Build Training Model" returns a generic R object, not a model, so I couldn't directly connect the port to pass it to the testing side. I got around this by storing the R object in the repository and retrieving it on the testing side. This seems awkward, but I couldn't figure out how else to pass an R object around. Then, in order to get RM to accept the process, I had to connect the R object to the model port on the training side, even though it complains that they aren't compatible objects. Is there a better way to do this?
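One alternative that might avoid the repository entirely is to pass the model through the file system. A sketch only: the path is illustrative, the lm() fit stands in for the penalized() model, and it assumes both Execute Script (R) operators can see the same directory (if each script runs in its own R session, tempdir() would differ, so a fixed path would be needed).
# training side: persist the fitted model to a file instead of the repository
model.file <- file.path(tempdir(), "penalized_model.RData")  # illustrative path
mod.penalized <- lm(dist ~ speed, data = cars)               # stand-in for the penalized() fit
save(mod.penalized, file = model.file)

# testing side: restore mod.penalized into the workspace
load(model.file)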
3) There doesn't appear to be a way to delete an object from the repository from within a process. I'm temporarily storing an R object in the repository during cross-validation and wanted to remove it when completed, but the only two operators are Store and Retrieve. If I could solve 2) without using the repository, this concern would go away for now, although I can see the functionality being pretty important. Did I miss something obvious?
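(If the file-based sketch above worked, cleanup would also reduce to one line at the end of the testing-side script:)
# remove the temporary model file once the fold has been evaluated;
# unlink() is silent if the file is already gone
unlink(file.path(tempdir(), "penalized_model.RData"))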
4) Because RM doesn't seem to know about applying R models, I manually construct an example set from the testing label and the R-generated predictions within another R script, and calculate performance from that. This seems to work, although RM complains about unspecified metadata when I connect the constructed example set to the labelled data port of the Performance operator. Not a big deal, but I thought it worth mentioning in case there's a cleaner way to do this.
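For a regression label, the number itself is easy to reproduce in R as a sanity check. A minimal sketch with stand-in data; the Performance operator of course offers more criteria than this:
# stand-in actual/predicted columns like those built in "Evaluate vs test data"
actual    <- c(1.0, 2.0, 3.0)
predicted <- c(1.1, 1.9, 3.2)
rmse <- sqrt(mean((actual - predicted)^2))  # root mean squared error
print(rmse)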
5) The "results.label <- column_name" trick for setting roles on an R data frame when it is converted back to an RM data table worked for the label, but not for the prediction, which is why the "Change role" operator is in the process.
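For anyone reproducing this, the convention as I understand it is sketched below; the companion variable names must match the name of the returned data frame:
# the data frame handed back to RapidMiner...
results <- data.frame(actual = c(1, 2, 3), predicted = c(1.1, 1.9, 3.2))
# ...plus companion character variables naming the role columns:
results.label      <- "actual"     # this one took effect on conversion
results.prediction <- "predicted"  # this one did not, hence the Set Role operator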
Note that you'll need the R package "penalized" installed for this test case to work.
Any suggestions would be welcome. I want to use RM to do a lot of this kind of parameter tuning, since I find similar capabilities in R somewhat lacking. Thanks for any help.
Keith
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
<process expanded="true" height="295" width="681">
<operator activated="true" class="generate_data" compatibility="5.1.001" expanded="true" height="60" name="Generate Data" width="90" x="112" y="165">
<parameter key="target_function" value="non linear"/>
</operator>
<operator activated="true" class="optimize_parameters_grid" compatibility="5.1.001" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="380" y="120">
<list key="parameters">
<parameter key="Set lambda2.value" value="10,100,1000"/>
</list>
<process expanded="true" height="313" width="1005">
<operator activated="true" class="x_validation" compatibility="5.1.001" expanded="true" height="112" name="Validation" width="90" x="447" y="38">
<parameter key="sampling_type" value="shuffled sampling"/>
<process expanded="true" height="313" width="477">
<operator activated="true" class="set_macro" compatibility="5.1.001" expanded="true" height="76" name="Set lambda2" width="90" x="45" y="30">
<parameter key="macro" value="lambda2"/>
<parameter key="value" value="1000"/>
</operator>
<operator activated="true" class="r:execute_script_r" compatibility="5.1.000" expanded="true" height="76" name="Build Training Model" width="90" x="180" y="30">
<parameter key="script" value="library(penalized) library(e1071) print(paste("lambda2 is:",%{lambda2})) mod.penalized <- penalized( 			 label ~ att1 + att2 + att3 + att4 + att5 			, data=my.data 			, standardize=TRUE 			, lambda1=10 			, lambda2=%{lambda2} 			) "/>
<enumeration key="inputs">
<parameter key="name_of_variable" value="my.data"/>
</enumeration>
<list key="results">
<parameter key="mod.penalized" value="Generic R Result"/>
</list>
</operator>
<operator activated="true" class="store" compatibility="5.1.001" expanded="true" height="60" name="Store" width="90" x="328" y="30">
<parameter key="repository_entry" value="PenalizedModel_temp"/>
</operator>
<connect from_port="training" to_op="Set lambda2" to_port="through 1"/>
<connect from_op="Set lambda2" from_port="through 1" to_op="Build Training Model" to_port="input 1"/>
<connect from_op="Build Training Model" from_port="output 1" to_op="Store" to_port="input"/>
<connect from_op="Store" from_port="through" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="313" width="496">
<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="PenalizedModel_temp"/>
</operator>
<operator activated="true" class="r:execute_script_r" compatibility="5.1.000" expanded="true" height="112" name="Evaluate vs test data" width="90" x="180" y="30">
<parameter key="script" value="results <- cbind.data.frame( 			 actual = my.data$label 			, predicted = predict(model, data=my.data)[,1] 		) results.prediction <- "predicted" results.label <- "actual" "/>
<enumeration key="inputs">
<parameter key="name_of_variable" value="my.data"/>
<parameter key="name_of_variable" value="model"/>
<parameter key="name_of_variable" value="ignore_me"/>
</enumeration>
<list key="results">
<parameter key="results" value="Data Table"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Change role of prediction to prediction" width="90" x="315" y="30">
<parameter key="name" value="predicted"/>
<parameter key="target_role" value="prediction"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="5.1.001" expanded="true" height="76" name="Performance" width="90" x="396" y="30"/>
<connect from_port="model" to_op="Evaluate vs test data" to_port="input 3"/>
<connect from_port="test set" to_op="Evaluate vs test data" to_port="input 1"/>
<connect from_op="Retrieve" from_port="output" to_op="Evaluate vs test data" to_port="input 2"/>
<connect from_op="Evaluate vs test data" from_port="output 1" to_op="Change role of prediction to prediction" to_port="example set input"/>
<connect from_op="Change role of prediction to prediction" from_port="example set output" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 2"/>
<connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
Hi Keith,
I'm glad to hear that the R Extension is being used so broadly. At the same time I'm a little sad, because it's not yet where it could be. Why? Because all the needed components are already inside the extension: learning methods can be specified in an XML dialect where you can enter not only the R script for training a model, but also the script for applying it. You can even specify parameters and capabilities just as RapidMiner operators use them.
Our plan is to give people like you the possibility to encapsulate R code completely transparently within a RapidMiner operator, so that it can interact easily with other RM operators. Your R code library would then turn into a reusable operator library in RapidMiner, shareable with others...
If you are interested in this level of detail, I would recommend participating in the Special Interest Group for the R integration. I would personally prefer to take the discussion there. If that's OK, send me your email address and I will send you an invitation.
Greetings,
Sebastian
PS: Yes, there's still no operator for deleting repository entries... But for temporary objects you could replace Store/Retrieve with the faster Remember/Recall operators, which keep the object in memory for the duration of the process instead of writing it to the repository.