R-Extension: "R Script" operator changes ExampleSet values
I set up a simple process that reads a .csv file, then passes the ExampleSet to the R Script operator. The R Script operator is "empty" (for testing purposes) and does nothing except pass the ExampleSet to a Result node.
When I view the Result ExampleSet (coming out of the R Script Operator), I discover that certain missing values in my input ExampleSet have been replaced with some undesired values. This has happened with two attributes: one of the type "text", and one of the type "date". In both cases, these attribute fields have also been re-designated as "nominal" in the output ExampleSet.
So the R Script Operator has re-designated some of my attributes to "nominal" and arbitrarily replaced missing values with other data. I intended neither of these actions.
Is this a (serious) bug or should I have expected this behavior? Is there a way to control how the R Script operator treats its input ExampleSet?
I am using RapidMiner Studio 6.0 Professional.
Thank you for your reply. I will be able to work around this problem in the manner that you suggested. However...
1) Is this behavior highlighted in any RapidMiner documentation? If I had been able to read about this early on, then I would have saved countless frustrating hours diagnosing why I was getting false results in my RapidMiner process.
2) Is there a plan in place to correct this? Should I file an official bug report using the Bugzilla application?
Thank you.
Hi,
1) I'm afraid not. We are in the process of improving the documentation, but because this is an extension, its documentation has not been updated yet. It is on our list, though.
2) There is no way to correct this, because normally the R script is used to alter the results. Because of the different internal data models used by RapidMiner and R, the problem cannot be fixed in a reasonable manner. So the answer is no, this behavior will not change in the foreseeable future.
Regards,
Marco
Okay. It is disappointing that this isn't documented yet, and it is disappointing that there are no foreseeable plans to fix it. But it IS a bug. Should I file it with the Bug Tracker?
Also, I am new to this forum, so I don't know the proper protocol for marking this thread as "[SOLVED]" or not. Clearly the problem isn't solved, but I guess there is nothing more to discuss. Would you like me to keep this thread open, or mark it "solved"?
Unfortunately, this is a side effect of the way R is integrated in RapidMiner and cannot be avoided. You will need to manually fix these problems via one of the transformation operators or the Guess Value Types operator.
Regards,
Marco
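
For anyone else diagnosing this, one way to see exactly which cells were affected is to export the ExampleSet to CSV before and after the R Script operator and compare the two files. Below is a minimal sketch in Python, not part of RapidMiner or the R extension. It assumes both exports have the same rows in the same order and the same column names, and that missing values are written as an empty field, "?" or "NA" (adjust `MISSING_MARKERS` to whatever your export actually uses). The function names and the sample data are made up for illustration.

```python
import csv
import io

# Assumption: markers that your CSV export uses for missing values.
MISSING_MARKERS = {"", "?", "NA"}


def is_missing(value):
    """True if the cell is absent or holds one of the missing-value markers."""
    return value is None or value.strip() in MISSING_MARKERS


def find_replaced_missing(before_rows, after_rows):
    """Return (row_index, column, new_value) for every cell that was
    missing in the input but holds a value in the output."""
    changed = []
    for i, (before, after) in enumerate(zip(before_rows, after_rows)):
        for column, old in before.items():
            new = after.get(column)
            if is_missing(old) and not is_missing(new):
                changed.append((i, column, new))
    return changed


def read_rows(text):
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample data standing in for the two exports.
before = read_rows("id,comment,created\n1,hello,2014-01-05\n2,,\n")
after = read_rows("id,comment,created\n1,hello,2014-01-05\n2,hello,2014-01-05\n")

changes = find_replaced_missing(before, after)
for row, column, value in changes:
    print(f"row {row}, column {column!r}: missing value became {value!r}")
```

With real exports, read each file with `open(path, newline="")` and pass the file object to `csv.DictReader` instead of using `read_rows`. The report only tells you where values appeared; restoring the attribute types still has to be done in RapidMiner as Marco describes.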