Why is date data missing after output from Execute R?
Hi
I am trying to pass a data table to Execute R and get it back with additional attributes generated by R. But when I pass the data table to Execute R and look at its output, I find that the Date attribute is missing.
1. Save the data in the local repository with the date data type.
2. Simply multiply it (one copy is output directly, the other is passed to Execute R).
3. A simple do-nothing Execute R script.
4. Output from the R script.
5. Output from the direct Multiply.
Could anyone advise me on how to get the data table back unchanged from the Execute R script?
Thanks.
Answers
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Data" width="90" x="45" y="34">
<parameter key="repository_entry" value="../data/AMZN_Historical_dt"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply (3)" width="90" x="179" y="34"/>
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="313" y="85">
<parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function(data)&#10;{&#10;	return(data)&#10;}&#10;"/>
</operator>
<connect from_op="Retrieve Data" from_port="output" to_op="Multiply (3)" to_port="input"/>
<connect from_op="Multiply (3)" from_port="output 1" to_port="result 1"/>
<connect from_op="Multiply (3)" from_port="output 2" to_op="Execute R" to_port="input 1"/>
<connect from_op="Execute R" from_port="output 1" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
@sgenzer Thanks for the quick reply.
Try converting your date column from a RapidMiner Date type to Polynominal type.
Sometimes when converting from RM > R, the date times get wonky.
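If the date column reaches R as text (for example after Date to Nominal with the format yyyy-MM-dd), it can be parsed back into an R Date inside rm_main and turned back into text before returning, so RapidMiner never has to convert an R date class. This is a minimal sketch, assuming the column is named `date`; the helper `parse_date_column` is hypothetical, and its `format` argument must match the date_format set in Date to Nominal (e.g. `"%d/%m/%Y"` for dd/MM/yyyy).

```r
# Hypothetical helper: turn a date column into R's Date class, whether it
# arrives as text/factor (after Date to Nominal), as Date, or as POSIXct.
parse_date_column <- function(col, format = "%Y-%m-%d") {
  if (inherits(col, "Date")) return(col)
  if (inherits(col, "POSIXt")) return(as.Date(col))  # as.Date uses UTC here
  as.Date(as.character(col), format = format)
}

rm_main = function(data) {
  data$date <- parse_date_column(data$date)
  data <- data[order(data$date), ]  # example of date-based work in R
  # Convert back to text so RapidMiner receives a nominal column
  data$date <- format(data$date, "%Y-%m-%d")
  return(list(data))
}
```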
Thank you for your suggestion @Thomas_Ott.
Yep, that is a workable workaround. In fact, I started with that, and inside Execute R the column can still be detected as a date data type.
Do you think it is worth reporting an issue to the RM team?
Rgds,
Hi @sgenzer,
I know the post is old but I had a similar problem.
After running a simple R script where the input example set contains a Date time attribute, I get the following error:
Exception: com.rapidminer.operator.OperatorException
Message: Script terminated abnormally.
Stack trace:
com.rapidminer.extension.rscripting.operator.scripting.AbstractScriptRunner.run(AbstractScriptRunner.java:166)
com.rapidminer.extension.rscripting.operator.scripting.AbstractScriptingLanguageOperator.doWork(AbstractScriptingLanguageOperator.java:90)
com.rapidminer.extension.rscripting.operator.scripting.r.RScriptingOperator.doWork(RScriptingOperator.java:73)
com.rapidminer.operator.Operator.execute(Operator.java:1025)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:77)
com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:812)
com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:807)
java.security.AccessController.doPrivileged(Native Method)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:807)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:428)
com.rapidminer.operator.Operator.execute(Operator.java:1025)
com.rapidminer.Process.execute(Process.java:1322)
com.rapidminer.Process.run(Process.java:1297)
com.rapidminer.Process.run(Process.java:1183)
com.rapidminer.Process.run(Process.java:1136)
com.rapidminer.Process.run(Process.java:1131)
com.rapidminer.Process.run(Process.java:1121)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)
The same error occurred when using Date attributes. When I convert the date attribute to nominal, the problem is solved. I'm just getting started with the "Execute R" operator, and in this process I used it simply to output the ExampleSet to the RapidMiner results.
My process is as follows:
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="subprocess" compatibility="9.0.002" expanded="true" height="82" name="Create ExampleSet" width="90" x="45" y="34">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="9.0.002" expanded="true" height="68" name="Generate Data (2)" width="90" x="45" y="34">
<parameter key="number_examples" value="10"/>
<parameter key="number_of_attributes" value="1"/>
<parameter key="attributes_lower_bound" value="1.0"/>
</operator>
<operator activated="true" class="real_to_integer" compatibility="9.0.002" expanded="true" height="82" name="Real to Integer" width="90" x="179" y="34"/>
<operator activated="true" class="generate_attributes" compatibility="9.0.002" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
<list key="function_descriptions">
<parameter key="date" value="date_add(date_now(), att1, DATE_UNIT_DAY)"/>
</list>
</operator>
<connect from_op="Generate Data (2)" from_port="output" to_op="Real to Integer" to_port="example set input"/>
<connect from_op="Real to Integer" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="false" class="date_to_nominal" compatibility="9.0.002" expanded="true" height="82" name="Date to Nominal" width="90" x="246" y="85">
<parameter key="attribute_name" value="date"/>
<parameter key="date_format" value="dd/MM/yyyy"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Execute R" width="90" x="447" y="34">
<parameter key="script" value="rm_main = function(data) {&#10;	print('Hello, world!')&#10;	return(list(data))&#10;}&#10;"/>
</operator>
<connect from_op="Create ExampleSet" from_port="out 1" to_op="Execute R" to_port="input 1"/>
<connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
The reason I'm using R is that I want to perform STL (Seasonal and Trend decomposition using Loess) on a time series, and I didn't find a relevant operator in RapidMiner.
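For reference, base R's `stl()` from the stats package performs this decomposition. The following is a minimal sketch, assuming daily data in hypothetical columns `date` (text, yyyy-MM-dd) and `value` (numeric), no missing values, a weekly season (`frequency = 7`), and more than two full weeks of rows, which `stl()` requires. Dates are returned as text so RapidMiner receives a nominal column rather than an R date class.

```r
# Hypothetical sketch: STL decomposition of a daily series inside Execute R.
rm_main = function(data) {
  dates <- as.Date(as.character(data$date), format = "%Y-%m-%d")
  ord <- order(dates)                  # stl() expects a time-ordered series
  values <- data$value[ord]
  dates <- dates[ord]

  series <- ts(values, frequency = 7)  # assumed weekly seasonality
  fit <- stl(series, s.window = "periodic")
  comps <- fit$time.series             # columns: seasonal, trend, remainder

  out <- data.frame(
    date = format(dates, "%Y-%m-%d"),  # text, not an R date class
    value = values,
    seasonal = as.numeric(comps[, "seasonal"]),
    trend = as.numeric(comps[, "trend"]),
    remainder = as.numeric(comps[, "remainder"]),
    stringsAsFactors = FALSE
  )
  return(list(out))
}
```

With `s.window = "periodic"` the seasonal component is identical for each day of the week; a numeric `s.window` lets it change slowly over time instead.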
Thanks,
John
Hi @tftemme,
Great to hear that! It will be interesting to give it a try when it's released!
Regarding the problem, as @Thomas_Ott and @zeno_mas mentioned, converting the date to polynominal solves it. Even after converting the date time to polynominal, R recognises the data as POSIXct, which is what I wanted for analysing time series data.
However, I was wondering whether the exception in my process occurs because I'm trying to pass data that the Execute R operator does not support, or because of a bug.
Best regards,
John
Hi @imarkou,
Thanks for the follow-up.
As you said, R recognises the data as POSIXct. R's special classes for date and time are C-based, while the date class in RapidMiner is Java-based.
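One way to see why conversions between the two systems can go wrong is that each side counts time from the 1970-01-01 epoch in a different unit: R's Date counts days, R's POSIXct counts seconds, and Java's `java.util.Date` counts milliseconds. The sketch below assumes a RapidMiner date might reach R as a raw Java-style millisecond value; that is an assumption, not documented behaviour of Execute R, and `ms_to_posixct` is a hypothetical helper.

```r
# R's Date class stores days since 1970-01-01.
as.numeric(as.Date("1970-01-11"))                          # 10
# R's POSIXct class stores seconds since 1970-01-01 00:00:00 UTC.
as.numeric(as.POSIXct("1970-01-01 00:00:10", tz = "UTC"))  # 10

# Hypothetical helper: interpret a Java-style millisecond timestamp as POSIXct.
ms_to_posixct <- function(ms) {
  as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
}

format(ms_to_posixct(86400000), "%Y-%m-%d %H:%M:%S")       # "1970-01-02 00:00:00"
```

Reading the same number with the wrong unit (days instead of milliseconds, for example) lands far from the intended date, which is why passing dates as formatted text is the more robust option.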
See also the documentation on the issues that arise when converting dates between different systems:
https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/as.Date
We suggest you use the as.character() function to convert dates to characters.
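Generalising that suggestion, every date-like column can be converted just before returning from rm_main. This is a minimal sketch; `dates_to_character` is a hypothetical helper name, and it treats any column inheriting from `Date` or `POSIXt` (POSIXct/POSIXlt) as a date.

```r
# Hypothetical helper: convert every Date / POSIXct / POSIXlt column to character
# so the data frame returned to RapidMiner contains no R date classes.
dates_to_character <- function(df) {
  is_time <- vapply(df, function(col) inherits(col, c("Date", "POSIXt")), logical(1))
  if (any(is_time)) {
    df[is_time] <- lapply(df[is_time], as.character)
  }
  df
}

rm_main = function(data) {
  # ... R processing on data ...
  return(list(dates_to_character(data)))
}
```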
Page 8 of this R News issue gives a detailed explanation of the development of the date classes in R.
Example process:
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="9.0.002" expanded="true" height="82" name="Create ExampleSet" width="90" x="45" y="34">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="9.0.002" expanded="true" height="68" name="Generate Data (2)" width="90" x="45" y="34">
<parameter key="number_examples" value="10"/>
<parameter key="number_of_attributes" value="1"/>
<parameter key="attributes_lower_bound" value="1.0"/>
</operator>
<operator activated="true" class="real_to_integer" compatibility="9.0.002" expanded="true" height="82" name="Real to Integer" width="90" x="179" y="34"/>
<operator activated="true" class="generate_attributes" compatibility="9.0.002" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
<list key="function_descriptions">
<parameter key="date" value="date_add(date_now(), att1, DATE_UNIT_DAY)"/>
</list>
</operator>
<connect from_op="Generate Data (2)" from_port="output" to_op="Real to Integer" to_port="example set input"/>
<connect from_op="Real to Integer" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="date_to_nominal" compatibility="9.0.002" expanded="true" height="82" name="Date to Nominal" width="90" x="246" y="34">
<parameter key="attribute_name" value="date"/>
<parameter key="date_format" value="yyyy-MM-dd"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Execute R" width="90" x="447" y="34">
<parameter key="script" value="rm_main = function(data) {&#10;	print(data)&#10;	return(list(as.data.frame(data)))&#10;}&#10;"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Execute R (2)" width="90" x="380" y="187">
<parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10;	dat &lt;- data.frame(myts = sample(10, 24, replace = T), Date = seq(as.Date(&quot;2008-09-11&quot;), as.Date(&quot;2008-09-11&quot;) + 23, by = 1))&#10;	dat$Date &lt;- as.character(dat$Date)&#10;	return(list(dat))&#10;}&#10;"/>
</operator>
<connect from_op="Create ExampleSet" from_port="out 1" to_op="Date to Nominal" to_port="example set input"/>
<connect from_op="Date to Nominal" from_port="example set output" to_op="Execute R" to_port="input 1"/>
<connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
<connect from_op="Execute R (2)" from_port="output 1" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
YY