Why is date data missing in the output from Execute R?

zeno_mas
zeno_mas New Altair Community Member
edited November 5 in Community Q&A

Hi 

I am trying to pass a data table to Execute R and get it back with additional attributes generated by R. But when I pass the data table to Execute R and look at the output from Execute R, I find that the Date attribute is missing.

1. Save the data in the local repository with the date data type.

2. Multiply the data (one copy goes directly to the output, the other is passed to Execute R)

3. A simple do-nothing Execute R script

4. Output from R script

5. Output from direct Multiply

Could anyone advise me on how to get the data table back unchanged from the Execute R script?

 

Thanks.
Rapidminer_ExecuteR.png

Answers

  • sgenzer
    sgenzer
    Altair Employee

    hello @zeno_mas - could you please post your process so we can take a look at it?  Please use the </> tool above.

     

    Thanks.

    Scott

  • zeno_mas
    zeno_mas New Altair Community Member

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Data" width="90" x="45" y="34">
    <parameter key="repository_entry" value="../data/AMZN_Historical_dt"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply (3)" width="90" x="179" y="34"/>
    <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="313" y="85">
    <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function(data)&#10;{&#10;&#9;return(data)&#10;}&#10;"/>
    </operator>
    <connect from_op="Retrieve Data" from_port="output" to_op="Multiply (3)" to_port="input"/>
    <connect from_op="Multiply (3)" from_port="output 1" to_port="result 1"/>
    <connect from_op="Multiply (3)" from_port="output 2" to_op="Execute R" to_port="input 1"/>
    <connect from_op="Execute R" from_port="output 1" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    @sgenzer Thanks for the quick reply.

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Try converting your date column from a RapidMiner Date type to Polynominal type. 

     

    Sometimes when converting from RM > R, the date times get wonky. 

  • zeno_mas
    zeno_mas New Altair Community Member

    Thank you for your suggestion @Thomas_Ott.

    Yep, that is one workable workaround. In fact, I started with that, and inside Execute R the column can still be detected as a date data type.

    Do you think it is worth reporting this as an issue to the RM team?

     

    Rgds,

  • sgenzer
    sgenzer
    Altair Employee

    hi @zeno_mas - just curious.  What are you trying to do in R that cannot be done with RapidMiner operators?

     

    Scott

  • imarkou
    imarkou New Altair Community Member

    Hi @sgenzer,

     

    I know the post is old but I had a similar problem.

    After running a simple R script where the input example set contains a Date time attribute, I get the following error:

    Exception: com.rapidminer.operator.OperatorException
    Message: Script terminated abnormally.
    Stack trace:

    com.rapidminer.extension.rscripting.operator.scripting.AbstractScriptRunner.run(AbstractScriptRunner.java:166)
    com.rapidminer.extension.rscripting.operator.scripting.AbstractScriptingLanguageOperator.doWork(AbstractScriptingLanguageOperator.java:90)
    com.rapidminer.extension.rscripting.operator.scripting.r.RScriptingOperator.doWork(RScriptingOperator.java:73)
    com.rapidminer.operator.Operator.execute(Operator.java:1025)
    com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:77)
    com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:812)
    com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:807)
    java.security.AccessController.doPrivileged(Native Method)
    com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:807)
    com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:428)
    com.rapidminer.operator.Operator.execute(Operator.java:1025)
    com.rapidminer.Process.execute(Process.java:1322)
    com.rapidminer.Process.run(Process.java:1297)
    com.rapidminer.Process.run(Process.java:1183)
    com.rapidminer.Process.run(Process.java:1136)
    com.rapidminer.Process.run(Process.java:1131)
    com.rapidminer.Process.run(Process.java:1121)
    com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)

    The same error occurred when using Date attributes. When I convert the date attribute to nominal, the problem is solved. I'm just getting started with the "Execute R" operator, and in this process I used it simply to output the ExampleSet to the RapidMiner results.

    My process is as follows:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" breakpoints="after" class="subprocess" compatibility="9.0.002" expanded="true" height="82" name="Create ExampleSet" width="90" x="45" y="34">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="9.0.002" expanded="true" height="68" name="Generate Data (2)" width="90" x="45" y="34">
    <parameter key="number_examples" value="10"/>
    <parameter key="number_of_attributes" value="1"/>
    <parameter key="attributes_lower_bound" value="1.0"/>
    </operator>
    <operator activated="true" class="real_to_integer" compatibility="9.0.002" expanded="true" height="82" name="Real to Integer" width="90" x="179" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="9.0.002" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
    <list key="function_descriptions">
    <parameter key="date" value="date_add(date_now(), att1, DATE_UNIT_DAY)"/>
    </list>
    </operator>
    <connect from_op="Generate Data (2)" from_port="output" to_op="Real to Integer" to_port="example set input"/>
    <connect from_op="Real to Integer" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="false" class="date_to_nominal" compatibility="9.0.002" expanded="true" height="82" name="Date to Nominal" width="90" x="246" y="85">
    <parameter key="attribute_name" value="date"/>
    <parameter key="date_format" value="dd/MM/yyyy"/>
    </operator>
    <operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Execute R" width="90" x="447" y="34">
    <parameter key="script" value="rm_main = function(data)&#10;{&#10; print('Hello, world!')&#10; return(list(data))&#10;}&#10;"/>
    </operator>
    <connect from_op="Create ExampleSet" from_port="out 1" to_op="Execute R" to_port="input 1"/>
    <connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    The reason I'm using R is that I want to perform STL (Seasonal and Trend decomposition using Loess) on a time series and I didn't find a relevant operator in RapidMiner.
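
    For anyone landing here for the same reason, here is a minimal sketch of what STL inside rm_main could look like. It uses stl() from R's stats package. The column name value, the monthly frequency of 12, and the demo data are assumptions for illustration only, not part of my actual process.

    ```r
    rm_main = function(data)
    {
      # 'value' is a hypothetical numeric column; frequency = 12 assumes monthly data.
      # stl() needs a ts object with frequency > 1 and at least two full periods.
      series <- ts(data$value, frequency = 12)
      fit <- stl(series, s.window = "periodic")

      # fit$time.series has the columns "seasonal", "trend" and "remainder".
      parts <- as.data.frame(fit$time.series)
      return(list(cbind(data, parts)))
    }

    # Stand-alone check outside RapidMiner with made-up data:
    # a linear trend plus a yearly cycle plus noise, 48 months long.
    set.seed(1)
    n <- 48
    demo <- data.frame(value = 0.5 * seq_len(n) + 10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n))
    out <- rm_main(demo)[[1]]
    ```

    The three components add back up to the original series, so the result can be sanity-checked by comparing seasonal + trend + remainder against value.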

     

    Thanks,

    John

  • tftemme
    tftemme New Altair Community Member

    Hi @imarkou,

     

    Just a small teaser concerning the STL Decomposition. With the next release of RapidMiner Studio we will add an operator capable of performing STL.

     

    Best regards,
    Fabian

  • imarkou
    imarkou New Altair Community Member

    Hi @tftemme,

     

    Great to hear that! It will be interesting to give it a try when it's released!

     

    Regarding the problem, as @Thomas_Ott and @zeno_mas mentioned, converting date into polynominal is a solution to the problem. Even when converting date time to polynominal, R recognises the data as POSIXct which is what I wanted for analysing time series data.
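
    A quick way to check how a column actually arrives in R is to print each column's class from inside rm_main. This is only a sketch: describe_columns is a hypothetical helper name, the demo data is made up, and I am assuming the printed output ends up in the RapidMiner log.

    ```r
    # Hypothetical helper: one class label per column, e.g. "POSIXct/POSIXt".
    describe_columns <- function(data) {
      vapply(data, function(col) paste(class(col), collapse = "/"), character(1))
    }

    rm_main = function(data)
    {
      # Show the column classes, then pass the data through unchanged.
      print(describe_columns(data))
      return(list(data))
    }

    # Stand-alone check outside RapidMiner with made-up data.
    demo <- data.frame(att1 = 1:2)
    demo$date <- as.POSIXct(c("2018-10-01 12:00:00", "2018-10-02 12:00:00"), tz = "UTC")
    classes <- describe_columns(demo)
    ```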

     

    However, I was wondering if the exception in my process is because I'm trying to pass data that is not supported by the Execute R operator or due to a bug.

     

    Best regards,

    John

  • sgenzer
    sgenzer
    Altair Employee

    cc'ing our resident R expert @yyhuang :)

     

     

  • YYH
    YYH
    Altair Employee

    Hi @imarkou,

     

    Thanks for the followup.

     

    As you said, R recognises the data as POSIXct. The special classes for date and time in R are C-based, while the date class in RapidMiner is Java-based.

    See also the notes on the issues that arise when you convert dates between different systems:

    https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/as.Date

    We suggest you use the as.character() function to convert dates to character strings.

    Page 8 of this R News issue gives a detailed explanation of the development of the date classes in R.
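
    As a rough sketch of the round trip (not the only way to do it): parse the nominal column into a Date inside rm_main, work with it, and convert everything back to character before returning. The column name date and the yyyy-MM-dd format match the example process; the next_week column and the demo data are made up for illustration.

    ```r
    rm_main = function(data)
    {
      # 'date' arrives as text when Date to Nominal is used with yyyy-MM-dd.
      parsed <- as.Date(as.character(data$date), format = "%Y-%m-%d")

      # Work with real dates inside R, e.g. shift each date by one week.
      next_week <- parsed + 7

      # Hand plain strings back to RapidMiner instead of Date objects.
      data$date <- format(parsed, "%Y-%m-%d")
      data$next_week <- as.character(next_week)
      return(list(data))
    }

    # Stand-alone check outside RapidMiner with made-up data.
    demo <- data.frame(att1 = 1:3, date = c("2018-10-01", "2018-10-02", "2018-12-28"))
    out <- rm_main(demo)[[1]]
    ```

    On the RapidMiner side the returned columns are then nominal and can be converted back to a date type if needed.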

    Example process:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="9.0.002" expanded="true" height="82" name="Create ExampleSet" width="90" x="45" y="34">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="9.0.002" expanded="true" height="68" name="Generate Data (2)" width="90" x="45" y="34">
    <parameter key="number_examples" value="10"/>
    <parameter key="number_of_attributes" value="1"/>
    <parameter key="attributes_lower_bound" value="1.0"/>
    </operator>
    <operator activated="true" class="real_to_integer" compatibility="9.0.002" expanded="true" height="82" name="Real to Integer" width="90" x="179" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="9.0.002" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
    <list key="function_descriptions">
    <parameter key="date" value="date_add(date_now(), att1, DATE_UNIT_DAY)"/>
    </list>
    </operator>
    <connect from_op="Generate Data (2)" from_port="output" to_op="Real to Integer" to_port="example set input"/>
    <connect from_op="Real to Integer" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="date_to_nominal" compatibility="9.0.002" expanded="true" height="82" name="Date to Nominal" width="90" x="246" y="34">
    <parameter key="attribute_name" value="date"/>
    <parameter key="date_format" value="yyyy-MM-dd"/>
    </operator>
    <operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Execute R" width="90" x="447" y="34">
    <parameter key="script" value="rm_main = function(data)&#10;{&#10;&#9;print(data)&#10; return(list(as.data.frame(data)))&#10;}&#10;"/>
    </operator>
    <operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Execute R (2)" width="90" x="380" y="187">
    <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10; &#9;dat &lt;- data.frame(myts = sample(10, 24, replace = T), Date = seq(as.Date(&quot;2008-09-11&quot;), as.Date(&quot;2008-09-11&quot;) + 23, by = 1))&#10; &#9;dat$Date &lt;-as.character(dat$Date)&#10; &#9;return(list(dat))&#10;}&#10;"/>
    </operator>
    <connect from_op="Create ExampleSet" from_port="out 1" to_op="Date to Nominal" to_port="example set input"/>
    <connect from_op="Date to Nominal" from_port="example set output" to_op="Execute R" to_port="input 1"/>
    <connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
    <connect from_op="Execute R (2)" from_port="output 1" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

     

    YY

  • imarkou
    imarkou New Altair Community Member

    Thanks a lot for the detailed explanation @yyhuang!