"R extension - how to get started"
Hey there,
I am really new to RapidMiner and R, and I could not find anything on the internet that explains how to get started with the R extension in RapidMiner and really breaks it down to the basics. So I just tried out some very simple things, like getting the max of a column.
The script is the following:
rm_main = function(data)
{
max($Temperature)
return(data)
}
and the error message is:
The script could not be parsed. Please check your R script.
[1] "script.R:5:5: unexpected '$' (....)"
Do you know how to solve it?
Or do you have something where one can learn how to get started with the R extension in RapidMiner with just basic knowledge?
Thanks in advance
Marie
Ah, and here is the XML:
<?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.2.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.2.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="85">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="246" y="85">
<parameter key="script" value="rm_main = function(data) { max($Temperature) } "/>
</operator>
<connect from_op="Retrieve Golf" from_port="output" to_op="Execute R" to_port="input 1"/>
<connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
Hi,
Working with the Execute R operator is pretty straightforward once you understand how RM delivers the data to the function. Your original script could not be parsed because $Temperature has no object in front of the $. See your sample script, modified, below.
RM sends its data to the Execute R script and translates it via the data.table package. The raw data comes in as "data" via function(data). From there I assign it to a data frame with golf <- data and then extract the maximum of the Temperature column via output <- max(golf$Temperature)
Then I return the output as an object.
I added a print statement so you can see the results in your LOG view.
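Written out on separate lines, the logic looks roughly like this. It is a minimal sketch that can also be tried in a plain R session; the `golf_sample` data frame and its values are made up for illustration and only stand in for the Golf sample data that RapidMiner would pass in, and I use a plain print(output) here for brevity.

```r
# Entry point that Execute R calls; `data` is the input table from RapidMiner.
rm_main = function(data)
{
  golf <- data                      # keep the incoming table under a readable name
  output <- max(golf$Temperature)   # `$` needs an object in front of it
  print(output)                     # inside RapidMiner this ends up in the Log view
  return(output)
}

# Hypothetical stand-in for the Golf data (illustrative values only).
golf_sample <- data.frame(Temperature = c(85, 64, 72),
                          Play = c("no", "yes", "yes"))
result <- rm_main(golf_sample)
```

Outside RapidMiner you have to call rm_main yourself, as in the last line; inside the Execute R operator, RapidMiner makes that call for you.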
<?xml version="1.0" encoding="UTF-8"?><process version="7.2.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.2.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.2.002" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="85">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="179" y="85">
<parameter key="script" value="rm_main = function(data)&#10;{&#10;golf &lt;- data&#10;output &lt;- max(golf$Temperature)&#10;print(str(output))&#10;return(output)&#10;}&#10;"/>
</operator>
<connect from_op="Retrieve Golf" from_port="output" to_op="Execute R" to_port="input 1"/>
<connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Hey Thomas_Ott,
thank you very much for your quick reply.
It seems very logical what you write.
But when I copied the XML, all I get in the Results view is:
File
Memory buffered file
What does that mean?
With kind regards
Marie
Hi Marie,
That means the R script returned an object that RapidMiner can't visualize in the Results tab. This is why I added the print statement, so you can see the max temperature in the Log view. Take a look at the sample tutorial processes that come with the Execute R operator: right-click on the operator and click on the description. There will be a link for "Jump to Tutorial Processes."
There are about four different R examples that explain a bit about how you can embed your scripts inside RapidMiner. Good luck!
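If you would rather see the value in the Results view, one option is to return a table instead of a bare number. The sketch below assumes that Execute R converts a returned data.frame back into an ExampleSet; the column name `max_temperature` and the `golf_sample` values are made up for illustration.

```r
# Entry point that Execute R calls; `data` is the input table from RapidMiner.
rm_main = function(data)
{
  max_temp <- max(data$Temperature)
  print(max_temp)   # still written to the log
  # Assumption: a data.frame is handed back to RapidMiner as an ExampleSet,
  # so the maximum is wrapped in a one-row table.
  return(data.frame(max_temperature = max_temp))
}

# Hypothetical stand-in for the Golf data (illustrative values only).
golf_sample <- data.frame(Temperature = c(85, 64, 72))
result <- rm_main(golf_sample)
```

The one-row table carries the same information as the number, just in a shape that the operator's output port is expected to display.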