Execute R
I'm trying to get my first Execute R script to read the word list produced by WordList to Data. I'm currently getting the error "object 'total' not found", even though "word" and "total" are two columns in the output of that operator, and the R script follows the structure that works for me in RStudio. Any advice, please?
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="open_file" compatibility="7.3.001" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
<parameter key="filename" value="/Users/carl/Documents/government-strategy.pdf"/>
</operator>
<operator activated="true" class="text:read_document" compatibility="7.3.000" expanded="true" height="68" name="Read Document" width="90" x="179" y="34">
<parameter key="content_type" value="pdf"/>
</operator>
<operator activated="true" class="text:tokenize" compatibility="7.3.000" expanded="true" height="68" name="Tokenize" width="90" x="313" y="34"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="7.3.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="447" y="34"/>
<operator activated="true" class="text:transform_cases" compatibility="7.3.000" expanded="true" height="68" name="Transform Cases" width="90" x="581" y="34"/>
<operator activated="true" class="text:filter_by_length" compatibility="7.3.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="715" y="34"/>
<operator activated="true" class="text:generate_n_grams_terms" compatibility="7.3.000" expanded="true" height="68" name="Generate n-Grams (Terms)" width="90" x="849" y="34"/>
<operator activated="true" class="text:process_documents" compatibility="7.3.000" expanded="true" height="103" name="Process Documents" width="90" x="983" y="34">
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="add_meta_information" value="false"/>
<parameter key="prune_below_absolute" value="0"/>
<parameter key="prune_above_absolute" value="10"/>
<process expanded="true">
<connect from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" breakpoints="after" class="text:wordlist_to_data" compatibility="7.3.000" expanded="true" height="82" name="WordList to Data" width="90" x="1117" y="34"/>
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="103" name="Execute R" width="90" x="1251" y="34">
<parameter key="script" value="rm_main = function(data) { wordcloud::wordcloud(word, total, scale=c(5,0.5), max.words=100, random.order=FALSE, rot.per=0.35, use.r.layout=FALSE, colors=&quot;Dark2&quot;) }"/>
</operator>
<connect from_op="Open File" from_port="file" to_op="Read Document" to_port="file"/>
<connect from_op="Read Document" from_port="output" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
<connect from_op="Generate n-Grams (Terms)" from_port="document" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="word list" to_op="WordList to Data" to_port="word list"/>
<connect from_op="WordList to Data" from_port="example set" to_op="Execute R" to_port="input 1"/>
<connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
<connect from_op="Execute R" from_port="output 2" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Best Answer
Got there in the end with this construct!
rm_main = function(data)
{
library(base)
library(grDevices)
library(wordcloud)
library(RColorBrewer)
setwd("/Users/carl")
png(filename="mypng.png")
cloud_df <- data.frame(Words = data$word, Freq = data$total)
wordcloud::wordcloud(cloud_df$Words, cloud_df$Freq, scale=c(5,0.5), max.words=100, random.order=FALSE,
rot.per=0.35, use.r.layout=FALSE, brewer.pal(3,"Dark2"))
dev.off()
}
Answers
Do you need to load a library in R? That appears to be missing from your Execute R script.
Thank you Thomas. I've added that, but I still get errors.
The first two arguments to wordcloud should be the words and, optionally, their frequencies. If I pass "word", which is the attribute name from WordList to Data, I get the error "object 'word' not found". If I pass "data" instead, I get this error: no applicable method for 'TermDocumentMatrix' applied to an object of class "c('data.table', 'data.frame')"
I'm not really sure how to point wordcloud at the right attributes ("word" has the words, and "total" has the word frequency).
rm_main = function(data)
{
library(wordcloud)
wordcloud::wordcloud(word, scale=c(5,0.5), max.words=100, random.order=FALSE,
rot.per=0.35, use.r.layout=FALSE, colors="Dark2")
}
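The "object not found" errors above come from how R scopes names: inside rm_main, the example set arrives as the data frame `data`, so `word` and `total` are columns of that data frame, not free variables. A minimal base-R sketch, assuming the WordList to Data output has columns named `word` and `total` (the sample values below are made up for illustration):

```r
# Made-up stand-in for the example set that WordList to Data passes to Execute R
data <- data.frame(word  = c("government", "strategy", "policy"),
                   total = c(12, 9, 4),
                   stringsAsFactors = FALSE)

# A bare column name is looked up as an ordinary variable, so it fails
msg <- tryCatch(total, error = function(e) conditionMessage(e))
print(msg)  # the message is: object 'total' not found

# Three equivalent ways to reach the columns through the data frame
words_a <- data$word
words_b <- data[["word"]]
freqs_c <- with(data, total)  # evaluates 'total' inside the data frame

# The two vectors that would go to wordcloud::wordcloud(words, freq, ...)
cloud_df <- data.frame(word = data$word, freq = data$total,
                       stringsAsFactors = FALSE)
print(cloud_df)
```

Inside Execute R the same idea becomes `wordcloud::wordcloud(data$word, data$total, ...)`, which is what the accepted answer does via `cloud_df`.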
Having done some reading on R over the holidays, I think this should work, but it gives me a "memory buffered file" error.
When I comment out my logic and paste in a very simple wordcloud example that works perfectly in RStudio, I also get a "memory buffered file" error. So is there something I need to do to output the plot correctly? Currently I'm simply connecting the first output port to a result port.
rm_main = function(data)
{
library(base)
library(wordcloud)
library(RColorBrewer)
# cloud_df <- data.frame(Words = data$word, Freq = data$total)
wordcloud::wordcloud(c(letters, LETTERS, 0:9), seq(1, 1000, len = 62))
# wordcloud::wordcloud(cloud_df$Words, cloud_df$Freq, scale=c(5,0.5), max.words=100, random.order=FALSE,
# rot.per=0.35, use.r.layout=FALSE, brewer.pal(3,"Dark2"))
}
Ah! You have to write the image out as a PNG or JPEG file to a local directory; you can't display R plots natively inside RapidMiner the way you can in RStudio.
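Writing a plot to a file in base R means opening a file device, drawing, and then closing the device; the PNG is only complete once `dev.off()` runs. A minimal sketch of that pattern, using a hypothetical helper `save_plot_png()`, a plain `barplot()` standing in for the word cloud (so it runs without the wordcloud package), and `tempdir()` as an assumed example output location:

```r
# Hypothetical helper: open a PNG device, run the drawing code, then close it
save_plot_png <- function(path, draw, width = 800, height = 600) {
  grDevices::png(filename = path, width = width, height = height)
  dev <- grDevices::dev.cur()
  # Close this device even if 'draw' fails; the file is complete only after closing
  on.exit(grDevices::dev.off(dev), add = TRUE)
  draw()
  invisible(path)
}

# tempdir() is only an example location; in Execute R pick a folder you can find
out_file <- file.path(tempdir(), "barplot_example.png")
save_plot_png(out_file, function() {
  # barplot() stands in for the wordcloud::wordcloud(...) call
  graphics::barplot(c(government = 12, strategy = 9, policy = 4),
                    col = "steelblue")
})
print(file.exists(out_file))
```

In the thread's script, `draw` would wrap the `wordcloud::wordcloud(...)` call and `path` would be a full path such as "/Users/carl/mypng.png". Using `on.exit()` means a failing plot call does not leave the PNG device open in the R session.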
If I run this code in RStudio, it creates a PNG in my working directory. The same code runs in Execute R without errors, but doesn't appear to create the PNG file. Any idea why?
wordcloud::wordcloud(c(letters, LETTERS, 0:9), seq(1, 1000, len = 62))
dev.copy(png,'myplot.png')
dev.off()
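One likely explanation, consistent with the fix later in this thread (calling `setwd("/Users/carl")`), is that a relative file name such as 'myplot.png' is resolved against R's current working directory, and the R session started by Execute R need not share RStudio's working directory. A base-R sketch of the effect, using two scratch folders under `tempdir()` as assumed stand-ins for the two sessions and `writeLines()` in place of `png()` (like `png()`, it resolves a relative name against the current working directory):

```r
# Where would a relative name such as "myplot.png" be written in this session?
message("Working directory: ", getwd())

# Two scratch folders standing in for two different working directories
dir_a <- file.path(tempdir(), "wd_a")
dir_b <- file.path(tempdir(), "wd_b")
dir.create(dir_a, showWarnings = FALSE)
dir.create(dir_b, showWarnings = FALSE)

old_wd <- setwd(dir_b)  # pretend the Execute R session starts here
writeLines("x", "myplot.txt")                    # relative name: lands in dir_b
writeLines("x", file.path(dir_a, "myplot.txt"))  # absolute path: lands in dir_a
setwd(old_wd)           # restore the previous working directory

print(file.exists(file.path(dir_b, "myplot.txt")))
print(file.exists(file.path(dir_a, "myplot.txt")))
```

Printing `getwd()` inside rm_main is a quick way to see where a relative file name ends up. Passing a full path, for example `png(filename = "/Users/carl/mypng.png")`, should have the same effect as `setwd()` without changing the session's working directory.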
Got there in the end with this construct!
rm_main = function(data)
{
library(base)
library(grDevices)
library(wordcloud)
library(RColorBrewer)
setwd("/Users/carl")
png(filename="mypng.png")
cloud_df <- data.frame(Words = data$word, Freq = data$total)
wordcloud::wordcloud(cloud_df$Words, cloud_df$Freq, scale=c(5,0.5), max.words=100, random.order=FALSE,
rot.per=0.35, use.r.layout=FALSE, brewer.pal(3,"Dark2"))
dev.off()
}
Awesome, you got it! I see that you set the working directory and got your result.
Yes, thanks Thomas.
The above code was failing to produce colour, but I've fixed that too now with the code below. Works really well!
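rm_main = function(data)
{
library(base)
library(grDevices)
library(wordcloud)
library(RColorBrewer)
# Execute R cannot display plots, so draw into a PNG file instead
setwd("/Users/carl")
png(filename="mypng4.png", bg="transparent")
# "word" and "total" are columns of the example set from WordList to Data
cloud_df <- data.frame(word = data$word, freq = data$total)
# The palette must be passed as colors=; unnamed, it is matched to a different argument
wordcloud::wordcloud(cloud_df$word, cloud_df$freq, scale=c(5,0.5), max.words=50, random.order=FALSE,
rot.per=0.35, use.r.layout=FALSE, colors=brewer.pal(9,"Blues"))
# Close the device so the PNG file is actually written
dev.off()
}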
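The missing colour in the earlier version is most likely an argument-matching issue rather than a rendering one: `brewer.pal(3,"Dark2")` was passed without a name, so R matched it positionally to the first formal argument not already filled. Assuming the `wordcloud()` formals are, in order, `words, freq, scale, min.freq, max.words, random.order, random.color, rot.per, colors, ...` (as listed in the package documentation), that slot is `min.freq`, not `colors`, so the cloud kept its default black. The sketch below uses a hypothetical stand-in, `fake_wordcloud()`, with that argument list so it runs without the wordcloud package, and records where the palette lands:

```r
# Hypothetical stand-in with the same formals as wordcloud::wordcloud() (assumed);
# it draws nothing and just returns the two arguments of interest.
fake_wordcloud <- function(words, freq, scale = c(4, .5), min.freq = 3,
                           max.words = Inf, random.order = TRUE,
                           random.color = FALSE, rot.per = .1,
                           colors = "black", ordered.colors = FALSE,
                           use.r.layout = FALSE, fixed.asp = TRUE, ...) {
  list(min.freq = min.freq, colors = colors)
}

# Stand-in for brewer.pal(3, "Dark2")
pal <- c("#1B9E77", "#D95F02", "#7570B3")

# Unnamed palette, as in the earlier script: it fills min.freq
unnamed <- fake_wordcloud(c("a", "b"), c(5, 3), scale = c(5, 0.5),
                          max.words = 100, random.order = FALSE,
                          rot.per = 0.35, use.r.layout = FALSE, pal)

# Named palette, as in the fixed script: it reaches colors
named <- fake_wordcloud(c("a", "b"), c(5, 3), scale = c(5, 0.5),
                        max.words = 100, random.order = FALSE,
                        rot.per = 0.35, use.r.layout = FALSE, colors = pal)

str(unnamed)
str(named)
```

Likewise, `colors="Dark2"` in the very first script passes the palette's name rather than its colours; `brewer.pal()` is what turns a palette name into colour values.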