Bug in Execute R operator
Hello, how are you, everyone.
I am using "Execute R" operator.
However, if a column name in the input table contains Korean characters
(that is, if the column name is in Korean),
it crashes. (An error message appears that mentions a Java exception...)
So please fix this problem for Korean users.
Thank you in advance and see you again.
KMC
Answers
-
@Rapidminerpartner can you please provide the XML and a sample data set of this process so we can reproduce the error?
0 -
-
Below is the .rmp file...
<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve 101_DT_1B04005N_Y_2016---" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Local Repository/processes/101_DT_1B04005N_Y_2016---"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="103" name="Execute R" width="90" x="246" y="34">
<parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function(data)&#10;{&#10;    print('Hello, world!')&#10;    # output can be found in Log View&#10;    print(str(data))&#10;    # your code goes here&#10;&#10;    # for example:&#10;    data2 &lt;- as.data.table(matrix(1:16,4,4))&#10;    # connect 2 output ports to see the results&#10;    return(list(data,data2))&#10;}&#10;"/>
</operator>
<connect from_op="Retrieve 101_DT_1B04005N_Y_2016---" from_port="output" to_op="Execute R" to_port="input 1"/>
<connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
0 -
Hi @Rapidminerpartner, can you please also post the ExampleSet 101_DT_1B04005N_Y_2016---?
0 -
Hell, I didn't know I could attach files.
Here you are, and thank you.
0 -
Hello @Rapidminerpartner
I ran your process on your dataset and it did not raise any error for me, but the attribute names are changed inside the R script. I added a breakpoint before the Execute R operator, and it showed the attribute names exactly as they appear in the CSV file attached to your post. Once the data has been processed by the R script, however, some symbols are replaced with boxes, as shown in the center figure below. I also see that you did not write any R code of your own and are just using the default script of the Execute R operator. Separately, I loaded the CSV file with read.csv in RStudio and observed that R changes your attribute names there as well. This is shown in the last image of the screenshot below.
I also used the data imported from the CSV file to train a decision tree in RapidMiner, without the R script, to see whether RapidMiner itself changes the attribute names. It does not.
So my understanding is that R is changing your attribute names because it cannot handle some special characters. I am not sure what kind of error you are getting; if you have a screenshot of the error, please attach it.
@sgenzer might have something for this.
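The renaming seen with read.csv can be reproduced in plain R with a small sketch. It uses made-up ASCII column names (`age(5y)`, `C{code}`, `year`) instead of the Korean ones from the attached file, so the result does not depend on the locale; the CSV text is hypothetical and only illustrates the `check.names` argument of read.csv.

```r
# Hypothetical CSV content with brackets in the header (not the thread's data).
csv_text <- "age(5y),C{code},year\n1,2,3\n"

# Default behaviour: check.names = TRUE passes the header through make.names(),
# which replaces characters that are invalid in R variable names with dots.
df_default <- read.csv(text = csv_text)

# With check.names = FALSE the header is kept exactly as written.
df_kept <- read.csv(text = csv_text, check.names = FALSE)

print(names(df_default))
print(names(df_kept))
```

This only covers what R does when it reads a CSV file itself; it says nothing about how the Execute R operator hands the data over.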
2 -
Hello, varunm1
Thank you for your help
I will read your detailed message this evening when I return from
my office.
I will also attach the error message window.
Have a nice day, varunm1!
1 -
@varunm1 @Rapidminerpartner I have pinged our resident R expert, Dr. @yyhuang and I hope she has a moment to chime in here.
Scott
1 -
Hello, varunm1 and everybody
I have uploaded the repository data file (file extension .ioo).
Please test my process with this attached data file.
I believe all of you will see the error message.
Thank you
0 -
-
Hello @Rapidminerpartner
I am able to reproduce the error with the repository file you provided. I am a bit confused by the data in the repository file, though: it consists entirely of boxes. The .csv file that you provided earlier, which I imported, is fine and does not throw any errors. I am not sure why this exception occurs; maybe Dr. YY can help you with this. Thanks.
Error:
1 -
Rapidminerpartner,
Thanks for sharing the data and process. I was able to reproduce the same error that you and @varunm1 saw. The issue is not in RapidMiner itself, which can handle Korean input and parentheses in column names, but we can improve this in future versions of the integration. @varunm1, one trick to make R recognize Korean text is to set the locale for the R environment. However, a Sys.setlocale call in R will be overridden by the RapidMiner locale and encoding settings.
As you may know, when R reads a data table it does not by default keep special characters in the column names, so it automatically converts characters such as ( ) { } [ ] into dots. But when a data frame with special characters in its column names is passed from RapidMiner to R, the script fails inside RapidMiner with the exception "script terminated abnormally".
I tried two modifications, and either one fixes the issue: renaming the attributes, or selecting attributes so that the columns with Korean names containing parentheses are removed.
<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve 101_DT_1B04005N_Y_2016--- (2)" width="90" x="45" y="34">
<parameter key="repository_entry" value="//RM YY Loal Repository/from Community/repos_data/101_DT_1B04005N_Y_2016---"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="5세별|C5세별|시점"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="rename_by_generic_names" compatibility="9.3.001" expanded="true" height="82" name="Rename by Generic Names" width="90" x="313" y="187">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="generic_name_stem" value="att"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="103" name="Execute R (3)" width="90" x="514" y="187">
<parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function(data)&#10;{&#10;    #Sys.setlocale(&quot;LC_CTYPE&quot;, locale= &quot;korean&quot;)&#10;    print('Hello, world!')&#10;    print(head(data))&#10;    # output can be found in Log View&#10;    # your code goes here&#10;&#10;    # for example:&#10;    data2 &lt;- as.data.table(matrix(1:16,4,4))&#10;    # connect 2 output ports to see the results&#10;    return(list(data,data2))&#10;}&#10;"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="82" name="Execute R" width="90" x="514" y="340">
<parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10;    Sys.setlocale (&quot;LC_ALL&quot;, locale = &quot;korean&quot;)&#10;    df &lt;- read.csv(&quot;C:\\Users\\YuanyuanHuang\\Desktop\\101_DT_1B04005N_Y_2016---.csv&quot;, encoding = &quot;utf-8&quot;)&#10;    print(tail(df))&#10;    return(as.data.frame(df))&#10;}&#10;"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="103" name="Execute R (2)" width="90" x="514" y="34">
<parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function(data)&#10;{&#10;    #Sys.setlocale(&quot;LC_CTYPE&quot;, locale= &quot;korean&quot;)&#10;    print('Hello, world!')&#10;    print(head(data))&#10;    # output can be found in Log View&#10;    # your code goes here&#10;&#10;    # for example:&#10;    data2 &lt;- as.data.table(matrix(1:16,4,4))&#10;    # connect 2 output ports to see the results&#10;    return(list(data,data2))&#10;}&#10;"/>
</operator>
<connect from_op="Retrieve 101_DT_1B04005N_Y_2016--- (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Execute R (2)" to_port="input 1"/>
<connect from_op="Select Attributes" from_port="original" to_op="Rename by Generic Names" to_port="example set input"/>
<connect from_op="Rename by Generic Names" from_port="example set output" to_op="Execute R (3)" to_port="input 1"/>
<connect from_op="Execute R (3)" from_port="output 1" to_port="result 2"/>
<connect from_op="Execute R" from_port="output 1" to_port="result 3"/>
<connect from_op="Execute R (2)" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
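For readers who load the data directly in R rather than through RapidMiner, the renaming idea can be sketched in R as well. This is only an illustration, not what the Rename by Generic Names operator does internally: the helper `rename_generic` and the sample column names are hypothetical, and renaming inside the script cannot prevent a failure that happens while the data is being passed from RapidMiner to R.

```r
# Hypothetical helper: replace all column names with stem1, stem2, ...
# and keep a lookup table so the original names can be restored later.
rename_generic <- function(df, stem = "att") {
  original <- names(df)
  generic <- paste0(stem, seq_along(original))
  names(df) <- generic
  mapping <- data.frame(generic = generic, original = original,
                        stringsAsFactors = FALSE)
  list(data = df, mapping = mapping)
}

# Made-up example data with brackets in the column names.
df <- data.frame(a = 1:2, b = 3:4)
names(df) <- c("age(5y)", "C{code}")

res <- rename_generic(df)
print(names(res$data))
print(res$mapping)
```

Keeping the mapping alongside the renamed data makes it possible to put the original names back on the results after the R step.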
6 -
Thank you, yyhuang and varunm1.
I will read your comments when I return from the office this evening.
Have a nice day!
0 -
Thanks YY, got it.
1 -
Hello, YY
Thank you for your help
I will return after checking with my source file
I thought Rapidminer doesn't support Korean column names.
I will check it as you said
Have a nice day!
1 -
Hello, YY and varunm1
I have to report that there is still a problem.
YY said that it will be OK if there are no special characters in the column (attribute) names,
but I just checked, and it crashes even in that case.
I added a "Select Attributes" operator to the process
so that it selects just one attribute, the fifth attribute ("시점"), which doesn't contain special characters,
but it still crashes in that case.
I attached the screen captures, so please help me solve the problem.
Thank you and see you
0 -
-
@Rapidminerpartner
I am unable to reproduce this error; it's working fine for me. @yyhuang I have a question. Why am I seeing boxes instead of Korean characters? Am I missing some setting?
0 -
Hi @varunm1,
Good question. I guess you and @Rapidminerpartner are both using Windows.
@Michael also helped test the same data, and encoding handling on macOS is smoother.
https://answers.microsoft.com/en-us/windows/forum/all/korean-characters-shown-as-blocks/471ca66a-c09c-4d18-85ed-7aed8afde075
If you have never installed a language pack other than English, you may have problems displaying Korean characters on Windows.
So I did the following on my Windows 10 machine:
I installed the Korean language pack. (I already had the Chinese pack installed from testing Chinese text mining a long time ago.)
System settings on Windows are tricky. Hope it helps.
YY
0 -
Thanks @yyhuang, it worked well and now displays the characters. Once the data is processed by the Execute R operator, though, it again shows boxes. Is this because of the conversion between R and RapidMiner on Windows? I used the exact XML provided in your earlier post and just selected one attribute, attribute 5. I also tried setting the locale.
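One way to narrow down where the boxes come from is to look at the locale and encoding state that R itself reports. The sketch below is a diagnostic only and makes no claim about how RapidMiner converts the data; the helper name `describe_encoding` is made up, and the sample name is the attribute "시점" written with Unicode escapes so the snippet does not depend on the file encoding.

```r
# Hypothetical diagnostic: report the locale and the encoding of some names.
describe_encoding <- function(column_names) {
  list(
    ctype_locale = Sys.getlocale("LC_CTYPE"),              # active character-type locale
    session_is_utf8 = isTRUE(l10n_info()[["UTF-8"]]),      # is the session running in UTF-8?
    declared_encoding = Encoding(column_names),            # "UTF-8", "latin1" or "unknown"
    all_valid_utf8 = all(validUTF8(column_names))          # are the bytes valid UTF-8?
  )
}

info <- describe_encoding(c("\uc2dc\uc810", "year"))
print(info)
```

Inside rm_main, calling it as `describe_encoding(names(data))` and checking the Log View might show whether the names arrive in R already damaged or are only displayed incorrectly.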
0 -
Great, thanks. Sorry for bugging you multiple times. Have a great day.
0