
Bug in Execute R operator

Hello everyone, how are you?

I am using the "Execute R" operator.

However, if a column name of the input table contains Korean characters (that is, if the column name is in Korean), it crashes. An error message appears that mentions a Java exception.


So please fix this problem for Korean users.

Thank you in advance and see you again.


KMC


    @Rapidminerpartner can you please provide the XML and a sample data set of this process so we can reproduce the error?

    Below is the .rmp file:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve 101_DT_1B04005N_Y_2016---" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Local Repository/processes/101_DT_1B04005N_Y_2016---"/>
          </operator>
          <operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="103" name="Execute R" width="90" x="246" y="34">
            <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function(data)&#10;{&#10;    print('Hello, world!')&#10;    # output can be found in Log View&#10;    print(str(data))&#10;    &#10;    # your code goes here&#10;&#10;    # for example:&#10;    data2 &lt;- as.data.table(matrix(1:16,4,4))&#10;&#10;    # connect 2 output ports to see the results&#10;    return(list(data,data2))&#10;}&#10;"/>
          </operator>
          <connect from_op="Retrieve 101_DT_1B04005N_Y_2016---" from_port="output" to_op="Execute R" to_port="input 1"/>
          <connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Hi @Rapidminerpartner, can you please also post the example set 101_DT_1B04005N_Y_2016---?

    Well, I didn't know I could attach files.

    Here you are, and thank you.


    User: "varunm1"
    New Altair Community Member
    Hello @Rapidminerpartner

    I tried your dataset with your process and it didn't give me any error, but the attribute names are being changed inside the R script. I added a breakpoint before the Execute R operator, and it showed exactly the attribute names present in the CSV file attached to your post. But after the data is processed by the R script, some characters are replaced with boxes, as shown in the center figure below. I also noticed that you haven't written any R code and are just using the default script of the Execute R operator. I then loaded the CSV data separately in RStudio using read.csv and observed that R changes your attribute names there too. This is shown in the last image in the screenshot below.



    I also used the data imported from the CSV file to train a decision tree in RapidMiner, instead of using the R script, to check whether RapidMiner changes any attribute names. It does not.


    So my understanding is that R is changing your attribute names because it cannot handle some special characters. I am not sure what error you are getting; if you have screenshots of it, please attach them.

    @sgenzer might have something for this.

    Hello, varunm1

    Thank you for your help

    I will read your detailed message this evening when I return from my office.

    I will also attach the error message window.

    Have a nice day, varunm1!

    @varunm1 @Rapidminerpartner I have pinged our resident R expert, Dr. @yyhuang, and I hope she has a moment to chime in here.

    Scott

    Hello, varunm1 and everybody

    I have uploaded the repository data file (file extension .ioo).

    Please test my process with this attached data file.

    I believe you will all see the error message.

    Thank you.

    I have also uploaded screenshots showing the error messages.

    Thanks.

    Hello @Rapidminerpartner

    I am able to reproduce the error with the repository file you provided. I am a bit confused by your data in the repository file: it consists entirely of boxes. The .csv file you provided earlier, which I loaded, is fine and doesn't throw any errors. I am not sure why this exception occurs; maybe Dr. YY can help you with this. Thanks.



    Error:

    User: "YYH"
    Altair Employee
    Rapidminerpartner,

    Thanks for sharing the data and process. I was able to reproduce the same error as you and @varunm1. The issue is not in RapidMiner itself, which can handle Korean input and parentheses in column names, but we can make this better in future versions of the integration. @varunm1, one trick to make R recognize Korean text is to set the locale for the R environment. However, the Sys.setlocale function in R will be overridden by RapidMiner's locale and encoding settings.

    As you may know, R does not accept special characters in data frame column names by default, so it automatically converts (, ), {, }, [ and ] into dots. But if we pass a data frame with special characters in its column names from RapidMiner to R, it fails and shows the exception "script terminated abnormally" inside RapidMiner.
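To make the renaming concrete, here is a rough Python sketch of the rules documented for R's `make.names()`, which `read.csv()` applies through its default `check.names = TRUE`: every character that is not a letter, digit, `.` or `_` becomes a dot, and an `X` is prepended when the name does not start with a letter (or starts with a dot followed by a digit). The function name `make_names_like_r` and its `letters_ascii_only` switch are hypothetical; the switch stands in for the fact that what R counts as a "letter" depends on the locale. The sketch ignores reserved words, de-duplication and R's byte-level handling in single-byte locales, so treat its output as an illustration, not as what a given R installation will produce.

```python
DIGITS = "0123456789"


def make_names_like_r(name: str, letters_ascii_only: bool = True) -> str:
    """Simplified approximation of R's make.names() for one name (hypothetical helper).

    letters_ascii_only=True imitates a locale in which only A-Z/a-z are letters;
    False imitates a locale in which Unicode letters such as Hangul are accepted.
    Reserved words and make.unique() de-duplication are NOT handled.
    """

    def is_letter(ch: str) -> bool:
        if letters_ascii_only:
            return ch.isascii() and ch.isalpha()
        return ch.isalpha()

    # An "X" is prepended unless the name starts with a letter,
    # or with "." that is not followed by a digit.
    needs_prefix = (
        not name
        or not (is_letter(name[0]) or name[0] == ".")
        or (name[0] == "." and len(name) > 1 and name[1] in DIGITS)
    )

    # Every character that is not a letter, digit, "." or "_" becomes ".".
    cleaned = "".join(
        ch if is_letter(ch) or ch in DIGITS or ch in "._" else "."
        for ch in name
    )
    return ("X" if needs_prefix else "") + cleaned


if __name__ == "__main__":
    for col in ["a(b)", "5세별", "C5세별", "시점"]:
        ascii_version = make_names_like_r(col)
        unicode_version = make_names_like_r(col, letters_ascii_only=False)
        print(col, "->", ascii_version, "/", unicode_version)
```

With ASCII-only letters, different Korean names can collapse to the same string (any two-syllable Hangul name becomes `X..`), which R would then make unique by appending suffixes; with Unicode letters accepted, only the punctuation is replaced.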

    I made two modifications. Either renaming the attributes or selecting attributes (removing the columns whose names contain Korean parentheses) fixes the issue.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve 101_DT_1B04005N_Y_2016--- (2)" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//RM YY Loal Repository/from Community/repos_data/101_DT_1B04005N_Y_2016---"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="5세별|C5세별|시점"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="rename_by_generic_names" compatibility="9.3.001" expanded="true" height="82" name="Rename by Generic Names" width="90" x="313" y="187">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="generic_name_stem" value="att"/>
          </operator>
          <operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="103" name="Execute R (3)" width="90" x="514" y="187">
            <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;&#10;rm_main = function(data)&#10;{&#10;    #Sys.setlocale(&quot;LC_CTYPE&quot;, locale= &quot;korean&quot;)&#10;    &#10;    print('Hello, world!')&#10;    print(head(data))&#10;    # output can be found in Log View&#10;    &#10;    &#10;    # your code goes here&#10;&#10;    # for example:&#10;    data2 &lt;- as.data.table(matrix(1:16,4,4))&#10;&#10;    # connect 2 output ports to see the results&#10;    return(list(data,data2))&#10;}&#10;"/>
          </operator>
          <operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="82" name="Execute R" width="90" x="514" y="340">
            <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;&#10;rm_main = function()&#10;{&#10;    &#10;    &#10;&#9;&#9;Sys.setlocale (&quot;LC_ALL&quot;, locale = &quot;korean&quot;)&#10;&#10;&#10;&#10;&#9;&#9;df &lt;- read.csv(&quot;C:\\Users\\YuanyuanHuang\\Desktop\\101_DT_1B04005N_Y_2016---.csv&quot;, encoding = &quot;utf-8&quot;) &#10;&#10;&#9;&#9;print(tail(df))&#10;&#10;&#9;&#9;return(as.data.frame(df))&#10;}&#10;"/>
          </operator>
          <operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="103" name="Execute R (2)" width="90" x="514" y="34">
            <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;&#10;rm_main = function(data)&#10;{&#10;    #Sys.setlocale(&quot;LC_CTYPE&quot;, locale= &quot;korean&quot;)&#10;    &#10;    print('Hello, world!')&#10;    print(head(data))&#10;    # output can be found in Log View&#10;    &#10;    &#10;    # your code goes here&#10;&#10;    # for example:&#10;    data2 &lt;- as.data.table(matrix(1:16,4,4))&#10;&#10;    # connect 2 output ports to see the results&#10;    return(list(data,data2))&#10;}&#10;"/>
          </operator>
          <connect from_op="Retrieve 101_DT_1B04005N_Y_2016--- (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Execute R (2)" to_port="input 1"/>
          <connect from_op="Select Attributes" from_port="original" to_op="Rename by Generic Names" to_port="example set input"/>
          <connect from_op="Rename by Generic Names" from_port="example set output" to_op="Execute R (3)" to_port="input 1"/>
          <connect from_op="Execute R (3)" from_port="output 1" to_port="result 2"/>
          <connect from_op="Execute R" from_port="output 1" to_port="result 3"/>
          <connect from_op="Execute R (2)" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
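The renaming workaround can also be done reversibly, so the original Korean names can be put back once the R step is finished. Below is a small Python sketch of that idea under stated assumptions: `to_generic_names` and `restore_names` are hypothetical helpers, not part of RapidMiner or its R integration, and the `att1, att2, ...` numbering only imitates the `generic_name_stem="att"` setting of the Rename by Generic Names operator in the process above (the exact numbering RapidMiner uses is assumed here).

```python
from typing import Dict, List, Tuple


def to_generic_names(columns: List[str], stem: str = "att") -> Tuple[List[str], Dict[str, str]]:
    """Replace each column name with stem1, stem2, ... and remember the originals."""
    generic = [f"{stem}{i}" for i in range(1, len(columns) + 1)]
    mapping = dict(zip(generic, columns))  # generic name -> original name
    return generic, mapping


def restore_names(columns: List[str], mapping: Dict[str, str]) -> List[str]:
    """Map generic names back; names created later (not in the mapping) are kept as-is."""
    return [mapping.get(col, col) for col in columns]


if __name__ == "__main__":
    original = ["5세별", "C5세별", "시점"]
    generic, mapping = to_generic_names(original)
    print(generic)  # plain ASCII names handed to the script step
    print(restore_names(generic + ["prediction"], mapping))
```

Keeping the mapping outside the script step means R only ever sees plain ASCII names, and any column the script adds passes through `restore_names` unchanged.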
    


    Thank you, yyhuang and varunm1.

    I will read your comments again when I return from the office this evening.

    Have a nice day!

    Thanks YY, got it. 

    Hello, YY

    Thank you for your help

    I will come back after checking with my source file.

    I thought RapidMiner didn't support Korean column names.

    I will check it as you said.

    Have a nice day!

    Hello, YY and varunm1

    I have to report that there's still a problem.

    YY said it would be OK if there are no special characters in the column (attribute) name,

    but I just checked, and it crashes even in that case.

    I added "Select Attributes" to the process

    so that it selects just one attribute, the fifth one ("시점"), which doesn't contain special characters,

    but it still crashes.

    I attached the screenshots, so please solve the problem for me.

    Thank you and see you
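A quick way to see why removing punctuation alone may not be enough: "시점" contains no special characters, but it is still entirely non-ASCII, so any step in the RapidMiner-to-R handoff that mishandles non-ASCII names could still affect it. The Python sketch below reports these properties separately. The `describe_name` helper is hypothetical, and `인구(명)` is an invented example of a name with parentheses, not a column from the attached data.

```python
def describe_name(name: str) -> dict:
    """Report which kinds of characters a column name contains (hypothetical helper)."""
    return {
        # True only if every character is 7-bit ASCII
        "ascii_only": name.isascii(),
        # Precomposed Hangul syllables occupy U+AC00..U+D7A3
        "has_hangul": any("\uac00" <= ch <= "\ud7a3" for ch in name),
        # Anything other than letters, digits, "." and "_" (e.g. parentheses)
        "has_special": any(not (ch.isalnum() or ch in "._") for ch in name),
    }


if __name__ == "__main__":
    for col in ["시점", "C5세별", "인구(명)", "att1"]:
        print(col, describe_name(col))
```

"시점" comes out as non-ASCII with no special characters, which is consistent with the report above that selecting it alone still triggers the crash.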

    Hello, YY and varunm1

    Here are the XML and .rmp files.

    Please check them for me.

    @Rapidminerpartner

    I am unable to reproduce this error; it's working fine for me. @yyhuang, I have a question: why am I seeing boxes instead of Korean characters? Am I missing some setting?


    User: "YYH"
    Altair Employee
    Hi @varunm1,

    Good question. I guess you and @Rapidminerpartner are using Windows.
    @Michael also helped test the same data, and the encoding handling under macOS is smoother.
    https://answers.microsoft.com/en-us/windows/forum/all/korean-characters-shown-as-blocks/471ca66a-c09c-4d18-85ed-7aed8afde075
    If you have never installed a language pack other than English, you may have trouble displaying Korean characters on Windows.

    So I did the following on my Windows 10 machine:

    I installed the Korean language pack. (I already had the Chinese pack installed from testing Chinese text mining a long time ago.)

    The system settings on Windows are tricky. Hope it helps.

    YY 

    Thanks @yyhuang, that worked and the characters display correctly now. But once the data is processed by the R operator, it shows boxes again. Is this because of the conversion between R and RapidMiner on Windows? I used the exact XML you provided in your earlier post and just selected one attribute, attribute 5. I also tried setting the locale.


    User: "YYH"
    Altair Employee
    Yes, @varunm1, this is indeed a bug in the scripting integration under RapidMiner's hood. We are investigating it and will keep you posted!
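For readers wondering how text can look fine in one tool and come out garbled in another: the same Korean name has different byte representations in different encodings, and reading bytes with the wrong codec corrupts it. The Python sketch below only illustrates that general mechanism; it assumes, without confirmation from this thread, that an encoding mismatch (for example UTF-8 on one side and the Windows Korean code page CP949 or a Western code page on the other) plays a role in the conversion bug. Boxes on screen, by contrast, usually point to a missing font or language pack, which is what the language-pack fix addressed.

```python
name = "시점"  # column name from the thread

# Hypothetical scenario: one side writes bytes in one encoding, the other reads them.
utf8_bytes = name.encode("utf-8")   # 3 bytes per Hangul syllable
cp949_bytes = name.encode("cp949")  # Windows Korean code page: 2 bytes per syllable

# Decoding with the matching codec restores the text exactly.
assert utf8_bytes.decode("utf-8") == name
assert cp949_bytes.decode("cp949") == name

# Decoding UTF-8 bytes as Latin-1 never raises, but yields mojibake:
# six unrelated characters instead of two Hangul syllables.
garbled = utf8_bytes.decode("latin-1")

print(len(utf8_bytes), len(cp949_bytes))
print(repr(garbled))
```

The round trip only succeeds when writer and reader agree on the encoding, which is why both sides of a tool-to-tool handoff need the same setting.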
    Great, thanks. Sorry for bugging you multiple times. Have a great day.
    User: "YYH"
    Altair Employee
    No problems at all.
    Thank you @varunm1 for all your help testing and troubleshooting!!