Bug in Execute R operator

Rapidminerpartner New Altair Community Member
edited November 5 in Community Q&A

Hello, everyone.

I am using the "Execute R" operator.

However, if a column name of the input table contains Korean characters (that is, if the column name is in Korean), the operator crashes. An error message appears that mentions a Java exception.


Please fix this problem for Korean users.

Thank you in advance.


KMC


Answers

  • sgenzer
    Altair Employee
    @Rapidminerpartner, can you please provide the XML and a sample data set for this process so we can reproduce the error?
  • Rapidminerpartner
    Rapidminerpartner New Altair Community Member

    Below is the .rmp file:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve 101_DT_1B04005N_Y_2016---" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Local Repository/processes/101_DT_1B04005N_Y_2016---"/>
          </operator>
          <operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="103" name="Execute R" width="90" x="246" y="34">
            <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function(data)&#10;{&#10;    print('Hello, world!')&#10;    # output can be found in Log View&#10;    print(str(data))&#10;    &#10;    # your code goes here&#10;&#10;    # for example:&#10;    data2 &lt;- as.data.table(matrix(1:16,4,4))&#10;&#10;    # connect 2 output ports to see the results&#10;    return(list(data,data2))&#10;}&#10;"/>
          </operator>
          <connect from_op="Retrieve 101_DT_1B04005N_Y_2016---" from_port="output" to_op="Execute R" to_port="input 1"/>
          <connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

  • sgenzer
    Altair Employee
    Hi @Rapidminerpartner, can you please also post the ExampleSet 101_DT_1B04005N_Y_2016---?
  • Rapidminerpartner
    Rapidminerpartner New Altair Community Member

    Hello, I didn't know I could attach files.

    Here you are, and thank you.


  • varunm1
    varunm1 New Altair Community Member
    edited July 2019
    Hello @Rapidminerpartner

    I tried your dataset with your process and it didn't give any error for me, but the attribute names are changed inside the R script. I added a breakpoint before the Execute R operator, and it showed the exact attribute names present in the CSV file attached to your post. But once the data is processed by the script in R, some symbols are replaced with boxes, as shown in the center figure below. I also see that you didn't write any script of your own and are just using the default script in the Execute R operator. I separately loaded the CSV data using read.csv in RStudio and observed that R changes your attribute names. This is shown in the last image in the screenshot below.



    I also used the data imported from the CSV file to train a decision tree in RapidMiner instead of the R script, to see whether RapidMiner itself changes the attribute names. It does not change them.


    So, my understanding is that R is changing your attribute names because it is unable to handle some special characters. I am not sure what kind of error you are getting. If you have any screenshots of the error, please attach them.
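
    One way to narrow down which attribute names are likely to be altered is to check them before they reach R. R's read.csv defaults to check.names = TRUE, which passes the header through make.names and replaces characters that are not valid in an R name. The following is only a rough sketch in Python, not part of RapidMiner or R: the function name classify_attribute_names is hypothetical, the example header is made up except for "시점", and the ASCII-only rule is a deliberately conservative assumption (R also accepts locale-dependent letters and rejects reserved words, neither of which is modeled here).

```python
import re

# Conservative pattern for an R-style name using ASCII only:
# starts with a letter, or with a dot that is not followed by a digit,
# then continues with letters, digits, dots, or underscores.
# Assumption for this sketch: anything outside ASCII is treated as "risky",
# even though R may accept such letters depending on the locale.
ASCII_SAFE = re.compile(r"^(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*$")


def classify_attribute_names(names):
    """Return (safe, risky) lists, preserving the input order."""
    safe, risky = [], []
    for name in names:
        if ASCII_SAFE.match(name):
            safe.append(name)
        else:
            risky.append(name)
    return safe, risky


if __name__ == "__main__":
    # Hypothetical header; only "시점" is taken from this thread.
    header = ["시점", "year", "value (total)", "x.1"]
    safe, risky = classify_attribute_names(header)
    print("safe: ", safe)
    print("risky:", risky)
```

    Names that land in the risky list are the ones worth comparing before and after the Execute R step.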

    @sgenzer might have something for this.
  • Rapidminerpartner
    Rapidminerpartner New Altair Community Member
    edited July 2019

    Hello, varunm1.

    Thank you for your help.

    I will read your detailed message this evening when I return from my office.

    I will also attach the error message window.

    Have a nice day, varunm1!

  • sgenzer
    Altair Employee
    @varunm1 @Rapidminerpartner I have pinged our resident R expert, Dr. @yyhuang, and I hope she has a moment to chime in here.

    Scott
  • Rapidminerpartner
    Rapidminerpartner New Altair Community Member
    edited July 2019

    Hello, varunm1 and everybody.

    I have uploaded the repository data file (file extension .ioo).

    Please test my process with the attached data file.

    I believe all of you will see the error message.

    Thank you.

  • Rapidminerpartner
    Rapidminerpartner New Altair Community Member

    I have also uploaded screenshots showing the error messages.

    Thanks.

  • varunm1
    varunm1 New Altair Community Member
    Hello @Rapidminerpartner

    I am able to reproduce the error with the repository file you provided. I am a bit confused by the data in the repository file, because it consists entirely of boxes. The earlier .csv file that you provided, which I imported, is fine and doesn't throw any errors. I am not sure why this exception occurs. Maybe Dr. YY can help you with this. Thanks.



    Error:

  • Rapidminerpartner
    Rapidminerpartner New Altair Community Member

    Thank you, yyhuang and varunm1.

    I will read your comments when I return from the office this evening.

    Have a nice day!

  • varunm1
    varunm1 New Altair Community Member
    Thanks YY, got it. 
  • Rapidminerpartner
    Rapidminerpartner New Altair Community Member

    Hello, YY.

    Thank you for your help.

    I will get back to you after checking my source file.

    I thought RapidMiner didn't support Korean column names.

    I will check it as you said.

    Have a nice day!

  • Rapidminerpartner
    Rapidminerpartner New Altair Community Member

    Hello, YY and varunm1.

    I have to report that there is still a problem.

    YY said that it would be OK if there are no special characters in the column (attribute) name, but I just checked and it crashes even in that case.

    I added a "Select Attributes" operator to the process so that it selects just one attribute, the fifth attribute ("시점"), which doesn't contain special characters.

    Even in that case, it still crashes.

    I have attached screenshots. Please help me solve the problem.

    Thank you.

  • Rapidminerpartner
    Rapidminerpartner New Altair Community Member

    Hello, YY and varunm1.

    Here are the XML and .rmp files.

    Please check them for me.

  • varunm1
    varunm1 New Altair Community Member
    @Rapidminerpartner

    I am unable to reproduce this error. It's working fine for me. @yyhuang I have a question: why am I seeing boxes instead of Korean characters? Am I missing some setting?


  • YYH
    Altair Employee
    Hi @varunm1,

    Good question. I guess you and @Rapidminerpartner are using Windows.
    @Michael also helped test the same data, and encoding handling under macOS is smoother.
    https://answers.microsoft.com/en-us/windows/forum/all/korean-characters-shown-as-blocks/471ca66a-c09c-4d18-85ed-7aed8afde075
    If you have never installed a language pack other than English, you may have issues displaying Korean characters on Windows.

    So I did the following on my Windows 10 machine:


    I installed the language pack for Korean. I already had the Chinese pack installed from testing Chinese text mining a long time ago.



    The system settings on Windows are tricky. Hope it helps.

    YY 

  • varunm1
    varunm1 New Altair Community Member
    Thanks @yyhuang, it worked well and the characters are displayed now. But once the data is processed by the Execute R operator, it shows boxes again. Is this because of the conversion between R and RapidMiner on Windows? I used the exact XML provided in your earlier post and just selected one attribute, which is attribute 5. I also tried setting the locale.


  • YYH
    Altair Employee
    edited July 2019
    Yes, @varunm1. This is indeed a bug in the scripting integration under the hood of RapidMiner. We are investigating it and will keep you posted!
  • varunm1
    varunm1 New Altair Community Member
    Great, thanks. Sorry for bugging you multiple times. Have a great day.
  • YYH
    Altair Employee
    No problem at all.
    Thank you @varunm1 for all your help testing and troubleshooting!!