"Parameter renaming for R Extension"

fabiangebert New Altair Community Member
edited November 5 in Community Q&A
Hello,

I understand it is not possible for R to manage white space and some special characters in attribute names.
Therefore, I'd like to extend the GenericRLearner and GenericRModel to replace the forbidden characters prior to launching R and to revert the replacement afterwards.

I understand the source code of the R extension has not been released yet. Is there any way to solve this differently (without chaining further operators)?

The R extension does a great job otherwise! Thanks for your effort!

Best
Fabian

Answers

  • land
    land New Altair Community Member
    Hi Fabian,
    the source code is available as a zip file on the SourceForge project page of RapidMiner. It will be available as an SVN project, too.

    I was not aware that R has problems with that. As far as I understood it, the attribute names are just stored as strings in a map-like structure inside the data frame, but I might be wrong there, as I'm not too familiar with R itself.
    How would you like to handle the replacement? Are you going to escape the forbidden characters? It will then be a little inconvenient to write R code if the attribute names differ from the originals in a complex way. Just replacing " " with "_" would cause problems on the way back, because a replaced space could no longer be told apart from an underscore that was already in the original name.

    What about simply raising a UserError if forbidden characters come in? Adding a simple Rename by Replacing operator beforehand would solve the problem and is the usual way to get rid of unwanted characters in attribute names, for example before saving the data in a database.

    Greetings,
      Sebastian

    PS: If you have any other ideas or feedback, you are welcome to tell us. We want to improve the extension further to match users' workflows, so we need information about that :)
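
To make the "raise an error on forbidden characters" suggestion concrete, here is a minimal Java sketch of a check that could run before data is handed to R. The class `RNameCheck` and the method `isSyntacticRName` are hypothetical and not part of the R extension. The pattern is an ASCII-only approximation of R's rule for syntactic names (letters, digits, `.` and `_`, starting with a letter or with a dot not followed by a digit), so treat it as an assumption rather than an exact reimplementation of R's behaviour.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the R extension. */
final class RNameCheck {

    // ASCII-only approximation of R's rule for syntactic names: letters, digits,
    // '.' and '_', starting with a letter or with a dot not followed by a digit.
    private static final Pattern SYNTACTIC =
            Pattern.compile("^(([A-Za-z]|[.][._A-Za-z])[._A-Za-z0-9]*|[.])$");

    // R reserved words, which cannot be used as plain names either.
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "if", "else", "repeat", "while", "function", "for", "next", "break",
            "in", "TRUE", "FALSE", "NULL", "Inf", "NaN", "NA",
            "NA_integer_", "NA_real_", "NA_complex_", "NA_character_"));

    static boolean isSyntacticRName(String name) {
        return SYNTACTIC.matcher(name).matches() && !RESERVED.contains(name);
    }

    public static void main(String[] args) {
        // Print the verdict for a few sample attribute names.
        for (String name : new String[] {"sepal.length", "petal width", "2nd_value", "x_1"}) {
            System.out.println(name + " -> " + isSyntacticRName(name));
        }
    }
}
```

An operator could call such a check for every incoming attribute name and report the first offending name to the user, who would then add a Rename by Replacing step.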

  • fabiangebert
    fabiangebert New Altair Community Member
    Hi Sebastian,

    thanks a lot for your quick reply!

    I will push back my source modifications when I am done. I think I will just provide R with a hash of the variable name instead of its actual name.

    Best
    Fabian
  • fabiangebert
    fabiangebert New Altair Community Member
    Hello,

    In line 214 of ExampleSetTranslator I added:

    // translate attribute names
    for (int j = 0; j < attributeNames.length; j++) {
        String attributeName = attributeNames[j];
        String hashString = getAttributeHash(attributeName);
        attributeNames[j] = hashString;
    }

    and in lines 270 and 272, I changed the assignments to:

    rSession.assign(expression + VARIABLE_CLASS_POSTFIX, getAttributeHash(labelAttribute.getName()));
    rSession.assign(expression + VARIABLE_WEIGHT_POSTFIX, getAttributeHash(weightAttribute.getName()));

    where getAttributeHash is a simple Base64 wrapper:

    private String getAttributeHash(String attributeName) {
        try {
            // URL-safe Base64 of the UTF-8 bytes; reversible, unlike a real hash
            return Base64.encodeBytes(attributeName.getBytes(Charset.forName("UTF-8")), Base64.URL_SAFE);
        } catch (IOException e) {
            // fall back to the unencoded name if encoding fails
            return attributeName;
        }
    }

    This allows any kind of attribute name to be used and allows back-translation if needed. Works great for me. I managed to create a MARS operator and two visualisation components that show the right attribute names when visualising the model.

    Best
    Fabian
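
The round trip discussed in this thread can be sketched in a self-contained way. The example below swaps in the JDK's `java.util.Base64` (Java 8 and later) for the `Base64` helper class used in the post, and drops the `=` padding; both choices are assumptions made for illustration. The class `AttributeNameCodec` and its method names are hypothetical. It contrasts the lossy space-to-underscore replacement with the reversible encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical sketch; names are illustrative and not taken from the R extension. */
final class AttributeNameCodec {

    // Lossy: "a b" and "a_b" both become "a_b", so the original cannot be restored.
    static String naiveReplace(String name) {
        return name.replace(' ', '_');
    }

    // Reversible: URL-safe Base64 of the UTF-8 bytes, without '=' padding.
    static String encode(String name) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(name.getBytes(StandardCharsets.UTF_8));
    }

    // Inverse of encode(): recovers the original attribute name.
    static String decode(String encoded) {
        return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "petal width (cm)";
        String encoded = encode(original);
        System.out.println(encoded + " -> " + decode(encoded));
        // The naive replacement maps two different names to the same result.
        System.out.println(naiveReplace("a b").equals(naiveReplace("a_b")));
    }
}
```

The URL-safe alphabet consists of letters, digits, `-` and `_`, and an encoded name may start with a digit, so whether the encoded names can be typed directly in R code depends on how they are used on the R side.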