I have run into the following problem (possibly a bug). Here is what I am trying to do:
1. Load data with an ExcelExampleSource operator (the data is labeled, i.e. the first row contains the column labels).
2. Apply an AttributeFilter to the loaded data to filter attributes by name.
The Excel input file is in German, so the column labels can contain umlauts such as ä, ö and ü.
In the AttributeFilter operator I set the parameter "condition_class" to "attribute_name_filter" and use a regular expression containing umlauts, such as "Häuser|Bäume", as the parameter string.
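A minimal Java sketch of what filtering attribute names with this regular expression amounts to, assuming (not confirmed) that the operator compiles the parameter string with java.util.regex and keeps every attribute whose whole name matches. filterNames is a hypothetical helper, not a RapidMiner API. It also shows that the same pattern can be written in pure ASCII with the regex escape \u00e4, which, under the same assumption, might keep umlauts out of the process XML altogether.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AttributeNameFilterSketch {

    // Hypothetical helper: keeps every name that the whole regular expression
    // matches. Full matching (matches() rather than find()) is an assumption
    // about how attribute_name_filter behaves.
    public static List<String> filterNames(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return names.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The first two names are the umlaut words from the question, written
        // with Java unicode escapes so that this source file stays pure ASCII.
        List<String> names = List.of("H\u00e4user", "B\u00e4ume", "Autos");

        // Pattern text that contains the umlaut characters themselves.
        List<String> literal = filterNames(names, "H\u00e4user|B\u00e4ume");

        // Pure-ASCII pattern text: the doubled backslash puts a literal
        // backslash-u escape into the string, which java.util.regex itself
        // decodes to the umlaut when compiling the pattern.
        List<String> escaped = filterNames(names, "H\\u00e4user|B\\u00e4ume");

        System.out.println(literal.equals(escaped)); // true
    }
}
```

Whether RapidMiner passes such an escape through to the regex engine unchanged would still have to be checked in the GUI.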
Because of this, I set the encoding parameter of the root operator to UTF-16:
<parameter key="encoding" value="UTF-16"/>
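For reference, a short Java sketch showing that the single character ä is stored as different bytes depending on the encoding, whereas plain ASCII letters are the same single byte in UTF-8 and ISO-8859-1. That would be consistent with the problem appearing only once umlauts are present. UmlautBytes and hex are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {

    // Formats bytes as space-separated two-digit hex values, e.g. "C3 A4".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "\u00e4"; // a-umlaut
        System.out.println("UTF-8:      " + hex(a.getBytes(StandardCharsets.UTF_8)));      // C3 A4
        // UTF_16BE leaves out the byte-order mark that UTF_16 would prepend.
        System.out.println("UTF-16BE:   " + hex(a.getBytes(StandardCharsets.UTF_16BE)));   // 00 E4
        System.out.println("ISO-8859-1: " + hex(a.getBytes(StandardCharsets.ISO_8859_1))); // E4
    }
}
```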
I work with the GUI version of RapidMiner. When I switch from the XML editor tab back to the parameter editor tab, I get the following error message:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. Cancel to ignore changes, Ok to go on editing.
As soon as I remove the umlauts, everything works fine. It looks as if the parser expects the regular expression to be UTF-8 encoded, whereas it should be treated as UTF-16, but that is only a guess.
As a temporary workaround I can remove the umlauts from the column labels in the input file, but in the long run that is not an option. Any suggestions?
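One possible explanation, and it is only an assumption: the process XML ends up stored in a single-byte encoding such as ISO-8859-1 (for example the platform default) while the parser reads it as UTF-8. The sketch below uses a strict java.nio UTF-8 decoder, which, like the Xerces parser in the error above, rejects malformed byte sequences instead of replacing them. StrictUtf8Check and isValidUtf8 are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {

    // Returns true if the bytes form valid UTF-8. Malformed input is
    // REPORTed (thrown) instead of being silently replaced.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String withUmlauts = "H\u00e4user|B\u00e4ume"; // the regex from the question
        String asciiOnly = "Haeuser|Baeume";           // same regex without umlauts

        System.out.println(isValidUtf8(withUmlauts.getBytes(StandardCharsets.UTF_8)));      // true
        System.out.println(isValidUtf8(withUmlauts.getBytes(StandardCharsets.ISO_8859_1))); // false
        System.out.println(isValidUtf8(asciiOnly.getBytes(StandardCharsets.ISO_8859_1)));   // true
    }
}
```

The last line mirrors the observation that removing the umlauts makes the error disappear: pure-ASCII bytes are valid UTF-8 regardless of which single-byte encoding wrote them.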