"RegularExpression-bug?"

choose_username New Altair Community Member
edited November 5 in Community Q&A
Hi there,

I have a data set with lines that look like the following:

41, Private, 109912, Bachelors, 13, Never-married, Other-service, Not-in-family, White, Female, 0, 0, 40, ?, <=50K


From time to time there is a '?'. I wanted to replace it, but RapidMiner didn't recognize it as a character.

I used the Replace operator and, in the "replace what" field, entered the 'any character' symbol from the regular expression suggestions window. All other characters were replaced, but not the '?'.

Now I would like to know whether this is a bug or whether I did something wrong.


Greetings

User

Answers

  • choose_username
    choose_username New Altair Community Member
    I found out the following:

    If I change the file extension to .arff, I can't filter out the '?'.

    If I change the file extension to .csv, the '?' gets filtered out.


    Maybe this helps.

    ______________________

    User.
  • cherokee
    cherokee New Altair Community Member
    Hi choose_username,

    RapidMiner uses '?' to denote a missing value. Since no value is present, there is nothing to replace, which is why the regular expression never matches it. Use either Replace Missing Values or Impute Missing Values instead; both are under Data Transformation / Data Cleansing.

    Best regards,
    chero
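
To illustrate the point above, here is a minimal sketch in plain Python. It is not RapidMiner's implementation; the function names (`parse_row`, `regex_replace`, `fill_missing`) and the fill value "Unknown" are made up for this example. It assumes a loader that turns the '?' token into a missing value (`None`) at read time, which is the behaviour described for the .arff case. Once that has happened, a regular expression has no text to match in that field, so the field has to be filled by a separate missing-value step.

```python
import re

# Sample row taken from the original post.
row = ("41, Private, 109912, Bachelors, 13, Never-married, Other-service, "
       "Not-in-family, White, Female, 0, 0, 40, ?, <=50K")


def parse_row(line, missing_token="?"):
    """Split a comma-separated line and map the missing-value token to None."""
    fields = (part.strip() for part in line.split(","))
    return [None if field == missing_token else field for field in fields]


def regex_replace(values, pattern, replacement):
    """Apply a regex replacement to present values only; None stays None."""
    return [v if v is None else re.sub(pattern, replacement, v) for v in values]


def fill_missing(values, fill_value):
    """Replace missing values (None) with an explicit fill value."""
    return [fill_value if v is None else v for v in values]


values = parse_row(row)

# The 'any character' pattern rewrites every present value,
# but the missing field is untouched because it holds no text.
replaced = regex_replace(values, r".", "x")

# Filling missing values is a separate step, analogous in spirit
# to using a dedicated missing-value operator.
filled = fill_missing(values, "Unknown")

print(values[13], replaced[13], filled[13])
```

If the same file is read so that '?' stays an ordinary string, a pattern such as `\?` (with the question mark escaped, because an unescaped `?` is a regex quantifier) would match it like any other character.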