Words containing umlauts in Text Mining

Thesis_12 New Altair Community Member
edited November 5 in Community Q&A
Dear all,

RapidMiner apparently cannot find words containing German umlauts (ä, ö, ü) or ß. When I search for the word "Änderung" with a regular expression (in "Filter Tokens by Region", condition "contains match"), I get no results.
I use version 5.3.005 on a Mac and am working with HTML documents. I know that the problem does not occur with an older version on Windows.

However, I need to solve this problem with version 5.3.005 on a Mac.

I tried ".{1,2}nderung", which worked but also returned results like "Minderung", which was not intended.
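The behaviour of that workaround can be reproduced outside RapidMiner. The sketch below is only an illustration in Python, not RapidMiner's own code: `contains_match`, `LOOSE`, `TIGHT`, and the sample tokens are hypothetical names, and the "mis-decoded" token assumes the document was UTF-8 but was read as MacRoman. It shows why `.{1,2}nderung` also hits "Minderung", and how restricting the leading one or two characters to non-ASCII would exclude it.

```python
import re

# The workaround from the post: ANY one or two characters before "nderung".
LOOSE = re.compile(r".{1,2}nderung")
# Hypothetical tighter variant: one or two NON-ASCII characters before "nderung".
TIGHT = re.compile(r"[^\x00-\x7F]{1,2}nderung")


def contains_match(pattern, token):
    """Mimic a 'contains match' condition: True if the pattern occurs anywhere."""
    return pattern.search(token) is not None


# Ways the word might reach the tokenizer (assumptions, not taken from the post).
tokens = {
    "precomposed": "\u00c4nderung",    # single code point U+00C4
    "decomposed": "A\u0308nderung",    # 'A' followed by combining diaeresis
    "mis-decoded": "\u00c4nderung".encode("utf-8").decode("mac_roman"),
    "unintended": "Minderung",
}

for label, token in tokens.items():
    print(label, contains_match(LOOSE, token), contains_match(TIGHT, token))
```

The loose pattern matches all four tokens, while the tighter one rejects "Minderung" because "M" and "i" are both ASCII. The character class `[^\x00-\x7F]` is also valid in Java regular expressions, so it should be usable in RapidMiner, but it remains a workaround: it accepts any non-ASCII character before "nderung", not only "Ä".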

I would be very glad if somebody knew a solution for this problem.

Thanks a lot

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    How do you retrieve your data?
    For some data retrieval operators you have to configure the correct encoding. If your input data is encoded in UTF-8, for example, you have to set that in the respective operator.

    Best regards,
    Marius
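To illustrate why the encoding setting matters, here is a minimal Python sketch (not RapidMiner code). It assumes the HTML file is stored as UTF-8 and that the reader falls back to MacRoman; both are assumptions, since the thread does not state the actual encodings, and the sample sentence and variable names are made up.

```python
import re

# Hypothetical HTML fragment, stored on disk as UTF-8 bytes.
raw = "<p>Die \u00c4nderung ist klein.</p>".encode("utf-8")

# Reading the bytes with the wrong encoding turns the single character
# U+00C4 into two unrelated characters, so the word no longer appears.
wrong = raw.decode("mac_roman")
# Reading them with the encoding they were written in restores the word.
right = raw.decode("utf-8")

pattern = re.compile("\u00c4nderung")
found_wrong = pattern.search(wrong) is not None
found_right = pattern.search(right) is not None

print("decoded as MacRoman:", found_wrong)
print("decoded as UTF-8:   ", found_right)
```

With the wrong encoding the search finds nothing, which is consistent with the symptom in the question, where ".{1,2}nderung" matched but "Änderung" did not. If this is the cause, setting the encoding on the operator that reads the documents should make the original search term work again.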