Language filter issue

Updated by Jocelyn
I have a document that includes both Chinese and English. Can I filter out all the English text and keep only the Chinese text? Or, in the other direction, can I filter out all the Chinese text and keep only the English text?
Btw, my team has released a RapidMiner extension to perform multilingual text analysis - the Rosette Text Toolkit. We have an "Identify Language" operator that returns the language of every cell in the input attribute (identifies 56 languages, including Chinese). The extension may help in analyzing multiple-language input - and most of our operators support Chinese.
-Lauren
Just saw this one. Yes, you can: use a regular expression in your filter and search for \p{Han}, which matches only Chinese (Han) characters.
To get the reverse, just invert the match (for example \P{Han}, which matches any character that is not Han).
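A minimal sketch of the same idea outside RapidMiner, in Python. The spelling of the script property depends on the regex engine: Go's `regexp` and the third-party Python `regex` package accept `\p{Han}`, while Java's `java.util.regex` expects `\p{IsHan}` (or `\p{script=Han}`), so it is worth checking which form your tool accepts. Python's built-in `re` has no `\p{...}` support at all, so this version approximates the Han script with explicit CJK ideograph ranges. That range list and the function names `keep_chinese`, `keep_non_chinese` and `filter_tokens` are assumptions for illustration only.

```python
import re

# ASSUMPTION: Python's built-in `re` has no \p{Han}, so this class approximates
# the Unicode Han script with the main CJK ideograph ranges: Extension A,
# the basic Unified Ideographs block, Compatibility Ideographs, and the
# supplementary ideographic planes 2 and 3. It is not the exact Script=Han set.
HAN_RANGES = "\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\U00020000-\U0003ffff"

# One or more consecutive Han characters (plays the role of \p{Han}+).
HAN_RUN = re.compile(f"[{HAN_RANGES}]+")


def keep_chinese(text: str) -> str:
    """Keep only the Han characters; English, digits, spaces and punctuation go."""
    return "".join(HAN_RUN.findall(text))


def keep_non_chinese(text: str) -> str:
    """The inverse: drop Han characters, then collapse the leftover whitespace."""
    return " ".join(HAN_RUN.sub(" ", text).split())


def filter_tokens(tokens, chinese=True):
    """Token-level filter: keep tokens that contain Han characters, or, with
    chinese=False (the inverted condition), keep tokens that contain none."""
    return [t for t in tokens if (HAN_RUN.search(t) is not None) == chinese]


if __name__ == "__main__":
    sample = "RapidMiner 文本分析 text mining"
    print(keep_chinese(sample))      # 文本分析
    print(keep_non_chinese(sample))  # RapidMiner text mining
    print(filter_tokens(["hello", "你好", "world", "世界"]))         # ['你好', '世界']
    print(filter_tokens(["hello", "你好", "world", "世界"], False))  # ['hello', 'world']
```

One side effect to be aware of: matching only Han characters also drops Chinese punctuation such as ， and 。, because Unicode assigns those characters to the Common script rather than Han. Add them to the character class if you want to keep them.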