[SOLVED] The approach for filtering non-letter tokens
winecoding
In RapidMiner, I use the Tokenize operator to process a lot of documents. Some of my documents contain many non-letter characters, such as digits, %, $, and other symbols. Is there an operator that lets me filter out these tokens? Thanks.
MariusHelf
Hi,
First of all, you have to configure the Tokenize operator to use a splitting pattern appropriate to your problem. By default, it splits at "non-letters"; you could change it to split at, for example, all whitespace characters.
Then, to filter, you can use the Filter Tokens operator with a customized pattern.
If you have problems with the regular expressions, please post again.
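The two steps above (split on whitespace, then keep only the tokens that match a pattern) can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the idea, not the operators' actual implementation: the function names are made up for the example, and it assumes "letter" means ASCII a-z and A-Z. A similar letters-only expression is the kind of pattern you would enter in the filter operator, adjusted to the regular expression syntax RapidMiner expects.

```python
import re

# Assumption: a "letter" is an ASCII letter; widen the character class if your
# documents contain accented or non-Latin letters.
LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def tokenize(text):
    """Split on runs of whitespace (hypothetical stand-in for the Tokenize step)."""
    return text.split()


def filter_tokens(tokens):
    """Keep only tokens that consist entirely of letters (stand-in for the filter step)."""
    return [t for t in tokens if LETTERS_ONLY.fullmatch(t)]


sample = "Revenue grew 15% to $200 in Q3 overall"
print(filter_tokens(tokenize(sample)))
# prints ['Revenue', 'grew', 'to', 'in', 'overall']
```

Note that a mixed token such as "Q3" is dropped as a whole, because the pattern must match the entire token rather than just a part of it.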
Happy Mining!
~Marius
winecoding
Marius, Thanks.