"Text dictionary matching"
sb
The Filter (Dictionary) operator filters OUT words. Is there a way to keep only the words that match those in a dictionary? I can use Filter Tokens (by Content), but that needs a very long list of words as a regular expression. I am looking for something akin to an 'Invert' option in the Filter (Dictionary) operator.
Thanks.
colo
Hi,
sb wrote:
The Filter (Dictionary) filters OUT words
That's why it is called "Filter Stopwords".
Since the "Filter Documents/Tokens" operators do not provide the ability to use a dictionary file, you could perhaps modify these operators (or invert the behavior of the stopword filter) if you are familiar with Java programming. If you don't want to look at the source code, you could load a dictionary file and automatically build a regular expression from it (just concatenate the dictionary words, separated by a vertical bar). But I don't know whether there are length limitations for macros that would prevent you from using this long expression as the parameter for "Filter Documents (by Content)".
Just a few thoughts...
Regards
Matthias
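The vertical-bar approach suggested above could be sketched roughly as follows. This is only an illustration in Python, not RapidMiner code: the word list, the function name `build_keep_pattern`, and the sample tokens are all made up for the example, and it assumes the dictionary is a plain list with one word per entry. The escaping is done with Python's `re.escape`; the resulting backslash escapes should also be understood by Java-style regular expressions, but that is an assumption worth checking before pasting the expression into an operator parameter.

```python
import re

def build_keep_pattern(words):
    """Join dictionary words into one alternation, e.g. 'good|bad'."""
    # Drop blank entries and duplicates; sort longest first so the
    # result is deterministic and longer words win in a partial search.
    unique = sorted({w.strip() for w in words if w.strip()},
                    key=lambda w: (-len(w), w))
    # Escape regex metacharacters such as '+' or '.' inside words.
    return "|".join(re.escape(w) for w in unique)

# Hypothetical dictionary; in practice this would be read from a file.
dictionary_words = ["good", "bad", "c++", "excellent", "good", ""]
pattern = build_keep_pattern(dictionary_words)

# Keep only tokens that match a dictionary word in full.
tokens = ["the", "movie", "was", "good", "not", "bad", "c++"]
kept = [t for t in tokens if re.fullmatch(pattern, t)]
print(pattern)
print(kept)
```

Matching each token in full (rather than searching inside it) avoids keeping tokens such as "goodness" just because they contain a dictionary word.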
land
Hi,
The good news is: no length limitation. The bad news: there is currently no way to invert it. But it might be easier to implement a script that builds the "opposite" of the tokens in a document, given two documents, than to implement the whole dictionary method again.
Greetings,
Sebastian
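One possible reading of the "opposite" idea above: take the original token list and the token list left after the dictionary stopword filter, and keep whatever the filter removed. The sketch below is plain Python rather than a RapidMiner script, and the function name `removed_tokens` and the sample data are hypothetical. It assumes both documents are already tokenized and that the filter preserves the order of the surviving tokens.

```python
from collections import Counter

def removed_tokens(original, filtered):
    """Return the tokens of `original` that are missing from `filtered`,
    in their original order and with repeated occurrences preserved."""
    remaining = Counter(filtered)  # how many of each token survived
    removed = []
    for tok in original:
        if remaining[tok] > 0:
            remaining[tok] -= 1    # this occurrence survived the filter
        else:
            removed.append(tok)    # this occurrence was filtered out
    return removed

# Hypothetical example: "good" and "bad" are in the dictionary,
# so the stopword filter dropped them from the document.
original = ["the", "movie", "was", "good", "not", "bad", "good"]
filtered = ["the", "movie", "was", "not"]
dictionary_matches = removed_tokens(original, filtered)
print(dictionary_matches)
```

Counting occurrences instead of using a plain set difference keeps repeated dictionary words, which matters if term frequencies are computed afterwards.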