"Text dictionary matching"

User: "sb"
New Altair Community Member
Updated by Jocelyn
The Filter (Dictionary) operator filters OUT words. Is there a way to keep only the words that match those in a dictionary? I can use Filter Tokens (by Content), but this needs a very long list of words as a regular expression. I am looking for something akin to an 'Invert' option in the Filter (Dictionary) operator.
Thanks.

    User: "colo"
    New Altair Community Member
    Hi,
    sb wrote:

    The Filter (Dictionary) filters OUT words
    that's why it is called "Filter Stopwords" ;)

    Since the "Filter Documents/Tokens" operators do not provide the ability to use a dictionary file, you could perhaps modify these operators (or invert the behavior of the stopword filter) if you are familiar with Java programming. If you don't want to look at the source code, you could load a dictionary file and automatically build a regular expression from it (just concatenate the dictionary words, separated by a vertical bar). But I don't know whether there are length limitations for macros that would prevent you from using this long expression as the parameter for "Filter Documents (by Content)".
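A minimal sketch of the "concatenate the dictionary words with a vertical bar" idea, written in Python purely for illustration. The function names `build_dictionary_regex` and `keep_dictionary_tokens` are hypothetical, and it is an assumption that the resulting string can be pasted unchanged into the operator's regular expression parameter, since escaping rules may differ between regex engines.

```python
import re


def build_dictionary_regex(words):
    """Join dictionary words into one alternation, e.g. 'apple|banana'."""
    # Strip whitespace, skip empty entries, and drop duplicates while
    # keeping the original order of the dictionary.
    cleaned = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    # Escape each word so characters such as '+' or '.' match literally.
    return "|".join(re.escape(w) for w in cleaned)


def keep_dictionary_tokens(tokens, pattern):
    """Keep only tokens that match one dictionary word in full."""
    compiled = re.compile(pattern)
    return [t for t in tokens if compiled.fullmatch(t)]


if __name__ == "__main__":
    # In practice the words would be read from the dictionary file,
    # one word per line; a small inline list stands in for it here.
    dictionary = ["apple", "banana", "c++"]
    pattern = build_dictionary_regex(dictionary)
    print(pattern)
    print(keep_dictionary_tokens(["apple", "pie", "c++", "bananas"], pattern))
```

Matching the whole token (rather than searching inside it) keeps "bananas" from slipping through just because it contains "banana".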
    Just a few thoughts...

    Regards
    Matthias
    User: "land"
    New Altair Community Member
    Hi,
    The good news is: no length limitation. The bad news: there is currently no way to invert it. But it might be easier to implement a script that builds the "opposite" set of tokens in a document, given two documents, than to implement the whole dictionary method again.
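One way to read the "opposite of tokens given two documents" idea is sketched below in Python. It assumes the first document is the original token list and the second is the same document after the dictionary filter has removed every occurrence of each dictionary word; the function name `inverted_filter` is hypothetical and this is not an existing operator.

```python
def inverted_filter(original_tokens, filtered_tokens):
    """Return the tokens the dictionary filter removed, in original order.

    Assumes the filter drops every occurrence of a dictionary word, so any
    token that still appears in the filtered document is not in the dictionary.
    """
    survivors = set(filtered_tokens)
    return [t for t in original_tokens if t not in survivors]


if __name__ == "__main__":
    original = ["the", "cat", "sat", "on", "the", "mat"]
    # Output of a dictionary filter whose dictionary is {"the", "on"}.
    filtered = ["cat", "sat", "mat"]
    print(inverted_filter(original, filtered))
```

The result is exactly the set of dictionary matches, duplicates included, which is what an 'Invert' option would have produced.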

    Greetings,
      Sebastian