List of words that are filtered with Stopwords, Stemming and Tokenizing?

Jonas97 New Altair Community Member
edited November 2024 in Community Q&A
Hello,

is there a function in RapidMiner that I can use to create a list of the words, or a count of the words, that the process steps Filter Stopwords, Stemming and Tokenizing have identified and excluded from the analysis of the text corpus?

Thank you in advance!

Jonas

Answers

  • Telcontar120 New Altair Community Member
    I am not sure there is a direct way to view this, but you could work around it. First run your documents through with Tokenize only. Then run them through a second time with Tokenize plus the other text processing options you want (stopwords, stemming, etc.). Finally, take the two resulting wordlist datasets and use Set Minus (join type) to get the non-matches, which are the words that were removed or changed by the extra steps.
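
To make the two-pass idea concrete, here is a rough sketch in plain Python rather than RapidMiner operators. Everything in it is an assumption for illustration: the tiny `STOPWORDS` set, the toy `naive_stem` function (it is not the Porter stemmer or any RapidMiner stemmer), and the sample text. It only shows the logic of building two wordlists and taking the difference.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "is", "a", "of", "and", "are"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Toy stemmer that strips a few common suffixes (not Porter)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def wordlist_tokenize_only(text):
    """Pass 1: wordlist with counts after tokenizing only."""
    return Counter(tokenize(text))


def wordlist_full(text):
    """Pass 2: tokenize, drop stopwords, then stem."""
    return Counter(naive_stem(t) for t in tokenize(text) if t not in STOPWORDS)


def set_minus(first, second):
    """Words in the first wordlist that have no match in the second."""
    return sorted(set(first) - set(second))


text = "The miner is mining the mined texts and the text is filtered"
raw = wordlist_tokenize_only(text)
processed = wordlist_full(text)

dropped = set_minus(raw, processed)
print(dropped)       # words removed or changed by stopword filtering and stemming
print(len(dropped))  # how many distinct words that is
```

Note that the difference contains both removed stopwords and words whose form was changed by stemming ("texts" becomes "text"), while a word that already equals its stem ("text") still matches and is not listed. If you only want the stopwords, run the second pass with the stopword filter alone.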